Gym Passenger Flow Analysis - Data Collection Part

Collecting gyms and surrounding facilities in Beijing through the AMap (高德地图) API

A brief introduction to AMap: AMap is a digital map provider in China. It offers navigation and location-based services.

What we need:

AMap already categorizes every building, so my program only needed to call its API with the correct parameters. AMap then returned a JSON response. Because I only cared about locations at this step, I discarded the other fields and built a CSV of building names and their coordinates.

An example CSV (each row is a name followed by a quoted "longitude,latitude" pair):
十二星座游泳健身(关店),"116.340419,39.926247"
恒昌花园游泳健身会所,"116.337894,39.887361"
合生财富广场健身房,"116.378204,39.958753"
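
The JSON-to-CSV step described above could look roughly like the following. This is a minimal sketch, not the original program: the endpoint URL, the query parameter names, and the `pois` / `name` / `location` field names are assumptions about AMap's place search response and should be checked against the AMap documentation. The helper names (`build_query_url`, `fetch_page`, `pois_to_rows`, `rows_to_csv`) are hypothetical, and the sample payload is hand-written so the example runs offline.

```python
import csv
import io
import json
import urllib.parse
import urllib.request

# Assumed endpoint for AMap's keyword place search; verify against the AMap docs.
PLACE_SEARCH_URL = "https://restapi.amap.com/v3/place/text"


def build_query_url(api_key, keywords, city, page=1):
    """Build a search URL. The parameter names are assumptions, not confirmed."""
    params = {"key": api_key, "keywords": keywords, "city": city, "page": page}
    return PLACE_SEARCH_URL + "?" + urllib.parse.urlencode(params)


def fetch_page(url):
    """Download one page of results and parse it as JSON (needs network access)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def pois_to_rows(payload):
    """Keep only (name, "lng,lat") pairs and drop every other field."""
    rows = []
    for poi in payload.get("pois", []):
        name = poi.get("name")
        location = poi.get("location")
        # Skip entries whose name or location is missing or not a string.
        if isinstance(name, str) and isinstance(location, str) and location:
            rows.append((name, location))
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text; the csv module quotes the "lng,lat" field."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hand-written sample in the assumed response shape, so no API key is needed.
sample_payload = {
    "status": "1",
    "pois": [
        {
            "name": "合生财富广场健身房",
            "location": "116.378204,39.958753",
            "address": "hypothetical field, discarded",
        },
        {"name": "entry without coordinates", "location": []},
    ],
}

csv_text = rows_to_csv(pois_to_rows(sample_payload))
print(csv_text)
```

In a real run, `fetch_page(build_query_url(...))` would replace `sample_payload`, with the page number incremented until no more results come back.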

Collecting the number of comments for each gym from Dazhong Dianping (大众点评)

A brief introduction to 大众点评: 大众点评 is a website for restaurant reviews and group purchases. I will refer to it as DZDP.

One problem in this project was how to estimate passenger flow, because no website records this kind of information. I therefore decided to use the number of comments on DZDP as an indirect measure of passenger flow: whether a gym is good or bad, some of its visitors will always leave comments.

Helper library:

DZDP uses very strict anti-crawler measures. It keeps a set of characters, and every character in that set is replaced with an image on the page. I therefore used https://github.com/01ly/DPspider to collect the data.

All data was then saved in a database. From it, I constructed a CSV file containing each gym's name and number of comments. At this point, all the data we needed had been collected.
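
The export from the database to a CSV of names and comment counts might be sketched as below. The text does not say which database was used, so an in-memory SQLite database stands in for it here. The table name `gyms`, the columns `name`, `comment_count`, and `address`, the helper `export_comment_counts`, and the sample rows are all hypothetical.

```python
import csv
import io
import sqlite3


def export_comment_counts(connection):
    """Return CSV text with one "name,comment_count" row per gym."""
    # Hypothetical schema: select only the two columns the analysis needs.
    cursor = connection.execute(
        "SELECT name, comment_count FROM gyms ORDER BY name"
    )
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(cursor)
    return buffer.getvalue()


# Stand-in database with made-up rows, so the sketch runs on its own.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE gyms (name TEXT, comment_count INTEGER, address TEXT)"
)
connection.executemany(
    "INSERT INTO gyms VALUES (?, ?, ?)",
    [
        ("Example Gym B", 0, "not exported"),
        ("Example Gym A", 120, "not exported"),
    ],
)

comment_csv = export_comment_counts(connection)
print(comment_csv)
```

Writing `comment_csv` to a file gives the second CSV, which can later be matched by gym name against the location CSV from the AMap step.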