Gym Passenger Flow Analysis - Data Collection Part
Collecting gyms and surrounding facilities in Beijing through the AMap (高德地图) API
A brief introduction to AMap: AMap is a digital map provider in China that offers navigation and location services.
What we need:
- gyms and their locations
- residences and their locations
- parking lots and their locations
- bus and subway stations and their locations
- all entertainment venues and their locations
Because AMap already categorizes all buildings, my program only needed to call its API with the correct parameters. AMap then returned a JSON response. Since all I cared about in this step was location, I discarded the other fields and built a CSV of buildings and their locations.
An example CSV:
十二星座游泳健身(关店),"116.340419,39.926247"
恒昌花园游泳健身会所,"116.337894,39.887361"
合生财富广场健身房,"116.378204,39.958753"
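The AMap step above can be sketched roughly as follows. This is a minimal illustration, not the original program. The endpoint URL, the query parameter names (`key`, `types`, `city`, `page`) and the response fields (`pois`, `name`, `location`) are assumptions about AMap's place search API and should be checked against its documentation. The function names are hypothetical.

```python
import csv
import io
import json
import urllib.parse
import urllib.request

# Assumed endpoint for AMap place search; verify against the AMap docs.
AMAP_PLACE_URL = "https://restapi.amap.com/v3/place/text"


def fetch_pois(api_key, poi_types, city="北京", page=1):
    """Request one page of places. Parameter names are assumptions."""
    query = urllib.parse.urlencode(
        {"key": api_key, "types": poi_types, "city": city, "page": page}
    )
    with urllib.request.urlopen(f"{AMAP_PLACE_URL}?{query}") as resp:
        return json.load(resp)


def pois_to_rows(response):
    """Keep only the name and the 'lng,lat' string; drop every other field."""
    return [(poi["name"], poi["location"]) for poi in response.get("pois", [])]


def write_csv(rows, out):
    """Write (name, location) rows; the csv module quotes the comma in location."""
    csv.writer(out).writerows(rows)
```

Given a parsed response, `write_csv(pois_to_rows(response), open("gyms.csv", "w", newline="", encoding="utf-8"))` would produce rows in the same shape as the example CSV above. The file name here is only illustrative.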
Collecting the number of comments for each gym from Dazhong Dianping (大众点评)
A brief introduction to 大众点评: 大众点评 is a website for restaurant reviews and group purchases. I will refer to it as DZDP.
One problem in this project was how to measure passenger flow, because no website records this kind of information. I therefore decided to use the number of comments on DZDP as an indirect proxy for passenger flow, on the reasoning that some visitors always leave comments whether a gym is good or not.
Helper library:
DZDP implements a very strict anti-crawler technique. It maintains a set of characters, and every character in that set is replaced with a picture on the page. Thus, I used https://github.com/01ly/DPspider to collect the information.
All data was then saved in a database. I constructed a CSV file that contains the gyms’ names and comment counts. At this point, all the data we needed had been collected.
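The export from the database to CSV could look something like the sketch below. The text does not say which database was used, so this uses an in-memory SQLite database purely as a stand-in. The table name `gyms`, the column names `name` and `comment_count`, and the two sample rows are all hypothetical placeholders, not real collected data.

```python
import csv
import io
import sqlite3


def export_comment_counts(conn, out):
    """Write one (gym name, comment count) row per gym to a CSV stream."""
    # Table and column names are hypothetical; adapt them to the real schema.
    cursor = conn.execute("SELECT name, comment_count FROM gyms ORDER BY name")
    csv.writer(out).writerows(cursor)


# Stand-in database with made-up rows, only to show the function running.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gyms (name TEXT, comment_count INTEGER)")
conn.executemany(
    "INSERT INTO gyms VALUES (?, ?)",
    [("Gym B", 8), ("Gym A", 120)],
)

buf = io.StringIO()
export_comment_counts(conn, buf)
print(buf.getvalue())
```

Writing to a stream instead of a fixed path keeps the function easy to test. In practice `out` would be a file opened with `newline=""` and `encoding="utf-8"`, so that the Chinese gym names are written correctly.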