HISTORICAL AIR QUALITY DATA
Source: US Department of State Air Quality Monitoring Program (http://stateair.net/)
- PM2.5 concentration levels, which indicate air quality, for five cities in China
- Each dataset spans one year, with concentration measurements recorded hourly
- We selected 2016 for our analysis because our mobile activity and mask sales data also fall within that year
BACKGROUND
PM2.5 refers to atmospheric particulate matter (PM) that has a diameter of less than 2.5 micrometers, which is about 3% of the diameter of a human hair.
Because they are so small and light, fine particles tend to stay in the air longer than heavier particles, which increases the chance that humans and animals inhale them. Owing to their minute size, particles smaller than 2.5 micrometers can bypass the nose and throat and penetrate deep into the lungs, and some may even enter the circulatory system. Studies have found a close link between exposure to fine particles and premature death from heart and lung disease. Fine particles are also known to trigger or worsen conditions such as asthma, heart attack, bronchitis, and other respiratory problems.
Because of these detrimental effects on people’s health, we chose PM2.5 concentration as our indicator of air quality. The hourly concentration data was collected from the U.S. Department of State for the five cities we studied.
Figure 1: 24-Hour PM2.5 Standard (μg/m³)
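Because the standard is stated per 24 hours while our readings are hourly, the readings have to be averaged by day before they can be compared with it. The following is a minimal sketch of that step, not the exact procedure we ran. It assumes the readings are already loaded as (timestamp, value) pairs, that negative values mark missing measurements, and that a day needs at least 18 valid hours to count. The function name, the timestamp format, and the `min_hours` threshold are all illustrative assumptions.

```python
from collections import defaultdict
from datetime import datetime


def daily_means(hourly_readings, min_hours=18):
    """Average hourly PM2.5 readings (ug/m3) into one value per calendar day.

    hourly_readings: iterable of (timestamp, value) pairs, with timestamps
    formatted as "YYYY-MM-DD HH:MM" (assumed format).
    min_hours: minimum number of valid readings a day needs (assumed threshold).
    """
    by_day = defaultdict(list)
    for timestamp, value in hourly_readings:
        # Assumption: missing measurements are None or a negative sentinel.
        if value is None or value < 0:
            continue
        day = datetime.strptime(timestamp, "%Y-%m-%d %H:%M").date()
        by_day[day].append(value)

    # Keep only days with enough valid hours to give a meaningful average.
    return {
        day: sum(values) / len(values)
        for day, values in sorted(by_day.items())
        if len(values) >= min_hours
    }
```

Each resulting daily mean can then be checked against the 24-hour thresholds in Figure 1.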

MASK SALES VOLUME
Source: https://www.huidianshang.com/
In recent years, China has seen a growing demand for filtration masks, pollution monitors, air purifiers, and other anti-pollution gadgets. From December 16 to December 20, the five-day stretch when China’s air pollution was at its worst in 2016, domestic consumers bought 110,000 air purifiers through JD.com Inc.’s online marketplaces, up 210 percent year-on-year, according to the company’s data.

Photo by Kevin Frayer/Getty Images
Knowing this, we decided to use the sales volume of anti-pollution masks as an indicator of people’s reaction to smog. We gathered this monthly data from Hui Dian Shang, a trade analysis website dedicated to Taobao (China’s biggest e-commerce platform), and made a line chart.
Unfortunately, the sales data does not say which cities the masks were sold in. It also reports the sales volume of all types of masks together, anti-smog and regular ones alike. Still, because of Taobao’s size, the data is significant enough to analyze for national patterns, even though it cannot represent individual regions.
DEMOGRAPHICS: INCOME & POPULATION
Source: http://www.stats.gov.cn/english/Statisticaldata/AnnualData/
Because we also wanted to see whether each city’s economic status is connected to its air quality, we chose disposable income as an indicator. The data was collected from each city’s bureau of statistics.

Additionally, we wanted to see if there was any relationship between population density and air quality. We collected population values for each region in 2016.
PROVINCE SHAPEFILE
Source: NYU Spatial Data Repository (https://geo.nyu.edu/catalog/nyu-2451-33920)
We used this shapefile for mapping. It let us draw the boundaries of each province and highlight the five regions our research focuses on.
The province boundaries matter for two reasons: they let us map where patterns of mobile activity occur, and they let us calculate how much activity took place in each region.
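Calculating activity per region comes down to testing which boundary each coordinate falls inside. The sketch below illustrates the idea with a plain ray-casting test. It is a simplified illustration, not our actual pipeline: it assumes each region is one simple polygon given as a list of (longitude, latitude) vertices, while real province shapes have multiple parts and holes and would normally be read from the shapefile with a GIS library. The function names and the `regions` dictionary are hypothetical.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            # Each crossing to the right of the point flips the state.
            if lon < x_cross:
                inside = not inside
    return inside


def count_by_region(points, regions):
    """Count how many (lon, lat) points fall inside each named region.

    regions: dict mapping a region name to its polygon (hypothetical input).
    Points outside every region are ignored.
    """
    counts = {name: 0 for name in regions}
    for lon, lat in points:
        for name, polygon in regions.items():
            if point_in_polygon(lon, lat, polygon):
                counts[name] += 1
                break  # assume regions do not overlap
    return counts
```

With millions of points, a spatial index or a vectorized spatial join would be far faster than this double loop, but the logic is the same.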
MOBILE ACTIVITY
Source: Kaggle (https://www.kaggle.com/chinapage/china-mobile-user-gemographics)
The dataset comes from TalkingData, a third-party mobile data platform in China whose SDK is integrated into many mobile apps and logs information every time a user opens one of them.
- 3 million data points (~1000 MB)
- device IDs (distinguish unique users)
- timestamps
- coordinates
The first question we had to answer was which indicator to use to represent urban activity. In a research paper by MIT and Tongji University (Yan, Duarte, Wang, Zheng & Ratti, 2018), researchers used social media check-in data from Sina Weibo, a microblogging platform often described as the Chinese version of Twitter, as the indicator of people’s urban activity. Because mobile devices are ubiquitous (98% of Chinese people surveyed own at least a basic mobile phone), we used the mobile activity data from Kaggle instead. The data is collected by the TalkingData SDK integrated within mobile apps, under the service terms agreed between TalkingData and the app developers. It was collected with the full recognition and consent of the individual users of those apps, and appropriate anonymization has been performed to protect privacy.
Check-in data can be used as a proxy indicator for activity within a region. This is meant to be analyzed in relation to air quality over the same time period as the check-in dataset.
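One simple way to relate the two series is to count events per day and compute a correlation against the daily PM2.5 mean for the same days. The sketch below shows this under stated assumptions and is not necessarily the analysis we ended up with. It assumes events arrive as (device ID, timestamp) pairs with timestamps formatted as "YYYY-MM-DD HH:MM:SS", and that daily PM2.5 means are available in a dictionary keyed by date. All function names are illustrative.

```python
from collections import Counter
from datetime import datetime
from math import sqrt


def daily_activity(events):
    """Count logged events per calendar day.

    events: iterable of (device_id, timestamp) pairs (assumed layout).
    """
    counts = Counter()
    for _device_id, timestamp in events:
        day = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").date()
        counts[day] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def activity_vs_pm25(events, pm25_by_day):
    """Correlate daily event counts with daily PM2.5 means.

    Only days present in both series are compared.
    """
    activity = daily_activity(events)
    days = sorted(set(activity) & set(pm25_by_day))
    return pearson(
        [activity[d] for d in days],
        [pm25_by_day[d] for d in days],
    )
```

A negative coefficient would be consistent with less activity on more polluted days, but a correlation alone does not establish that pollution is the cause. Weekday effects and weather would need to be controlled for.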