Detecting Elevated Air Pollution Levels by Monitoring Web Search Queries: Deep Learning-Based Time Series Forecasting

Paper Title

Detecting Elevated Air Pollution Levels by Monitoring Web Search Queries: Deep Learning-Based Time Series Forecasting

Authors

Lin, Chen; Yousefi, Safoora; Kahoro, Elvis; Karisani, Payam; Liang, Donghai; Sarnat, Jeremy; Agichtein, Eugene

Abstract

Real-time air pollution monitoring is a valuable tool for public health and environmental surveillance. In recent years, there has been a dramatic increase in air pollution forecasting and monitoring research using artificial neural networks (ANNs). Most prior work relied on modeling pollutant concentrations collected from ground-based monitors and meteorological data for long-term forecasting of outdoor ozone, oxides of nitrogen, and PM2.5. Given that traditional, highly sophisticated air quality monitors are expensive and not universally available, these models cannot adequately serve those not living near pollutant monitoring sites. Furthermore, because prior models were built on physical measurement data collected from sensors, they may not be suitable for predicting public health effects experienced from pollution exposure. This study aims to develop and validate models to nowcast the observed pollution levels using Web search data, which is publicly available in near real-time from major search engines. We developed novel machine learning-based models using both traditional supervised classification methods and state-of-the-art deep learning methods to detect elevated air pollution levels at the U.S. city level, using generally available meteorological data and aggregate Web-based search volume data derived from Google Trends. We validated the performance of these methods by predicting three critical air pollutants, ozone (O3), nitrogen dioxide (NO2), and fine particulate matter (PM2.5), across ten major U.S. metropolitan statistical areas (MSAs) in 2017 and 2018.
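
The nowcasting setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify features, window length, thresholds, or model details, so everything below is an assumption. The function names (make_windows, train_logistic, predict, demo), the 3-day window, the 0.7 threshold, and the synthetic series are all hypothetical, and a plain logistic regression stands in for the "traditional supervised classification" family only.

```python
import math


def make_windows(search, temp, pollutant, threshold, lag=3):
    """Build (features, label) pairs for nowcasting.

    Features for day t: the last `lag` days (ending at t) of search volume
    and temperature. Label: 1 if the pollutant on day t exceeds `threshold`.
    The window length and threshold are illustrative assumptions.
    """
    X, y = [], []
    for t in range(lag - 1, len(pollutant)):
        X.append(search[t - lag + 1:t + 1] + temp[t - lag + 1:t + 1])
        y.append(1 if pollutant[t] > threshold else 0)
    return X, y


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, epochs=200, lr=0.5):
    """Fit logistic regression with plain stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the log loss w.r.t. the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict(w, b, xi):
    """Return 1 (elevated) if the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0


def demo():
    # Synthetic stand-ins, NOT real Google Trends, weather, or monitor data.
    days = 120
    pollutant = [0.5 + 0.4 * math.sin(2 * math.pi * t / 20) for t in range(days)]
    search = [0.8 * p + 0.1 for p in pollutant]  # assumed to track pollution
    temp = [0.5 + 0.3 * math.cos(2 * math.pi * t / 30) for t in range(days)]

    X, y = make_windows(search, temp, pollutant, threshold=0.7, lag=3)
    split = int(0.75 * len(X))  # chronological split: train on the past only
    w, b = train_logistic(X[:split], y[:split])
    preds = [predict(w, b, xi) for xi in X[split:]]
    acc = sum(p == t for p, t in zip(preds, y[split:])) / len(preds)
    print(f"held-out accuracy on synthetic data: {acc:.2f}")


if __name__ == "__main__":
    demo()
```

The split is chronological rather than random so that the classifier is never trained on days that come after the ones it is evaluated on, which is the usual precaution for time series.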
