CCPortal
DOI: 10.1016/j.atmosenv.2020.118125
Enhancing the Evaluation and Interpretability of Data-Driven Air Quality Models
Gu J.; Yang B.; Brauer M.; Zhang K.M.
Publication date: 2021
ISSN: 1352-2310
Volume: 246
Abstract: Resolving spatial variability in ambient air pollutants and quantifying contributing factors are critical to human exposure assessment and effective pollution control. Data-driven techniques have been employed in air quality modeling because they can capture complex relationships in data and are fast and easy to implement. In this study, we addressed two issues, model evaluation and interpretability, by applying two common data-driven approaches, linear regression (LR) and random forest (RF), with potentially predictive land-use predictor variables to predict spatial variations of air pollution in an urban setting. The data came from measurements of ambient nitrogen dioxide (NO2) concentrations in the Greater Vancouver Regional District in Canada. First, we showed that model performance was sensitive to the division of training and test sets. Applying a limited number of hold-out validations or cross-validations and reporting the mean model metrics can neither capture this variability nor fairly evaluate model performance. We proposed repeated cross-validations (RCVs) as a reliable evaluation method that accounts for both mean and variance. Second, there is no consistent approach for measuring the importance of predictor variables and quantifying their contributions across different types of data-driven models. Traditional approaches only reflect the relative importance of predictor variables in terms of predictive power, without quantifying their contribution to the model output. We proposed applying SHapley Additive exPlanations (SHAP), a Shapley-value-based explanation method grounded in coalitional game theory, as a unifying framework to interpret and compare different types of data-driven methods. We showed that SHAP is capable of 1) calculating each predictor variable's contribution for each data point and 2) ranking the importance of predictor variables in terms of their contributions to the model output. The results indicated that different models may favor different predictor variables and lead to different interpretations. © 2020 The Authors
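The repeated cross-validation idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the data are synthetic, the model is a single-predictor least-squares line rather than the study's LR and RF models, RMSE is an arbitrary choice of metric, and the helper names `fit_line`, `rmse`, and `repeated_cv` are hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with one predictor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def rmse(a, b, xs, ys):
    """Root-mean-square error of the line a + b*x on the given points."""
    return (sum((a + b * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5


def repeated_cv(xs, ys, k=5, repeats=20, seed=0):
    """Return one test RMSE per fold over `repeats` reshuffled k-fold splits."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    scores = []
    for _ in range(repeats):
        rng.shuffle(idx)  # a new random partition for every repeat
        for f in range(k):
            test = idx[f::k]  # every k-th shuffled index forms one fold
            test_set = set(test)
            train = [i for i in idx if i not in test_set]
            a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
            scores.append(rmse(a, b, [xs[i] for i in test], [ys[i] for i in test]))
    return scores


# Synthetic stand-in data (assumption): one predictor plus Gaussian noise.
data_rng = random.Random(42)
xs = [data_rng.uniform(0, 10) for _ in range(60)]
ys = [2.0 + 1.5 * x + data_rng.gauss(0, 2.0) for x in xs]

scores = repeated_cv(xs, ys, k=5, repeats=20, seed=0)
print(f"mean RMSE = {statistics.fmean(scores):.3f}, "
      f"std = {statistics.stdev(scores):.3f}, n = {len(scores)}")
```

Reporting the spread alongside the mean shows how much a single train/test split could mislead: any one fold's RMSE may sit well away from the average over all repeats.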
Keywords: Air pollution control; Air quality; Decision trees; Game theory; Land use; Nitrogen oxides; Quality assurance; Air quality modeling; Ambient nitrogen dioxide; Coalitional game theory; Complex relationships; Data driven technique; Data-driven methods; Reliable evaluation method; Traditional approaches; Quality control; nitrogen dioxide; ambient air; atmospheric modeling; atmospheric pollution; implementation process; model validation; pollution control; ranking; spatial variation; training; Article; Canada; cross validation; model; priority journal; repeated cross validation; British Columbia; Vancouver [British Columbia]
Language: English
Source journal: Atmospheric Environment
Document type: Journal article
Item identifier: http://gcip.llas.ac.cn/handle/2XKMVOVA/169093
Author affiliations: Sibley School of Mechanical and Aerospace Engineering, Cornell University, Ithaca, NY 14853, United States; School of Population and Public Health, The University of British Columbia, Vancouver, BC V6T 1Z3, Canada
Recommended citation
GB/T 7714
Gu J., Yang B., Brauer M., et al. Enhancing the Evaluation and Interpretability of Data-Driven Air Quality Models[J], 2021, 246.
APA Gu J., Yang B., Brauer M., & Zhang K.M. (2021). Enhancing the Evaluation and Interpretability of Data-Driven Air Quality Models. Atmospheric Environment, 246.
MLA Gu J., et al. "Enhancing the Evaluation and Interpretability of Data-Driven Air Quality Models." Atmospheric Environment 246 (2021).