CCPortal
DOI: 10.1007/s10661-017-6025-0
Assessing the accuracy and stability of variable selection methods for random forest modeling in ecology
Fox, Eric W.1; Hill, Ryan A.2; Leibowitz, Scott G.1; Olsen, Anthony R.1; Thornbrugh, Darren J.2,3; Weber, Marc H.1
Publication date: 2017-07-01
ISSN: 0167-6369
Volume: 189; Issue: 7
Abstract

Random forest (RF) modeling has emerged as an important statistical learning method in ecology due to its exceptional predictive performance. However, for large and complex ecological data sets, there is limited guidance on variable selection methods for RF modeling. Typically, either a preselected set of predictor variables is used, or stepwise procedures are employed that iteratively remove variables according to their importance measures. This paper investigates the application of variable selection methods to RF models for predicting probable biological stream condition. Our motivating data set consists of the good/poor condition of n = 1365 stream survey sites from the 2008/2009 National Rivers and Streams Assessment, and a large set (p = 212) of landscape features from the StreamCat data set as potential predictors. We compare two types of RF models: a full variable set model with all 212 predictors and a reduced variable set model selected using a backward elimination approach. We assess model accuracy using RF's internal out-of-bag estimate and a cross-validation procedure with validation folds external to the variable selection process. We also assess the stability of the spatial predictions generated by the RF models to changes in the number of predictors and argue that model selection needs to consider both accuracy and stability. The results suggest that RF modeling is robust to the inclusion of many variables of moderate to low importance. We found no substantial improvement in cross-validated accuracy as a result of variable reduction. Moreover, the backward elimination procedure tended to select too few variables and exhibited numerous issues such as upwardly biased out-of-bag accuracy estimates and instabilities in the spatial predictions. We use simulations to further support and generalize results from the analysis of real data. A main purpose of this work is to elucidate issues of model selection bias and instability to ecologists interested in using RF to develop predictive models with large environmental data sets.
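The contrast the abstract draws between an internal accuracy estimate and cross-validation with folds external to variable selection can be illustrated with a small sketch. This is not the authors' code or data: to stay dependency-free it swaps the random forest for a nearest-centroid classifier, uses the centroid gap as a stand-in importance measure, uses training-data accuracy in place of the out-of-bag estimate, and runs on synthetic data. All function names (`make_data`, `fit`, `accuracy`, `importance`, `backward_eliminate`, `external_cv`) are hypothetical.

```python
import random

def make_data(n=120, p=10, n_informative=2, seed=0):
    """Toy binary data; only the first n_informative columns carry signal."""
    rng = random.Random(seed)
    y = [rng.randint(0, 1) for _ in range(n)]
    X = [[rng.gauss(lab if j < n_informative else 0.0, 1.0) for j in range(p)]
         for lab in y]
    return X, y

def fit(X, y, cols):
    """Stand-in model (NOT a random forest): per-class centroids on cols."""
    model = {}
    for c in set(y):
        rows = [x for x, lab in zip(X, y) if lab == c]
        model[c] = {j: sum(r[j] for r in rows) / len(rows) for j in cols}
    return model

def accuracy(model, X, y, cols):
    """Fraction of rows whose nearest centroid matches the label."""
    hits = 0
    for x, lab in zip(X, y):
        dist = {c: sum((x[j] - m[j]) ** 2 for j in cols)
                for c, m in model.items()}
        hits += min(dist, key=dist.get) == lab
    return hits / len(y)

def importance(model, cols):
    """Stand-in importance: absolute gap between the two class centroids."""
    return {j: abs(model[0][j] - model[1][j]) for j in cols}

def backward_eliminate(X, y):
    """Drop the least important column one at a time and keep the subset
    with the best internal (training-data) accuracy."""
    cols = list(range(len(X[0])))
    best_cols, best_acc = cols[:], -1.0
    while cols:
        model = fit(X, y, cols)
        acc = accuracy(model, X, y, cols)
        if acc >= best_acc:  # ties favour the smaller subset
            best_cols, best_acc = cols[:], acc
        imp = importance(model, cols)
        cols.remove(min(cols, key=imp.get))
    return best_cols, best_acc

def external_cv(X, y, k=5, seed=0):
    """k-fold CV in which selection is redone inside every training fold."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
        Xte, yte = [X[i] for i in held_out], [y[i] for i in held_out]
        # Selection sees training rows only, so the held-out fold stays
        # external to the variable selection process.
        cols, _ = backward_eliminate(Xtr, ytr)
        scores.append(accuracy(fit(Xtr, ytr, cols), Xte, yte, cols))
    return scores

X, y = make_data()
selected, internal_acc = backward_eliminate(X, y)
cv_scores = external_cv(X, y)
print("selected columns:", selected)
print("internal accuracy:", round(internal_acc, 3))
print("external CV accuracy:", round(sum(cv_scores) / len(cv_scores), 3))
```

The internal figure is computed on the same rows that chose the subset, which is the setting in which the abstract reports upwardly biased estimates; the external figure reruns the selection within each training fold, so each held-out fold plays no part in choosing variables. How large the gap is depends on the data and the model, so the sketch prints both numbers rather than asserting a particular difference.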


Keywords: Random forest modeling; Variable selection; Model selection bias; National rivers and streams assessment; StreamCat dataset; Benthic macroinvertebrates
Language: English
WOS accession number: WOS:000404652900013
Source journal: ENVIRONMENTAL MONITORING AND ASSESSMENT
Source institution: US Environmental Protection Agency
Document type: Journal article
Item identifier: http://gcip.llas.ac.cn/handle/2XKMVOVA/57721
Author affiliations:
1. US EPA, Natl Hlth & Environm Effects Res Lab, Western Ecol Div, 200 SW 35th St, Corvallis, OR 97333 USA;
2. US EPA, Natl Hlth & Environm Effects Res Lab, Western Ecol Div, Oak Ridge Inst Sci & Educ ORISE Postdoctoral Part, 200 SW 35th St, Corvallis, OR 97333 USA;
3. Northern Great Plains Network, Natl Pk Serv, 231 East St Joseph St, Rapid City, SD 55701 USA
Recommended citation
GB/T 7714
Fox, Eric W., Hill, Ryan A., Leibowitz, Scott G., et al. Assessing the accuracy and stability of variable selection methods for random forest modeling in ecology[J]. US Environmental Protection Agency, 2017, 189(7).
APA Fox, Eric W., Hill, Ryan A., Leibowitz, Scott G., Olsen, Anthony R., Thornbrugh, Darren J., & Weber, Marc H. (2017). Assessing the accuracy and stability of variable selection methods for random forest modeling in ecology. ENVIRONMENTAL MONITORING AND ASSESSMENT, 189(7).
MLA Fox, Eric W., et al. "Assessing the accuracy and stability of variable selection methods for random forest modeling in ecology". ENVIRONMENTAL MONITORING AND ASSESSMENT 189.7 (2017).