DOI: 10.1021/ci400527b
Binary Classification of a Large Collection of Environmental Chemicals from Estrogen Receptor Assays by Quantitative Structure-Activity Relationship and Machine Learning Methods
Zang, Qingda [1]; Rotroff, Daniel M. [2,3]; Judson, Richard S. [2]
Publication date: 2013-12-01
ISSN: 1549-9596
Volume: 53; Issue: 12; Pages: 3244-3261
Abstract

There are thousands of environmental chemicals subject to regulatory decisions for endocrine disrupting potential. The ToxCast and Tox21 programs have tested approximately 8200 chemicals in a broad screening panel of in vitro high-throughput screening (HTS) assays for estrogen receptor (ER) agonist and antagonist activity. The present work uses this large data set to develop in silico quantitative structure-activity relationship (QSAR) models using machine learning (ML) methods and a novel approach to manage the imbalanced data distribution. Training compounds from the ToxCast project were categorized as active or inactive (binding or nonbinding) classes based on a composite ER Interaction Score derived from a collection of 13 ER in vitro assays. A total of 1537 chemicals from ToxCast were used to derive and optimize the binary classification models while 5073 additional chemicals from the Tox21 project, evaluated in 2 of the 13 in vitro assays, were used to externally validate the model performance. In order to handle the imbalanced distribution of active and inactive chemicals, we developed a cluster-selection strategy to minimize information loss and increase predictive performance and compared this strategy to three currently popular techniques: cost-sensitive learning, oversampling of the minority class, and undersampling of the majority class. QSAR classification models were built to relate the molecular structures of chemicals to their ER activities using linear discriminant analysis (LDA), classification and regression trees (CART), and support vector machines (SVM) with 51 molecular descriptors from QikProp and 4328 bits of structural fingerprints as explanatory variables. A random forest (RF) feature selection method was employed to extract the structural features most relevant to the ER activity.
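The three baseline imbalance-handling techniques named above (undersampling the majority class, oversampling the minority class, and cost-sensitive learning) can be illustrated with a minimal Python sketch. This is not the authors' code, and it does not attempt their cluster-selection strategy, which the abstract does not describe in detail. The function names, the 20/80 toy label split, and the inverse-frequency weighting heuristic are all assumptions made for illustration.

```python
import random
from collections import Counter


def _split_by_class(labels):
    """Return (minority_indices, majority_indices) for a binary label list."""
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    minority = min(counts, key=counts.get)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    return minority_idx, majority_idx


def undersample_majority(labels, rng):
    """Keep every minority item plus an equal-size random sample of the majority."""
    minority_idx, majority_idx = _split_by_class(labels)
    kept = rng.sample(majority_idx, len(minority_idx))
    return sorted(minority_idx + kept)


def oversample_minority(labels, rng):
    """Keep everything and redraw minority items (with replacement) up to parity."""
    minority_idx, majority_idx = _split_by_class(labels)
    n_extra = len(majority_idx) - len(minority_idx)
    extra = [rng.choice(minority_idx) for _ in range(n_extra)]
    return sorted(minority_idx + majority_idx + extra)


def balanced_class_weights(labels):
    """Inverse-frequency weights (an assumed heuristic): n / (n_classes * class_count)."""
    counts = Counter(labels)
    n = len(labels)
    return {cls: n / (len(counts) * k) for cls, k in counts.items()}


if __name__ == "__main__":
    # Hypothetical toy labels: 1 = active (minority), 0 = inactive (majority).
    toy_labels = [1] * 20 + [0] * 80
    rng = random.Random(0)
    print(len(undersample_majority(toy_labels, rng)))
    print(len(oversample_minority(toy_labels, rng)))
    print(balanced_class_weights(toy_labels))
```

Each function returns row indices rather than copied data, so the same selection can be applied to a descriptor matrix and its label vector together. The weights from `balanced_class_weights` would be passed to a classifier that accepts per-class misclassification costs.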
The best model was obtained using SVM in combination with a subset of descriptors identified from a large set via the RF algorithm, which recognized the active and inactive compounds at the accuracies of 76.1% and 82.8% with a total accuracy of 81.6% on the internal test set and 70.8% on the external test set. These results demonstrate that a combination of high-quality experimental data and ML methods can lead to robust models that achieve excellent predictive accuracy, which are potentially useful for facilitating the virtual screening of chemicals for environmental risk assessment.
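The abstract reports separate accuracies for active and inactive compounds alongside a total accuracy. A minimal sketch of how these three quantities relate, assuming the usual confusion-matrix definitions (the abstract does not spell out its formulas), is given below. The function name and the ten toy labels are hypothetical and are not data from the study.

```python
def class_accuracies(y_true, y_pred, active=1):
    """Return (active_accuracy, inactive_accuracy, total_accuracy).

    active_accuracy is the fraction of truly active items predicted active
    (sensitivity); inactive_accuracy is the fraction of truly inactive items
    predicted inactive (specificity).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == active:
            if pred == active:
                tp += 1
            else:
                fn += 1
        else:
            if pred == active:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    active_acc = tp / (tp + fn)
    inactive_acc = tn / (tn + fp)
    total_acc = (tp + tn) / len(y_true)
    return active_acc, inactive_acc, total_acc


if __name__ == "__main__":
    # Hypothetical toy data: 4 actives and 6 inactives.
    truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    preds = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
    print(class_accuracies(truth, preds))
```

Under these definitions the total accuracy is the class-frequency-weighted average of the two per-class accuracies, so on an imbalanced set it sits closer to the majority-class figure. This is why per-class accuracies are worth reporting next to the total.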


Language: English
WOS accession number: WOS:000329137700015
Source journal: JOURNAL OF CHEMICAL INFORMATION AND MODELING
Source institution: US Environmental Protection Agency
Document type: Journal article
Item identifier: http://gcip.llas.ac.cn/handle/2XKMVOVA/59965
Author affiliations: 1. US EPA, ORISE, Res Triangle Pk, NC 27711 USA;
2. US EPA, Natl Ctr Computat Toxicol, Res Triangle Pk, NC 27711 USA;
3. N Carolina State Univ, Dept Stat, Bioinformat Res Ctr, Raleigh, NC 27695 USA
Recommended citation
GB/T 7714
Zang, Qingda, Rotroff, Daniel M., Judson, Richard S.. Binary Classification of a Large Collection of Environmental Chemicals from Estrogen Receptor Assays by Quantitative Structure-Activity Relationship and Machine Learning Methods[J]. US Environmental Protection Agency, 2013, 53(12):3244-3261.
APA Zang, Qingda,Rotroff, Daniel M.,&Judson, Richard S..(2013).Binary Classification of a Large Collection of Environmental Chemicals from Estrogen Receptor Assays by Quantitative Structure-Activity Relationship and Machine Learning Methods.JOURNAL OF CHEMICAL INFORMATION AND MODELING,53(12),3244-3261.
MLA Zang, Qingda,et al."Binary Classification of a Large Collection of Environmental Chemicals from Estrogen Receptor Assays by Quantitative Structure-Activity Relationship and Machine Learning Methods".JOURNAL OF CHEMICAL INFORMATION AND MODELING 53.12(2013):3244-3261.