Data governance in predictive toxicology: A review

Springer Science and Business Media LLC - Volume 3 - Pages 1-16 - 2011
Xin Fu1,2, Anna Wojak1, Daniel Neagu1, Mick Ridley1, Kim Travis2
1School of Computing, Informatics and Media, Bradford, UK
2Syngenta Ltd., Jealott’s Hill International Research Centre, Bracknell, UK

Abstract

Recent advances in data storage and sharing for further data processing in predictive toxicology have created an increasing need for flexible data representations, secure and consistent data curation, and automated data quality checking. Toxicity prediction involves multidisciplinary data. Hundreds of collections of chemical, biological and toxicological data exist, widely dispersed across the open literature, professional research bodies and commercial companies. To better manage and make full use of such a large amount of toxicity data, there is a trend towards developing data governance functionality in predictive toxicology, formalising a set of processes that guarantee high data quality and better data management. In this paper, data quality is meant mainly in a data storage sense (e.g. accuracy, completeness and integrity) and not in a toxicological sense (e.g. the quality of experimental results). This paper reviews seven widely used predictive toxicology data sources and applications, with a particular focus on their data governance aspects: data accuracy, data completeness, data integrity, metadata and its management, data availability and data authorisation. The review reveals current problems (e.g. the lack of systematic and standard measures of data quality) and desirable needs (e.g. better management and further use of captured metadata, and the development of flexible multi-level user access authorisation schemas) in the development of predictive toxicology data sources. The analytical results will help to address a significant gap in toxicology data quality assessment and lead to the development of novel frameworks for predictive toxicology data and model governance. Although the public data sources discussed are well developed, gaps remain in the development of a data governance framework to support predictive toxicology.
In this paper, data governance is identified as the new challenge in predictive toxicology, and good use of it may provide a promising framework for developing high-quality and easily accessible toxicity data repositories. This paper also identifies important research directions that require further investigation in this area.
