A posteriori quality control for the curation and reuse of public proteomics data

Proteomics, Vol. 11, No. 11, pp. 2182-2194, 2011

Joseph Foster¹, Sven Degroeve²,³, Laurent Gatto⁴, Matthieu Visser⁵, Rui Wang¹, Johannes Griss⁶, Rolf Apweiler¹, Lennart Martens²,³

¹EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK

²Department of Biochemistry, Ghent University, Ghent, Belgium

³Department of Medical Protein Research, VIB, Ghent, Belgium

⁴Cambridge Centre for Proteomics, Cambridge Systems Biology Centre, Department of Biochemistry, University of Cambridge, Cambridge, UK

⁵Philips Research Laboratories, Cambridge Science Park, Cambridge, UK

⁶Department of Medicine, Vienna General Hospital, Medical University Vienna, Vienna, Austria

Abstract

Proteomics is a rapidly expanding field encompassing a multitude of complex techniques and data types. To date, much effort has been devoted to achieving the highest possible coverage of proteomes, with the aim of informing future developments in basic biology as well as in clinical settings. As a result, growing amounts of data have been deposited in publicly available proteomics databases. These data are in turn increasingly reused for orthogonal downstream purposes such as data mining and machine learning. These downstream uses, however, require ways to validate a posteriori whether a particular data set is suitable for the envisioned purpose. Furthermore, the (semi-)automatic curation of repository data depends on analyses that can highlight misannotations and edge conditions in data sets. Such curation is an important prerequisite for the efficient reuse of proteomics data in the life sciences in general. We therefore present here a selection of quality control metrics and approaches for the a posteriori detection of potential issues encountered in typical proteomics data sets. We illustrate our metrics using publicly available data from the Proteomics Identifications Database (PRIDE), and simultaneously show the usefulness of the large body of PRIDE data as a means to derive empirical background distributions for relevant metrics.
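The idea of using a large repository to derive an empirical background distribution for a quality control metric can be illustrated with a small sketch. Assuming a single numeric metric has already been computed for every reference data set, a new data set's value can be placed within that background by its empirical percentile and flagged if it lands in either tail. The function names, the mid-rank percentile definition, and the 2.5/97.5 cut-offs are hypothetical choices made for illustration; they are not the specific metrics or thresholds described in the paper.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence


def empirical_percentile(background: Sequence[float], value: float) -> float:
    """Return the mid-rank percentile (0-100) of `value` within `background`.

    `background` holds one metric value per reference data set (hypothetically,
    one value per repository submission); tied values count as half below.
    """
    ordered = sorted(background)
    n = len(ordered)
    if n == 0:
        raise ValueError("background distribution is empty")
    below = bisect_left(ordered, value)          # values strictly smaller
    ties = bisect_right(ordered, value) - below  # values equal to `value`
    return 100.0 * (below + 0.5 * ties) / n


def flag_against_background(
    background: Sequence[float],
    value: float,
    lower: float = 2.5,
    upper: float = 97.5,
) -> bool:
    """Return True if `value` falls in either tail of the empirical distribution.

    The 2.5/97.5 percentile cut-offs are arbitrary illustrative defaults,
    not thresholds taken from the paper.
    """
    p = empirical_percentile(background, value)
    return p < lower or p > upper


if __name__ == "__main__":
    # Toy values for a hypothetical per-data-set metric; not real PRIDE data.
    background = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(empirical_percentile(background, 3.0))       # 50.0
    print(flag_against_background(background, 10.0))   # True
```

A percentile-based check needs no assumption about the shape of the background distribution, which suits metrics whose repository-wide distributions may be skewed or multimodal; the price is that the background must contain enough data sets for the tail estimates to be meaningful.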
