Computational analysis of human protein interaction networks

Proteomics - Volume 7, Issue 15 - Pages 2541-2552 - 2007
Fidel Ramírez [1], Andreas Schlicker [2], Yassen Assenov [3], Thomas Lengauer [2], Mario Albrecht [2]
[1] Department of Computational Biology and Applied Algorithmics, Max-Planck-Institute for Informatics, Saarbrücken, Germany
[2] Computational Biology and Applied Algorithmics, MPI for Informatics, Max Planck Society
[3] International Max Planck Research School, MPI for Informatics, Max Planck Society

Abstract

Large amounts of human protein interaction data have been produced by experiments and prediction methods. However, the experimental coverage of the human interactome is still low compared with the volume of predicted data. To gain insight into the value of publicly available human protein network data, we compared predicted datasets, high-throughput results from yeast two-hybrid screens, and literature-curated protein-protein interactions. This evaluation is important not only for further methodological improvements, but also for increasing confidence in functional hypotheses derived from predictions. We therefore assessed the quality and the potential bias of the different datasets using functional similarity based on the Gene Ontology, structural iPfam domain-domain interactions, likelihood ratios, and topological network parameters. This analysis revealed major differences between the predicted datasets, although some of them scored at least as high as the experimental ones on multiple quality measures. Since only a small pairwise overlap is observed between most datasets, they may be combined to enlarge the available human interactome data. For this purpose, we additionally studied the influence of protein length on data quality and the number of disease proteins covered by each dataset. We could further demonstrate that protein interactions predicted by more than one method achieve an elevated reliability.
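
The two dataset-level comparisons mentioned in the abstract, pairwise overlap between datasets and interactions reported by more than one method, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the dataset names, the protein identifiers, and the choice of the Jaccard index as the overlap measure are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def normalize(pairs):
    """Treat interactions as undirected: (A, B) and (B, A) are the same pair."""
    return {frozenset(p) for p in pairs}


def pairwise_overlap(datasets):
    """Return {(name1, name2): (shared_count, jaccard)} for every dataset pair."""
    result = {}
    for (n1, s1), (n2, s2) in combinations(sorted(datasets.items()), 2):
        shared = len(s1 & s2)
        union = len(s1 | s2)
        result[(n1, n2)] = (shared, shared / union if union else 0.0)
    return result


def support_counts(datasets):
    """Count how many datasets report each interaction."""
    counts = Counter()
    for pairs in datasets.values():
        counts.update(pairs)
    return counts


# Toy data with made-up identifiers and dataset names (not from the paper).
datasets = {
    "predicted_a": normalize([("P1", "P2"), ("P2", "P3"), ("P4", "P5")]),
    "predicted_b": normalize([("P2", "P1"), ("P5", "P6")]),
    "y2h": normalize([("P3", "P2"), ("P6", "P7")]),
}

overlap = pairwise_overlap(datasets)
counts = support_counts(datasets)

# Interactions supported by at least two datasets.
multi_supported = {pair for pair, n in counts.items() if n >= 2}
```

Storing each interaction as a `frozenset` makes the comparison independent of the order in which the two partners are listed, which matters when datasets from different sources are merged.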
