CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine

Nucleic Acids Research - Vol. 35, Suppl. 2 - pp. W345-W349 - 2007
Lei Kong1, Huanming Yang2, Zhiqiang Ye2, Xiao Liu2, Shuqi Zhao2, Liping Wei2, Ge Gao2
1Center for Bioinformatics, National Laboratory of Protein Engineering and Plant Genetic Engineering, College of Life Sciences, Peking University, Beijing, 100871, P.R. China
2Center for Bioinformatics, National Laboratory of Protein Engineering and Plant Genetic Engineering, College of Life Sciences, Peking University, Beijing, 100871, P.R. China

References

Eddy, 2001, Non-coding RNA genes and the modern RNA world, Nat. Rev. Genet, 2, 919, 10.1038/35103511

Mattick, 2004, RNA regulation: a new genetics?, Nat. Rev. Genet, 5, 316, 10.1038/nrg1321

Mattick, 2006, Non-coding RNA, Hum. Mol. Genet, 15, R17, 10.1093/hmg/ddl046

Furuno, 2003, CDS annotation in full-length cDNA sequence, Genome Res, 13, 1478, 10.1101/gr.1060303

Hatzigeorgiou, 2001, DIANA-EST: a statistical analysis, Bioinformatics, 17, 913, 10.1093/bioinformatics/17.10.913

Lottaz, 2003, Modeling sequencing errors by combining Hidden Markov models, Bioinformatics, 19, II103, 10.1093/bioinformatics/btg1067

Shafer, 2006, EST2Prot: mapping EST sequences to proteins, BMC Genomics, 7, 41, 10.1186/1471-2164-7-41

Carninci, 2003, Targeting a complex transcriptome: the construction of the mouse full-length cDNA encyclopedia, Genome Res, 13, 1273, 10.1101/gr.1119703

Okazaki, 2003, A Guide to the Mammalian Genome, Genome Res, 13, 1267, 10.1101/gr.1445603

Carninci, 2005, The transcriptional landscape of the mammalian genome, Science, 309, 1559, 10.1126/science.1112014

Maeda, 2006, Transcript annotation in FANTOM3: mouse gene catalog based on physical cDNAs, PLoS Genet, 2, e62, 10.1371/journal.pgen.0020062

Frith, 2006, Discrimination of non-protein-coding transcripts from protein-coding mRNA, RNA Biol, 3, 40, 10.4161/rna.3.1.2789

Liu, 2006, Distinguishing protein-coding from non-coding RNAs through support vector machines, PLoS Genet, 2, e29, 10.1371/journal.pgen.0020029

Slater, 2000, Algorithms for the Analysis of Expressed Sequence Tags

Nagaraj, 2006, A hitchhiker's guide to expressed sequence tag (EST) analysis, Brief Bioinform, 8, 6, 10.1093/bib/bbl015

Altschul, 1997, Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Res, 25, 3389, 10.1093/nar/25.17.3389

Wu, 2006, The Universal Protein Resource (UniProt): an expanding universe of protein information, Nucleic Acids Res, 34, D187, 10.1093/nar/gkj161

Witten, 2005, Data Mining: Practical Machine Learning Tools and Techniques

Furey, 2000, Support vector machine classification and validation of cancer tissue samples using microarray expression data, Bioinformatics, 16, 906, 10.1093/bioinformatics/16.10.906

Brown, 2000, Knowledge-based analysis of microarray gene expression data by using support vector machines, Proc. Natl Acad. Sci. USA, 97, 262, 10.1073/pnas.97.1.262

Petrova, 2006, Prediction of catalytic residues using Support Vector Machine with selected protein sequence and structural properties, BMC Bioinformatics, 7, 312, 10.1186/1471-2105-7-312

Borgwardt, 2005, Protein function prediction via graph kernels, Bioinformatics, 21, i47, 10.1093/bioinformatics/bti1007

Yu, 2006, Prediction of protein subcellular localization, Proteins, 64, 643, 10.1002/prot.21018

Lei, 2005, An SVM-based system for predicting protein subnuclear localizations, BMC Bioinformatics, 6, 291, 10.1186/1471-2105-6-291

Chang and Lin, 2001, Vol. 80, 604-611, software available at http://www.csie.ntu.edu.tw/cjlin/libsvm

Griffiths-Jones, 2005, Rfam: annotating non-coding RNAs in complete genomes, Nucleic Acids Res, 33, D121, 10.1093/nar/gki081

Pang, 2005, RNAdb–a comprehensive mammalian noncoding RNA database, Nucleic Acids Res, 33, D125, 10.1093/nar/gki089

Cochrane, 2006, EMBL Nucleotide Sequence Database: developments in 2005, Nucleic Acids Res, 34, D10, 10.1093/nar/gkj130

Bateman, 2004, The Pfam protein families database, Nucleic Acids Res, 32, D138, 10.1093/nar/gkh121

Letunic, 2006, SMART 5: domains in the context of genomes and networks, Nucleic Acids Res, 34, D257, 10.1093/nar/gkj079

Madera, 2004, The SUPERFAMILY database in 2004: additions and improvements, Nucleic Acids Res, 32, D235, 10.1093/nar/gkh117

Mignone, 2005, UTRdb and UTRsite: a collection of sequences and regulatory motifs of the untranslated regions of eukaryotic mRNAs, Nucleic Acids Res, 33, D141, 10.1093/nar/gki021