CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine
Abstract
Keywords
References
Eddy, 2001, Non-coding RNA genes and the modern RNA world, Nat. Rev. Genet, 2, 919, 10.1038/35103511
Hatzigeorgiou, 2001, DIANA-EST: a statistical analysis, Bioinformatics, 17, 913, 10.1093/bioinformatics/17.10.913
Lottaz, 2003, Modeling sequencing errors by combining Hidden Markov models, Bioinformatics, 19, II103, 10.1093/bioinformatics/btg1067
Shafer, 2006, EST2Prot: mapping EST sequences to proteins, BMC Genomics, 7, 41, 10.1186/1471-2164-7-41
Carninci, 2003, Targeting a complex transcriptome: the construction of the mouse full-length cDNA encyclopedia, Genome Res, 13, 1273, 10.1101/gr.1119703
Carninci, 2005, The transcriptional landscape of the mammalian genome, Science, 309, 1559, 10.1126/science.1112014
Maeda, 2006, Transcript annotation in FANTOM3: mouse gene catalog based on physical cDNAs, PLoS Genet, 2, e62, 10.1371/journal.pgen.0020062
Frith, 2006, Discrimination of non-protein-coding transcripts from protein-coding mRNA, RNA Biol, 3, 40, 10.4161/rna.3.1.2789
Liu, 2006, Distinguishing protein-coding from non-coding RNAs through support vector machines, PLoS Genet, 2, e29, 10.1371/journal.pgen.0020029
Slater, 2000, Algorithms for the Analysis of Expressed Sequence Tags
Nagaraj, 2006, A hitchhiker's guide to expressed sequence tag (EST) analysis, Brief Bioinform, 8, 6, 10.1093/bib/bbl015
Altschul, 1997, Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Res, 25, 3389, 10.1093/nar/25.17.3389
Wu, 2006, The Universal Protein Resource (UniProt): an expanding universe of protein information, Nucleic Acids Res, 34, D187, 10.1093/nar/gkj161
Witten, 2005, Data Mining: Practical Machine Learning Tools and Techniques
Furey, 2000, Support vector machine classification and validation of cancer tissue samples using microarray expression data, Bioinformatics, 16, 906, 10.1093/bioinformatics/16.10.906
Brown, 2000, Knowledge-based analysis of microarray gene expression data by using support vector machines, Proc. Natl Acad. Sci. USA, 97, 262, 10.1073/pnas.97.1.262
Petrova, 2006, Prediction of catalytic residues using Support Vector Machine with selected protein sequence and structural properties, BMC Bioinformatics, 7, 312, 10.1186/1471-2105-7-312
Borgwardt, 2005, Protein function prediction via graph kernels, Bioinformatics, 21, i47, 10.1093/bioinformatics/bti1007
Lei, 2005, An SVM-based system for predicting protein subnuclear localizations, BMC Bioinformatics, 6, 291, 10.1186/1471-2105-6-291
Chang, Lin, 2001, Vol. 80, 604-611, software available at http://www.csie.ntu.edu.tw/cjlin/libsvm
Griffiths-Jones, 2005, Rfam: annotating non-coding RNAs in complete genomes, Nucleic Acids Res, 33, D121, 10.1093/nar/gki081
Pang, 2005, RNAdb–a comprehensive mammalian noncoding RNA database, Nucleic Acids Res, 33, D125, 10.1093/nar/gki089
Cochrane, 2006, EMBL Nucleotide Sequence Database: developments in 2005, Nucleic Acids Res, 34, D10, 10.1093/nar/gkj130
Letunic, 2006, SMART 5: domains in the context of genomes and networks, Nucleic Acids Res, 34, D257, 10.1093/nar/gkj079
Madera, 2004, The SUPERFAMILY database in 2004: additions and improvements, Nucleic Acids Res, 32, D235, 10.1093/nar/gkh117
