Classification and clustering of sequencing data using a Poisson model
Abstract
Keywords
References
Marioni, J. C., Mason, C. E., Mane, S. M., Stephens, M. and Gilad, Y. (2008). RNA-seq: An assessment of technical reproducibility and comparison with gene expression arrays. <i>Genome Res.</i> <b>18</b> 1509–1517.
Robinson, M. D., McCarthy, D. J. and Smyth, G. K. (2010). edgeR: A Bioconductor package for differential expression analysis of digital gene expression data. <i>Bioinformatics</i> <b>26</b> 139–140.
Anders, S. and Huber, W. (2010). Differential expression analysis for sequence count data. <i>Genome Biol.</i> <b>11</b> R106.
Lee, S., Huang, J. Z. and Hu, J. (2010). Sparse logistic principal components analysis for binary data. <i>Ann. Appl. Stat.</i> <b>4</b> 1579–1601.
Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods. <i>J. Amer. Statist. Assoc.</i> <b>66</b> 846–850.
Wang, Z., Gerstein, M. and Snyder, M. (2009). RNA-Seq: A revolutionary tool for transcriptomics. <i>Nat. Rev. Genet.</i> <b>10</b> 57–63.
Tibshirani, R., Hastie, T., Narasimhan, B. and Chu, G. (2002). Diagnosis of multiple cancer types by shrunken centroids of gene expression. <i>Proc. Natl. Acad. Sci. USA</i> <b>99</b> 6567–6572.
Anscombe, F. J. (1948). The transformation of Poisson, binomial and negative-binomial data. <i>Biometrika</i> <b>35</b> 246–254.
Johnson, D. S., Mortazavi, A., Myers, R. M. and Wold, B. (2007). Genome-wide mapping of in vivo protein-DNA interactions. <i>Science</i> <b>316</b> 1497–1502.
Robinson, M. D. and Oshlack, A. (2010). A scaling normalization method for differential expression analysis of RNA-seq data. <i>Genome Biol.</i> <b>11</b> R25.
Mortazavi, A., Williams, B. A., McCue, K., Schaeffer, L. and Wold, B. (2008). Mapping and quantifying mammalian transcriptomes by RNA-seq. <i>Nature Methods</i> <b>5</b> 621–628.
Auer, P. L. and Doerge, R. W. (2010). Statistical design and analysis of RNA sequencing data. <i>Genetics</i> <b>185</b> 405–416.
Barrett, T., Suzek, T. O., Troup, D. B., Wilhite, S. E., Ngau, W.-C., Ledoux, P., Rudnev, D., Lash, A. E., Fujibuchi, W. and Edgar, R. (2005). NCBI GEO: Mining millions of expression profiles–database and tools. <i>Nucleic Acids Res.</i> <b>33</b> D562–D566.
Berninger, P., Gaidatzis, D., van Nimwegen, E. and Zavolan, M. (2008). Computational analysis of small RNA cloning data. <i>Methods</i> <b>44</b> 13–21.
Bickel, P. J. and Levina, E. (2004). Some theory of Fisher’s linear discriminant function, ‘naive Bayes’, and some alternatives when there are many more variables than observations. <i>Bernoulli</i> <b>10</b> 989–1010.
Brown, P. and Botstein, D. (1999). Exploring the new world of the genome with DNA microarrays. <i>Nature Genetics</i> <b>21</b> 33–37.
Bullard, J. H., Purdom, E., Hansen, K. D. and Dudoit, S. (2010). Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. <i>BMC Bioinformatics</i> <b>11</b> 94.
Cai, L., Huang, H., Blackshaw, S., Liu, J., Cepko, C. and Wong, W. (2004). Clustering analysis of SAGE data using a Poisson approach. <i>Genome Biology</i> <b>5</b> R51.
DeRisi, J., Iyer, V. and Brown, P. (1997). Exploring the metabolic and genetic control of gene expression on a genomic scale. <i>Science</i> <b>278</b> 680–686.
Dudoit, S., Fridlyand, J. and Speed, T. P. (2001). Comparison of discrimination methods for the classification of tumors using gene expression data. <i>J. Amer. Statist. Assoc.</i> <b>96</b> 1151–1160.
Kasowski, M., Grubert, F., Heffelfinger, C., Hariharan, M., Asabere, A., Waszak, S. M., Habegger, L., Rozowsky, J., Shi, M., Urban, A. E., Hong, M.-Y., Karczewski, K. J., Huber, W., Weissman, S. M., Gerstein, M. B., Korbel, J. O. and Snyder, M. (2010). Variation in transcription factor binding among humans. <i>Science</i> <b>328</b> 232–235.
Linsen, S. E. V., de Wit, E., Janssens, G., Heater, S., Chapman, L., Parkin, R. K., Fritz, B., Wyman, S. K., de Bruijn, E., Voest, E. E., Kuersten, S., Tewari, M. and Cuppen, E. (2009). Limitations and possibilities of small RNA digital gene expression profiling. <i>Nature Methods</i> <b>6</b> 474–476.
Monti, S., Savage, K. J., Kutok, J. L., Feuerhake, F., Kurtin, P., Mihm, M., Wu, B., Pasqualucci, L., Neuberg, D., Aguiar, R. C. T., Cin, P. D., Ladd, C., Pinkus, G. S., Salles, G., Harris, N. L., Dalla-Favera, R., Habermann, T. M., Aster, J. C., Golub, T. R. and Shipp, M. A. (2005). Molecular profiling of diffuse large B-cell lymphoma identifies robust subtypes including one characterized by host inflammatory response. <i>Blood</i> <b>105</b> 1851–1861.
Morozova, O., Hirst, M. and Marra, M. A. (2009). Applications of new sequencing technologies for transcriptome analysis. <i>Annu. Rev. Genomics Hum. Genet.</i> <b>10</b> 135–151.
Nagalakshmi, U., Wang, Z., Waern, K., Shou, C., Raha, D., Gerstein, M. and Snyder, M. (2008). The transcriptional landscape of the yeast genome defined by RNA sequencing. <i>Science</i> <b>320</b> 1344–1349.
Nielsen, T., West, R., Linn, S., Alter, O., Knowling, M., O’Connell, J. S. Z., Fero, M., Sherlock, G., Pollack, J., Brown, P., Botstein, D. and van de Rijn, M. (2002). Molecular characterisation of soft tissue tumours: A gene expression study. <i>The Lancet</i> <b>359</b> 1301–1307.
Oshlack, A., Robinson, M. and Young, M. (2010). From RNA-seq reads to differential expression results. <i>Genome Biology</i> <b>11</b> 220.
Oshlack, A. and Wakefield, M. (2009). Transcript length bias in RNA-seq data confounds systems biology. <i>Biology Direct</i> <b>4</b> 14.
Pepke, S., Wold, B. and Mortazavi, A. (2009). Computation for ChIP-seq and RNA-seq studies. <i>Nature Methods</i> <b>6</b> S22–S32.
Ramaswamy, S., Tamayo, P., Rifkin, R., Mukherjee, S., Yeang, C., Angelo, M., Ladd, C., Reich, M., Latulippe, E., Mesirov, J., Poggio, T., Gerald, W., Loda, M., Lander, E. and Golub, T. (2001). Multiclass cancer diagnosis using tumor gene expression signatures. <i>Proc. Natl. Acad. Sci. USA</i> <b>98</b> 15149–15154.
Spellman, P. T., Sherlock, G., Iyer, V. R., Zhang, M., Anders, K., Eisen, M. B., Brown, P. O., Botstein, D. and Futcher, B. (1998). Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. <i>Mol. Biol. Cell</i> <b>9</b> 3273–3297.
Tibshirani, R., Hastie, T., Narasimhan, B. and Chu, G. (2003). Class prediction by nearest shrunken centroids, with applications to DNA microarrays. <i>Statist. Sci.</i> <b>18</b> 104–117.
Wilhelm, B. T. and Landry, J.-R. (2009). RNA-Seq-quantitative measurement of expression through massively parallel RNA-sequencing. <i>Methods</i> <b>48</b> 249–257.
Witten, D. and Tibshirani, R. (2011). Penalized classification using Fisher’s linear discriminant. <i>J. Roy. Statist. Soc. Ser. B</i> <b>73</b> 753–772.
Witten, D., Tibshirani, R., Gu, S., Fire, A. and Lui, W. (2010). Ultra-high throughput sequencing-based small RNA discovery and discrete statistical biomarker analysis in a collection of cervical tumours and matched controls. <i>BMC Biology</i> <b>8</b> 58.
Hastie, T., Tibshirani, R. and Friedman, J. (2009). <i>The Elements of Statistical Learning</i>: <i>Data Mining, Inference, and Prediction</i>. Springer, New York.