Classification and clustering of sequencing data using a Poisson model

Annals of Applied Statistics - Vol. 5, No. 4 - 2011

Daniela Witten¹

¹University of Washington

References

Marioni, J. C., Mason, C. E., Mane, S. M., Stephens, M. and Gilad, Y. (2008). RNA-seq: An assessment of technical reproducibility and comparison with gene expression arrays. Genome Res. 18 1509–1517.

Robinson, M. D., McCarthy, D. J. and Smyth, G. K. (2010). edgeR: A bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26 139–140.

Anders, S. and Huber, W. (2010). Differential expression analysis for sequence count data. Genome Biol. 11 R106.

Lee, S., Huang, J. Z. and Hu, J. (2010). Sparse logistic principal components analysis for binary data. Ann. Appl. Stat. 4 1579–1601.

Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods. J. Amer. Statist. Assoc. 66 846–850.

Wang, Z., Gerstein, M. and Snyder, M. (2009). RNA-Seq: A revolutionary tool for transcriptomics. Nat. Rev. Genet. 10 57–63.

Tibshirani, R., Hastie, T., Narasimhan, B. and Chu, G. (2002). Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc. Natl. Acad. Sci. USA 99 6567–6572.

Anscombe, F. J. (1948). The transformation of Poisson, binomial and negative-binomial data. Biometrika 35 246–254.

Johnson, D. S., Mortazavi, A., Myers, R. M. and Wold, B. (2007). Genome-wide mapping of in vivo protein-DNA interactions. Science 316 1497–1502.

Robinson, M. D. and Oshlack, A. (2010). A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 11 R25.

Mortazavi, A., Williams, B. A., McCue, K., Schaeffer, L. and Wold, B. (2008). Mapping and quantifying mammalian transcriptomes by RNA-seq. Nature Methods 5 621–628.

Auer, P. L. and Doerge, R. W. (2010). Statistical design and analysis of RNA sequencing data. Genetics 185 405–416.

Barrett, T., Suzek, T. O., Troup, D. B., Wilhite, S. E., Ngau, W.-C., Ledoux, P., Rudnev, D., Lash, A. E., Fujibuchi, W. and Edgar, R. (2005). NCBI GEO: Mining millions of expression profiles–database and tools. Nucleic Acids Res. 33 D562–D566.

Berninger, P., Gaidatzis, D., van Nimwegen, E. and Zavolan, M. (2008). Computational analysis of small RNA cloning data. Methods 44 13–21.

Bickel, P. J. and Levina, E. (2004). Some theory of Fisher’s linear discriminant function, ‘naive Bayes’, and some alternatives when there are many more variables than observations. Bernoulli 10 989–1010.

Brown, P. and Botstein, D. (1999). Exploring the new world of the genome with DNA microarrays. Nature Genetics 21 33–37.

Bullard, J. H., Purdom, E., Hansen, K. D. and Dudoit, S. (2010). Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics 11 94.

Cai, L., Huang, H., Blackshaw, S., Liu, J., Cepko, C. and Wong, W. (2004). Clustering analysis of SAGE data using a Poisson approach. Genome Biology 5 R51.

DeRisi, J., Iyer, V. and Brown, P. (1997). Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278 680–686.

Dudoit, S., Fridlyand, J. and Speed, T. P. (2001). Comparison of discrimination methods for the classification of tumors using gene expression data. J. Amer. Statist. Assoc. 96 1151–1160.

Kasowski, M., Grubert, F., Heffelfinger, C., Hariharan, M., Asabere, A., Waszak, S. M., Habegger, L., Rozowsky, J., Shi, M., Urban, A. E., Hong, M.-Y., Karczewski, K. J., Huber, W., Weissman, S. M., Gerstein, M. B., Korbel, J. O. and Snyder, M. (2010). Variation in transcription factor binding among humans. Science 328 232–235.

Linsen, S. E. V., de Wit, E., Janssens, G., Heater, S., Chapman, L., Parkin, R. K., Fritz, B., Wyman, S. K., de Bruijn, E., Voest, E. E., Kuersten, S., Tewari, M. and Cuppen, E. (2009). Limitations and possibilities of small RNA digital gene expression profiling. Nature Methods 6 474–476.

Monti, S., Savage, K. J., Kutok, J. L., Feuerhake, F., Kurtin, P., Mihm, M., Wu, B., Pasqualucci, L., Neuberg, D., Aguiar, R. C. T., Cin, P. D., Ladd, C., Pinkus, G. S., Salles, G., Harris, N. L., Dalla-Favera, R., Habermann, T. M., Aster, J. C., Golub, T. R. and Shipp, M. A. (2005). Molecular profiling of diffuse large B-cell lymphoma identifies robust subtypes including one characterized by host inflammatory response. Blood 105 1851–1861.

Morozova, O., Hirst, M. and Marra, M. A. (2009). Applications of new sequencing technologies for transcriptome analysis. Annu. Rev. Genomics Hum. Genet. 10 135–151.

Nagalakshmi, U., Wang, Z., Waern, K., Shou, C., Raha, D., Gerstein, M. and Snyder, M. (2008). The transcriptional landscape of the yeast genome defined by RNA sequencing. Science 320 1344–1349.

Nielsen, T., West, R., Linn, S., Alter, O., Knowling, M., O’Connell, J. S. Z., Fero, M., Sherlock, G., Pollack, J., Brown, P., Botstein, D. and van de Rijn, M. (2002). Molecular characterisation of soft tissue tumours: A gene expression study. The Lancet 359 1301–1307.

Oshlack, A., Robinson, M. and Young, M. (2010). From RNA-seq reads to differential expression results. Genome Biology 11 220.

Oshlack, A. and Wakefield, M. (2009). Transcript length bias in RNA-seq data confounds systems biology. Biology Direct 4 14.

Pepke, S., Wold, B. and Mortazavi, A. (2009). Computation for ChIP-seq and RNA-seq studies. Nature Methods 6 S22–S32.

Ramaswamy, S., Tamayo, P., Rifkin, R., Mukherjee, S., Yeang, C., Angelo, M., Ladd, C., Reich, M., Latulippe, E., Mesirov, J., Poggio, T., Gerald, W., Loda, M., Lander, E. and Golub, T. (2001). Multiclass cancer diagnosis using tumor gene expression signatures. PNAS 98 15149–15154.

Spellman, P. T., Sherlock, G., Iyer, V. R., Zhang, M., Anders, K., Eisen, M. B., Brown, P. O., Botstein, D. and Futcher, B. (1998). Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell 9 3273–3297.

Tibshirani, R., Hastie, T., Narasimhan, B. and Chu, G. (2003). Class prediction by nearest shrunken centroids, with applications to DNA microarrays. Statist. Sci. 18 104–117.

Wang, S. M. (2007). Understanding SAGE data. Trends Genet. 23 42–50.

Wilhelm, B. T. and Landry, J.-R. (2009). RNA-Seq-quantitative measurement of expression through massively parallel RNA-sequencing. Methods 48 249–257.

Witten, D. and Tibshirani, R. (2011). Penalized classification using Fisher’s linear discriminant. J. Roy. Statist. Soc. Ser. B 73 753–772.

Witten, D., Tibshirani, R., Gu, S., Fire, A. and Lui, W. (2010). Ultra-high throughput sequencing-based small RNA discovery and discrete statistical biomarker analysis in a collection of cervical tumours and matched controls. BMC Biology 8 58.

Agresti, A. (2002). Categorical Data Analysis. Wiley, Hoboken, NJ.

Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York.

Li, J., Witten, D., Johnstone, I. and Tibshirani, R. (2011). Normalization, testing, and false discovery rate estimation for RNA-sequencing data. Biostatistics. To appear.
