Gigwa—Genotype investigator for genome-wide analyses

Oxford University Press (OUP) - Tập 5 - Trang 1-9 - 2016

Guilhem Sempéré^1,2, Florian Philippe³, Alexis Dereeper^2,4, Manuel Ruiz^2,5,6,7, Gautier Sarah^2,8, Pierre Larmande^2,3,6,9

¹UMR InterTryp (CIRAD), Campus International de Baillarguet, Montpellier, France

²South Green Bioinformatics Platform, Montpellier, France

³UMR DIADE, IRD, Montpellier, France

⁴UMR IPME, IRD, Montpellier, France

⁵UMR AGAP, CIRAD, Montpellier, France

⁶Institut de Biologie Computationnelle, Université de Montpellier, Montpellier, France

⁷Agrobiodiversity Research Area, International Center for Tropical Agriculture (CIAT), Cali, Colombia

⁸INRA UMR AGAP, Montpellier, France

⁹INRIA Zenith Team, LIRMM, Montpellier, France

Tóm tắt

Exploring the structure of genomes and analyzing their evolution is essential to understanding the ecological adaptation of organisms. However, with the large amounts of data being produced by next-generation sequencing, computational challenges arise in terms of storage, search, sharing, analysis and visualization. This is particularly true with regards to studies of genomic variation, which are currently lacking scalable and user-friendly data exploration solutions. Here we present Gigwa, a web-based tool that provides an easy and intuitive way to explore large amounts of genotyping data by filtering it not only on the basis of variant features, including functional annotations, but also on genotype patterns. The data storage relies on MongoDB, which offers good scalability properties. Gigwa can handle multiple databases and may be deployed in either single- or multi-user mode. In addition, it provides a wide range of popular export formats. The Gigwa application is suitable for managing large amounts of genomic variation data. Its user-friendly web interface makes such processing widely accessible. It can either be simply deployed on a workstation or be used to provide a shared data portal for a given community of researchers.

Tài liệu tham khảo

Gheyas A, Boschiero C, Eory L, Ralph H, Kuo R, Woolliams J, et al. Functional classification of 15 million SNPs detected from diverse chicken populations. DNA Res. 2015;22(3):205–17. Li X, Buitenhuis AJ, Lund MS, Li C, Sun D, Zhang Q, et al. Joint genome-wide association study for milk fatty acid traits in Chinese and Danish Holstein populations. J Dairy Sci. 2015;98(11):8152–63. Available from: http://www.ncbi.nlm.nih.gov/pubmed/26364108. Shinada H, Yamamoto T, Sato H, Yamamoto E, Hori K, Yonemaru J, et al. Quantitative trait loci for rice blast resistance detected in a local rice breeding population by genome-wide association mapping. Breed Sci. 2015;65(5):388–95. Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=4671699&tool=pmcentrez&rendertype=abstract. Marcotuli I, Houston K, Waugh R, Fincher GB, Burton RA, Blanco A, et al. Genome wide association mapping for arabinoxylan content in a collection of tetraploid wheats. PLoS One. 2015;10(7):e0132787. Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=4503733&tool=pmcentrez&rendertype=abstract. The 3000 rice genomes project. The 3,000 rice genomes project. Gigascience. 2014; 3:7. http://dx.doi.org/10.1186/2047-217X-3-7 Ossowski S, Schneeberger K, Clark RM, Lanz C, Warthmann N, Weigel D. Sequencing of natural strains of Arabidopsis thaliana with short reads. Genome Res. 2008;18:2024–33. Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2593571&tool=pmcentrez&rendertype=abstract. Cao J, Schneeberger K, Ossowski S, Günther T, Bender S, Fitz J, et al. Whole-genome sequencing of multiple Arabidopsis thaliana populations. Nat Genet. 2011;43(10):956–63. Available from: http://www.ncbi.nlm.nih.gov/pubmed/21874002. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. The variant call format and VCFtools. Bioinformatics. 2011;27(15):2156–8. Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3137218&tool=pmcentrez&rendertype=abstract. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20(9):1297–303. Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2928508&tool=pmcentrez&rendertype=abstract. Casbon J. PyVCF - A Variant Call Format Parser for Python. 2012. Available from: https://pyvcf.readthedocs.org/en/latest/INTRO.html Obenchain V, Lawrence M, Carey V, Gogarten S, Shannon P, Morgan M. VariantAnnotation: a bioconductor package for exploration and annotation of genetic variants. Bioinformatics. 2014;30(14):2076–8. Wittelsburger U, Pfeifer B, Lercher MJ. WhopGenome: high-speed access to whole-genome variation and sequence data in R. Bioinformatics. 2015;31(3):413–5. Available from: http://bioinformatics.oxfordjournals.org/cgi/doi/10.1093/bioinformatics/btu636. Bach M, Werner A. In: Nawrat MAM, editor. Innovative control systems for tracked vehicle platforms, vol. 2. Cham: Springer International Publishing; 2014. p. 163–74. Available from: http://link.springer.com/10.1007/978-3-319-04624-2. Gajendran, S.K. A survey on NoDQL databases. University of Illinois; 2012. Available from: http://www.masters.dgtu.donetsk.ua/2013/fknt/babich/library/article10.pdf. Moniruzzaman ABM, Hossain SA. Nosql database: New era of databases for big data analytics-classification, characteristics and comparison. CoRR [Internet]. 2013;6(4):1–14. Available from: http://arxiv.org/abs/1307.0191. O’Connor BD, Merriman B, Nelson SF. SeqWare query engine: storing and searching sequence data in the cloud. BMC Bioinf. 2010;11(12):S2. Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3040528&tool=pmcentrez&rendertype=abstract. Wang S, Pandis I, Wu C, He S, Johnson D, Emam I, et al. High dimensional biological data retrieval optimization with NoSQL technology. BMC Genomics. 2014;15(8):S3. Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=4248814&tool=pmcentrez&rendertype=abstract. Langmead B, Schatz MC, Lin J, Pop M, Salzberg SL. Searching for SNPs with cloud computing. Genome Biol. 2009;10(11):R134. http://genomebiology.com/2009/10/11/R134. Afgan E, Chapman B, Taylor J. CloudMan as a platform for tool, data, and analysis distribution. BMC Bioinf. 2012;13(1):315. http://www.biomedcentral.com/1471-2105/13/315. Schatz MC. CloudBurst: highly sensitive read mapping with MapReduce. Bioinformatics. 2009;25(11):1363–9. Available from: http://bioinformatics.oxfordjournals.org/cgi/doi/10.1093/bioinformatics/btp236. Russ TA, Ramakrishnan C, Hovy EH, Bota M, Burns GAPC. Knowledge engineering tools for reasoning with scientific observations and interpretations: a neural connectivity use case. BMC Bioinf. 2011;12(1):351. Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3176268&tool=pmcentrez&rendertype=abstract. Ye Z, Li S. Arequest skewaware heterogeneous distributed storage systembased on Cassandra. the International Conference on Computer and Management (CAMAN’11). 2011. p. 1–5. Manyam G, Payton M A, Roth J A, Abruzzo L V, Coombes KR. Relax with CouchDB - Into the non-relational DBMS era of bioinformatics. Genomics. Elsevier Inc.; 2012. Available from: http://www.ncbi.nlm.nih.gov/pubmed/22609849. Accessed 19 Dec 2015. Ohyanagi H, Ebata T, Huang X, Gong H, Fujita M, Mochizuki T, et al. OryzaGenome : Genome Diversity Database of Wild Oryza Species Special Online Collection – Database Paper. 2016;0(November 2015):1–7 Alexandrov N, Tai S, Wang W, Mansueto L, Palis K, Fuentes RR, et al. SNP-Seek database of SNPs derived from 3000 rice genomes. Nucleic Acids Res. 2015;63(2):2–6. Miller C, Qiao Y, DiSera T, D’Astous B, Marth G. Bam. Iobio: a Web-based, real-time, sequence alignment file inspector. Nat Methods. 2014;11(12):1189. Di Sera TL. vcf.iobio—A visually driven variant data inspector and real-time analysis web application. NEXT GEN SEEK. 2015. Available from: http://vcf.iobio.io/. Accessed 19 Dec 2015. Skinner ME, Uzilov AV, Stein LD, Mungall CJ, Holmes IH. JBrowse: a next-generation genome browser. Genome Res. 2009;19:1630–8. Available from: http://genome.cshlp.org/content/19/9/1630.short. MongoDB Inc. MongoDB. 2015. Available from: https://www.mongodb.org/ VCF 4.2 specification. 2015. Available from: https://samtools.github.io/hts-specs/VCFv4.2.pdf Cingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L, et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of drosophila melanogaster strain w 1118; iso-2; iso-3. Fly (Austin). 2012;6(June):80–92. Giardine B, Riemer C, Hardison RC, Burhans R, Elnitski L, Shah P, et al. Galaxy: a platform for interactive large-scale genome analysis. Genome Res. 2005;15(10):1451–5. Available from: http://www.ncbi.nlm.nih.gov/pubmed/16169926. Thorvaldsdóttir H, Robinson JT, Mesirov JP. Integrative genomics viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform. 2013;14(2):178–92. Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3603213&tool=pmcentrez&rendertype=abstract. Pivotal Software Inc. Java Spring Framework. 2015. Available from: http://projects.spring.io/spring-framework/ The jQuery Foundation. JQuery. 2015. Available from: https://jquery.com/ The Broad Institute. SamTools API. Available from: https://samtools.github.io/htsjdk/ Highsoft. Highcharts API. Available from: http://www.highcharts.com/products/highcharts. Accessed 19 Dec 2015. IRRI. 3,000 Rice genomes datasets. 2015. Available from: http://oryzasnp-atcg-irri-org.s3-website-ap-southeast-1.amazonaws.com/. Accessed 19 Dec 2015. Oracle. MySQL. 2015. Available from: http://dev.mysql.com/ Docker. 2015. Available from: https://www.docker.com/ Platform as a Service. Available from: https://en.wikipedia.org/wiki/Paas South Green Bioinformatic Platform. Gigwa code repository. 2015. Available from: https://github.com/SouthGreenPlatform/gigwa Sempere, G; Philippe, F; Dereeper, A; Ruiz, M; Sarah, G; Larmande, P. Supporting information for “Gigwa - Genotype Investigator for Genome Wide Analyses”. GigaScience Database. 2016. http://dx.doi.org/10.5524/100199

Scholar Hub - Công cụ hỗ trợ trích dẫn và phân tích khoa học Việt Nam

Về chúng tôi

Scholar Hub là công cụ hỗ trợ trích dẫn và phân tích các bài báo, công bố khoa học Việt Nam. Công cụ trợ giúp người nghiên cứu, tạp chí, đơn vị nghiên cứu tra cứu, phân tích và thống kê dữ liệu nghiên cứu khoa học tại Việt Nam và quốc tế.
ScholarHub KHÔNG đăng thông tin tổng hợp, KHÔNG đăng lại nội dung từ các trang báo chí Việt Nam hoặc trang thông tin điện tử khác tại Việt Nam.

Thông tin, cập nhật

Đăng ký Tạp chí tham gia vào Scholar Hub

Phản hồi ý kiến về Scholar Hub

Bài viết, nội dung cập nhật

Chủ đề khoa học

Website liên kết

Hệ thống CSDL Khoa học & Công nghệ

Phần mềm kiểm tra trùng lặp Kiểm Tra Tài Liệu

Phần mềm xuất bản tạp chí điện tử VOJS

Nền tảng trắc nghiệm và đề thi đa lĩnh vực LetQA