Bioinformatic curation and alignment of genotyped hepatitis B virus (HBV) sequence data from the GenBank public database

SpringerPlus - Volume 5 - Pages 1-13 - 2016
Trevor G. Bell¹, Mukhlid Yousif¹, Anna Kramvis¹
¹Hepatitis Virus Diversity Research Unit, Department of Internal Medicine, University of the Witwatersrand, Parktown, South Africa

Abstract

Hepatitis B virus (HBV) DNA sequence data from thousands of samples are present in the public sequence databases. However, no up-to-date multiple sequence alignments containing both full-length and subgenomic fragments for each genotype are publicly available. Such alignments are useful in many analysis applications, including data mining and phylogenetic analyses. All HBV sequence data were downloaded from the GenBank public database by issuing a query (67,893 sequences). Full-length and subgenomic sequences that had been genotyped by their submitters (30,852 sequences) were placed into one multiple sequence alignment per genotype (genotype A: 5868 sequences, B: 4630, C: 7820, D: 8300, E: 2043, F: 985, G: 189, H: 108, I: 23), according to the results of offline BLAST searches against a custom reference library of full-length sequences. Further curation was performed to improve the alignments. The algorithm described in this paper generates, for each of the nine HBV genotypes, a multiple sequence alignment containing full-length and subgenomic fragments. The alignments can be updated as new sequences become available in the online public sequence databases. The alignments are available at http://hvdr.bioinf.wits.ac.za/alignments.
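The placement step summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each sequence already carries a submitter-assigned genotype and a 1-based start coordinate on a reference (as a BLAST hit would report), that the hit is on the forward strand, and that it contains no insertions or deletions. The function names, record layout, and toy reference length of 10 are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

GAP = "-"


def place_fragment(fragment, ref_start, ref_length):
    """Pad a fragment with gaps so it fills one row of a ref_length-wide alignment.

    ref_start is the 1-based reference position where the fragment begins
    (assumed here to come from a BLAST hit with no indels, forward strand).
    """
    offset = ref_start - 1
    if offset < 0 or offset + len(fragment) > ref_length:
        raise ValueError("fragment does not fit inside the reference coordinates")
    left = GAP * offset
    right = GAP * (ref_length - offset - len(fragment))
    return left + fragment + right


def build_alignments(records, ref_length):
    """Group records by genotype and place each one into its genotype's alignment.

    records: iterable of (seq_id, genotype, ref_start, sequence) tuples
             (a hypothetical layout used only for this sketch).
    Returns {genotype: {seq_id: aligned_row}}.
    """
    alignments = defaultdict(dict)
    for seq_id, genotype, ref_start, sequence in records:
        alignments[genotype][seq_id] = place_fragment(sequence, ref_start, ref_length)
    return dict(alignments)


# Toy data: one full-length sequence and two subgenomic fragments.
records = [
    ("seq1", "A", 1, "ACGTACGTAC"),
    ("seq2", "A", 4, "TACG"),
    ("seq3", "D", 7, "GTAC"),
]
alignments = build_alignments(records, ref_length=10)

for genotype, rows in sorted(alignments.items()):
    for seq_id, row in rows.items():
        print(genotype, seq_id, row)
```

Every row in a genotype's alignment has the same width, so full-length and subgenomic sequences can be compared column by column. A real pipeline would also have to handle insertions, deletions, and reverse-complemented hits, which is presumably part of what the further curation mentioned in the abstract addresses.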
