Bioinformatic curation and alignment of genotyped hepatitis B virus (HBV) sequence data from the GenBank public database
Abstract
Hepatitis B virus (HBV) DNA sequence data from thousands of samples are present in the public sequence databases. However, no up-to-date multiple sequence alignments containing both full-length and subgenomic fragments for each genotype are publicly available. Such alignments are useful in many analysis applications, including data mining and phylogenetic analyses. By issuing a query, all HBV sequence data were downloaded from the GenBank public database (67,893 sequences). Full-length and subgenomic sequences that had been genotyped by the submitters (30,852 sequences) were placed into a multiple sequence alignment for each genotype (genotype A: 5,868 sequences, B: 4,630, C: 7,820, D: 8,300, E: 2,043, F: 985, G: 189, H: 108, I: 23), according to the results of offline BLAST searches against a custom reference library of full-length sequences. Further curation was then performed to improve the alignments. For each of the nine HBV genotypes, the algorithm described in this paper generates a multiple sequence alignment containing full-length and subgenomic fragments. The alignments can be updated as new sequences become available in the online public sequence databases. The alignments are available at http://hvdr.bioinf.wits.ac.za/alignments.
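The placement step summarised above (BLAST each sequence against a full-length reference, then position it in its genotype's alignment) might be sketched as follows. This is an illustrative sketch only, not the authors' implementation. It assumes BLAST results in the standard 12-column tabular format (`-outfmt 6`), and the function names (`best_hits`, `place_fragment`, `build_alignments`), the record layout, and the toy data are all hypothetical. It also simplifies heavily: insertions and deletions, minus-strand hits, and the circular HBV genome are not handled.

```python
from collections import defaultdict


def best_hits(blast_lines):
    """Return the highest-bit-score hit per query from BLAST tabular (-outfmt 6) lines."""
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query = fields[0]
        hit = {
            "subject": fields[1],
            "qstart": int(fields[6]),   # 1-based start in the query
            "sstart": int(fields[8]),   # 1-based start in the reference
            "send": int(fields[9]),     # 1-based end in the reference
            "bitscore": float(fields[11]),
        }
        if query not in best or hit["bitscore"] > best[query]["bitscore"]:
            best[query] = hit
    return best


def place_fragment(seq, hit, ref_len):
    """Pad a sequence with gaps so it sits at its reference coordinates.

    Assumption: the hit is gap-free, so a constant offset maps query
    positions onto reference positions.
    """
    left = hit["sstart"] - hit["qstart"]  # number of leading gap columns
    if left < 0:
        # The query overhangs the start of the reference: trim the overhang.
        seq = seq[-left:]
        left = 0
    row = "-" * left + seq
    # Trim any overhang past the end, then pad to the full alignment width.
    return row[:ref_len].ljust(ref_len, "-")


def build_alignments(records, hits, ref_len):
    """Group placed sequences by submitter-assigned genotype.

    records maps sequence id -> (genotype, sequence); this layout is hypothetical.
    """
    alignments = defaultdict(list)
    for seq_id, (genotype, seq) in records.items():
        hit = hits.get(seq_id)
        if hit is None or hit["sstart"] > hit["send"]:
            continue  # no hit, or a minus-strand hit: skipped in this sketch
        alignments[genotype].append((seq_id, place_fragment(seq, hit, ref_len)))
    return dict(alignments)


# Toy data: a 20-base "reference" frame, one fragment and one full-length sequence.
blast = [
    "frag1\trefA\t100.0\t4\t0\t0\t1\t4\t5\t8\t1e-5\t8.0",
    "frag1\trefA\t90.0\t4\t0\t0\t1\t4\t9\t12\t1e-2\t5.0",
    "full1\trefD\t100.0\t20\t0\t0\t1\t20\t1\t20\t1e-9\t40.0",
]
records = {
    "frag1": ("A", "ACGT"),
    "full1": ("D", "ACGTACGTACGTACGTACGT"),
}
alignments = build_alignments(records, best_hits(blast), ref_len=20)
for genotype, rows in sorted(alignments.items()):
    for seq_id, row in rows:
        print(genotype, seq_id, row)
```

Padding every row to the same width is what allows full-length and subgenomic sequences to share one alignment per genotype. A real pipeline would also need to carry the reference's gap columns through and resolve indels, which this sketch leaves out.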