Springer Science and Business Media LLC

ISSN: 1471-2164

Managing organization: BioMed Central Ltd., BMC

Fields:
Genetics, Biotechnology

Featured articles

The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation
Volume 21, Issue 1 - 2020
Davide Chicco, Giuseppe Jurman
Abstract Background To evaluate binary classifications and their confusion matrices, scientific researchers can employ several statistical rates, accordingly to the goal of the experiment they are investigating. Despite being a crucial issue in machine learning, no widespread consensus has been reached on a unified elective chosen measure yet. Accuracy and F1 score computed on confusion matrices have been (and still are) among the most popular adopted metrics in binary classification tasks. However, these statistical measures can dangerously show overoptimistic inflated results, especially on imbalanced datasets. Results The Matthews correlation coefficient (MCC), instead, is a more reliable statistical rate which produces a high score only if the prediction obtained good results in all of the four confusion matrix categories (true positives, false negatives, true negatives, and false positives), proportionally both to the size of positive elements and the size of negative elements in the dataset. Conclusions In this article, we show how MCC produces a more informative and truthful score in evaluating binary classifications than accuracy and F1 score, by first explaining the mathematical properties, and then the asset of MCC in six synthetic use cases and in a real genomics scenario. We believe that the Matthews correlation coefficient should be preferred to accuracy and F1 score in evaluating binary classification tasks by all scientific communities.
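The contrast this abstract describes can be sketched with the standard textbook definitions of accuracy, F1 score, and MCC. The function name `confusion_metrics` and the confusion-matrix counts below are illustrative assumptions, not taken from the listing above; the counts were chosen to mimic an imbalanced dataset with 95 positives and 5 negatives.

```python
import math


def confusion_metrics(tp, fn, tn, fp):
    """Return (accuracy, f1, mcc) for one binary confusion matrix.

    Standard definitions; undefined ratios (zero denominator) are
    reported as 0.0, which is a convention assumed for this sketch.
    """
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total if total else 0.0

    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0

    # MCC uses all four cells, so a weak result in any one of them lowers it.
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0
    return accuracy, f1, mcc


# Hypothetical imbalanced case: 95 positives, 5 negatives,
# and a classifier that labels almost everything positive.
accuracy, f1, mcc = confusion_metrics(tp=90, fn=5, tn=1, fp=4)
print(f"accuracy={accuracy:.3f}  F1={f1:.3f}  MCC={mcc:.3f}")
```

With these assumed counts, accuracy and F1 both exceed 0.9, while MCC stays below 0.15 because only one of the five negatives is classified correctly.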
A data-driven approach to preprocessing Illumina 450K methylation array data
Volume 14, Issue 1 - 2013
Ruth Pidsley, Chloe Wong, Manuela Volta, Katie Lunnon, Jonathan Mill, Leonard C. Schalkwyk
Abstract Background As the most stable and experimentally accessible epigenetic mark, DNA methylation is of great interest to the research community. The landscape of DNA methylation across tissues, through development and in disease pathogenesis is not yet well characterized. Thus there is a need for rapid and cost effective methods for assessing genome-wide levels of DNA methylation. The Illumina Infinium HumanMethylation450 (450K) BeadChip is a very useful addition to the available methods for DNA methylation analysis but its complex design, incorporating two different assay methods, requires careful consideration. Accordingly, several normalization schemes have been published. We have taken advantage of known DNA methylation patterns associated with genomic imprinting and X-chromosome inactivation (XCI), in addition to the performance of SNP genotyping assays present on the array, to derive three independent metrics which we use to test alternative schemes of correction and normalization. These metrics also have potential utility as quality scores for datasets. Results The standard index of DNA methylation at any specific CpG site is β = M/(M + U + 100) where M and U are methylated and unmethylated signal intensities, respectively. Betas (βs) calculated from raw signal intensities (the default GenomeStudio behavior) perform well, but using 11 methylomic datasets we demonstrate that quantile normalization methods produce marked improvement, even in highly consistent data, by all three metrics. The commonly used procedure of normalizing betas is inferior to the separate normalization of M and U, and it is also advantageous to normalize Type I and Type II assays separately. More elaborate manipulation of quantiles proves to be counterproductive. 
Conclusions Careful selection of preprocessing steps can minimize variance and thus improve statistical power, especially for the detection of the small absolute DNA methylation changes likely associated with complex disease phenotypes. For the convenience of the research community we have created a user-friendly R software package called wateRmelon, downloadable from bioConductor, compatible with the existing methylumi, minfi and IMA packages, that allows others to utilize the same normalization methods and data quality tests on 450K data.
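The methylation index quoted in this abstract, β = M/(M + U + 100), can be written out as a small helper. This is a minimal sketch of the formula only, not the wateRmelon implementation; the function name `beta_value` and the sample intensities are hypothetical.

```python
def beta_value(methylated, unmethylated, offset=100.0):
    """Compute beta = M / (M + U + offset) for one CpG site.

    methylated and unmethylated are signal intensities (M and U).
    The offset of 100 is the constant stated in the abstract; it keeps
    the ratio defined when both intensities are close to zero.
    """
    return methylated / (methylated + unmethylated + offset)


# Hypothetical intensities for a mostly methylated site.
print(round(beta_value(3000.0, 1000.0), 4))
```

Because of the offset, β never quite reaches 1 even for a fully methylated site, and it is 0 when the methylated signal is 0.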
Gene networks driving bovine milk fat synthesis during the lactation cycle
Volume 9, Issue 1 - Page 366 - 2008
Massimo Bionaz, Juan J. Loor
Development and implementation of high-throughput SNP genotyping in barley
Volume 10, Issue 1 - Page 582 - 2009
Timothy J. Close, Prasanna R. Bhat, Stefano Lonardi, Yan-Ling Wu, Nils Rostoks, Luke Ramsay, Arnis Druka, Nils Stein, Jan T. Svensson, Steve Wanamaker, Serdar Bozdag, Mikeal L. Roose, Matthew J. Moscou, Shiaoman Chao, Rajeev K. Varshney, Péter Szűcs, Kazuhiro Sato, Patrick Hayes, David E. Matthews, A. Kleinhofs, Gary J. Muehlbauer, Joseph DeYoung, David Marshall, Kavitha Madishetty, Raymond D. Fenton, Pascal Condamine, Andreas Graner, Robbie Waugh
Co-occurrence of resistance genes to antibiotics, biocides and metals reveals novel insights into their co-selection potential
- 2015
Chandan Pal, Johan Bengtsson‐Palme, Erik Kristiansson, D. G. Joakim Larsson
CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers
Volume 16, Issue 1 - 2015
Rachid Ounit, Steve Wanamaker, Timothy J. Close, Stefano Lonardi
The unfoldomics decade: an update on intrinsically disordered proteins
- 2008
A. Keith Dunker, Christopher J. Oldfield, Jingwei Meng, Pedro Romero, Jack Yang, Jessica Walton Chen, Vladimir Vacic, Zoran Obradović, Vladimir N. Uversky
Comparative analysis of fungal genomes reveals different plant cell wall degrading capacity in fungi
Volume 14, Issue 1 - 2013
Zhongtao Zhao, Huiquan Liu, Chenfang Wang, Jin‐Rong Xu
Abstract EDITOR'S NOTE Readers are alerted that there is currently a discussion regarding the use of some of the unpublished genomic data presented in this manuscript. Appropriate editorial action will be taken once this matter is resolved. Background Fungi produce a variety of carbohydrate activity enzymes (CAZymes) for the degradation of plant polysaccharide materials to facilitate infection and/or gain nutrition. Identifying and comparing CAZymes from fungi with different nutritional modes or infection mechanisms may provide information for better understanding of their life styles and infection models. To date, over hundreds of fungal genomes are publicly available. However, a systematic comparative analysis of fungal CAZymes across the entire fungal kingdom has not been reported. Results In this study, we systemically identified glycoside hydrolases (GHs), polysaccharide lyases (PLs), carbohydrate esterases (CEs), and glycosyltransferases (GTs) as well as carbohydrate-binding modules (CBMs) in the predicted proteomes of 103 representative fungi from Ascomycota, Basidiomycota, Chytridiomycota, and Zygomycota. Comparative analysis of these CAZymes that play major roles in plant polysaccharide degradation revealed that fungi exhibit tremendous diversity in the number and variety of CAZymes. Among them, some families of GHs and CEs are the most prevalent CAZymes that are distributed in all of the fungi analyzed. Importantly, cellulases of some GH families are present in fungi that are not known to have cellulose-degrading ability. In addition, our results also showed that in general, plant pathogenic fungi have the highest number of CAZymes. Biotrophic fungi tend to have fewer CAZymes than necrotrophic and hemibiotrophic fungi. Pathogens of dicots often contain more pectinases than fungi infecting monocots. Interestingly, besides yeasts, many saprophytic fungi that are highly active in degrading plant biomass contain fewer CAZymes than plant pathogenic fungi. 
Furthermore, analysis of the gene expression profile of the wheat scab fungus Fusarium graminearum revealed that most of the CAZyme genes related to cell wall degradation were up-regulated during plant infection. Phylogenetic analysis also revealed a complex history of lineage-specific expansions and attritions for the PL1 family. Conclusions Our study provides insights into the variety and expansion of fungal CAZyme classes and revealed the relationship of CAZyme size and diversity with their nutritional strategy and host specificity.
Genome-wide classification and expression analysis of MYB transcription factor families in rice and Arabidopsis
- 2012
Amit Katiyar, Shuchi Smita, Sangram K. Lenka, Ravi Rajwanshi, Viswanathan Chinnusamy, Kailash C. Bansal
Abstract Background The MYB gene family comprises one of the richest groups of transcription factors in plants. Plant MYB proteins are characterized by a highly conserved MYB DNA-binding domain. MYB proteins are classified into four major groups namely, 1R-MYB, 2R-MYB, 3R-MYB and 4R-MYB based on the number and position of MYB repeats. MYB transcription factors are involved in plant development, secondary metabolism, hormone signal transduction, disease resistance and abiotic stress tolerance. A comparative analysis of MYB family genes in rice and Arabidopsis will help reveal the evolution and function of MYB genes in plants. Results A genome-wide analysis identified at least 155 and 197 MYB genes in rice and Arabidopsis, respectively. Gene structure analysis revealed that MYB family genes possess relatively more number of introns in the middle as compared with C- and N-terminal regions of the predicted genes. Intronless MYB-genes are highly conserved both in rice and Arabidopsis. MYB genes encoding R2R3 repeat MYB proteins retained conserved gene structure with three exons and two introns, whereas genes encoding R1R2R3 repeat containing proteins consist of six exons and five introns. The splicing pattern is similar among R1R2R3 MYB genes in Arabidopsis. In contrast, variation in splicing pattern was observed among R1R2R3 MYB members of rice. Consensus motif analysis of 1kb upstream region (5′ to translation initiation codon) of MYB gene ORFs led to the identification of conserved and over-represented cis-motifs in both rice and Arabidopsis. Real-time quantitative RT-PCR analysis showed that several members of MYBs are up-regulated by various abiotic stresses both in rice and Arabidopsis. Conclusion A comprehensive genome-wide analysis of chromosomal distribution, tandem repeats and phylogenetic relationship of MYB family genes in rice and Arabidopsis suggested their evolution via duplication. 
Genome-wide comparative analysis of MYB genes and their expression analysis identified several MYBs with potential role in development and stress response of plants.
Transcriptomic and metabolite analyses of Cabernet Sauvignon grape berry development
Volume 8, Issue 1 - 2007
Laurent Deluc, Jérôme Grimplet, Matthew D. Wheatley, Richard Tillett, David R. Quilici, Craig Osborne, David A. Schooley, Karen Schlauch, John C. Cushman, Grant R. Cramer
Abstract Background Grape berry development is a dynamic process that involves a complex series of molecular genetic and biochemical changes divided into three major phases. During initial berry growth (Phase I), berry size increases along a sigmoidal growth curve due to cell division and subsequent cell expansion, and organic acids (mainly malate and tartrate), tannins, and hydroxycinnamates accumulate to peak levels. The second major phase (Phase II) is defined as a lag phase in which cell expansion ceases and sugars begin to accumulate. Véraison (the onset of ripening) marks the beginning of the third major phase (Phase III) in which berries undergo a second period of sigmoidal growth due to additional mesocarp cell expansion, accumulation of anthocyanin pigments for berry color, accumulation of volatile compounds for aroma, softening, peak accumulation of sugars (mainly glucose and fructose), and a decline in organic acid accumulation. In order to understand the transcriptional network responsible for controlling berry development, mRNA expression profiling was conducted on berries of V. vinifera Cabernet Sauvignon using the Affymetrix GeneChip® Vitis oligonucleotide microarray ver. 1.0 spanning seven stages of berry development from small pea size berries (E-L stages 31 to 33 as defined by the modified E-L system), through véraison (E-L stages 34 and 35), to mature berries (E-L stages 36 and 38). Selected metabolites were profiled in parallel with mRNA expression profiling to understand the effect of transcriptional regulatory processes on specific metabolite production that ultimately influence the organoleptic properties of wine. Results Over the course of berry development whole fruit tissues were found to express an average of 74.5% of probes represented on the Vitis microarray, which has 14,470 Unigenes. 
Approximately 60% of the expressed transcripts were differentially expressed between at least two out of the seven stages of berry development (28% of transcripts, 4,151 Unigenes, had pronounced (≥2 fold) differences in mRNA expression) illustrating the dynamic nature of the developmental process. The subset of 4,151 Unigenes was split into twenty well-correlated expression profiles. Expression profile patterns included those with declining or increasing mRNA expression over the course of berry development as well as transient peak or trough patterns across various developmental stages as defined by the modified E-L system. These detailed surveys revealed the expression patterns for genes that play key functional roles in phytohormone biosynthesis and response, calcium sequestration, transport and signaling, cell wall metabolism mediating expansion, ripening, and softening, flavonoid metabolism and transport, organic and amino acid metabolism, hexose sugar and triose phosphate metabolism and transport, starch metabolism, photosynthesis, circadian cycles and pathogen resistance. In particular, mRNA expression patterns of transcription factors, abscisic acid (ABA) biosynthesis, and calcium signaling genes identified candidate factors likely to participate in the progression of key developmental events such as véraison and potential candidate genes associated with such processes as auxin partitioning within berry cells, aroma compound production, and pathway regulation and sequestration of flavonoid compounds. Finally, analysis of sugar metabolism gene expression patterns indicated the existence of an alternative pathway for glucose and triose phosphate production that is invoked from véraison to mature berries. Conclusion These results reveal the first high-resolution picture of the transcriptome dynamics that occur during seven stages of grape berry development. 
This work also establishes an extensive catalog of gene expression patterns for future investigations aimed at the dissection of the transcriptional regulatory hierarchies that govern berry development in a widely grown cultivar of wine grape. More importantly, this analysis identified a set of previously unknown genes potentially involved in critical steps associated with fruit development that can now be subjected to functional testing.