SVAw - a web-based application tool for automated surrogate variable analysis of gene expression studies

Mehdi Pirooznia1, Fayaz Seifuddin1, Fernando S. Goes1, Jeffrey T. Leek2, Peter P. Zandi1
1Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD, USA
2Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA

Abstract

Background: Surrogate variable analysis (SVA) is a powerful method to identify, estimate, and utilize the components of gene expression heterogeneity due to unknown and/or unmeasured technical, genetic, environmental, or demographic factors. These sources of heterogeneity are common in gene expression studies, and failing to incorporate them into the analysis can obscure results. Using SVA increases the biological accuracy and reproducibility of gene expression studies by identifying these sources of heterogeneity and correctly accounting for them in the analysis.

Results: Here we have developed a web application called SVAw (Surrogate variable analysis Web app) that provides a user-friendly interface for SVA analyses of genome-wide expression studies. The software is built on the open-source Bioconductor sva package. SVAw extends the functionality of the SVA program in three ways: (i) it performs a fully automated, user-friendly analysis workflow; (ii) it calculates probe/gene statistics both before and after SVA, providing a table of results for the regression of gene expression on the primary variable of interest before and after correcting for surrogate variables; and (iii) it generates a comprehensive report file for the user, including a graphical comparison of the outcomes.

Conclusions: SVAw is a freely accessible web server solution for the surrogate variable analysis of high-throughput datasets, and it facilitates removing unwanted and unknown sources of variation. It is freely available for use at http://psychiatry.igm.jhmi.edu/sva. The executable packages for both the web and standalone applications, along with installation instructions, can be downloaded from our website.
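The before/after regression that SVAw reports can be illustrated with a small, simplified sketch. The full procedure in the Bioconductor sva package does considerably more than this. Among other steps, it estimates how many surrogate variables to use and down-weights genes associated with the primary variable. The Python code below is only a hypothetical illustration under stated assumptions and is not SVAw's or the sva package's implementation. It regresses each gene on the primary variable and takes the top right singular vector of the residual matrix, computed by power iteration, as a single surrogate variable. It then compares the primary-variable t statistic with and without that surrogate variable as a covariate. All function names and the simulated data, including a hidden "batch" factor, are illustrative assumptions.

```python
import math
import random


def solve_ols(X, y):
    """Least-squares fit of y on design X (one row per sample).

    Returns (coefficients, residuals, inverse of X^T X)."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    # Invert X^T X by Gauss-Jordan elimination with partial pivoting.
    aug = [r[:] + [1.0 if i == j else 0.0 for j in range(p)] for i, r in enumerate(xtx)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        div = aug[c][c]
        aug[c] = [v / div for v in aug[c]]
        for r in range(p):
            if r != c:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    inv = [row[p:] for row in aug]
    beta = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    resid = [yi - sum(b * xij for b, xij in zip(beta, row)) for row, yi in zip(X, y)]
    return beta, resid, inv


def t_stat_primary(X, y):
    """t statistic for column 1 (the primary variable) of design X."""
    beta, resid, inv = solve_ols(X, y)
    sigma2 = sum(e * e for e in resid) / (len(y) - len(X[0]))
    return beta[1] / math.sqrt(sigma2 * inv[1][1])


def first_surrogate_variable(expr, primary, iters=200):
    """Top right singular vector of the genes x samples residual matrix."""
    X = [[1.0, x] for x in primary]
    R = [solve_ols(X, y)[1] for y in expr]  # residuals after removing primary effect
    n = len(primary)
    v = max(R, key=lambda r: sum(e * e for e in r))[:]  # start from largest residual row
    for _ in range(iters):
        Rv = [sum(a * b for a, b in zip(row, v)) for row in R]            # R v
        v = [sum(R[g][j] * Rv[g] for g in range(len(R))) for j in range(n)]  # R^T (R v)
        norm = math.sqrt(sum(e * e for e in v))
        v = [e / norm for e in v]
    return v


def pre_post_t_stats(expr, primary):
    """(unadjusted t, surrogate-adjusted t) per gene, plus the surrogate variable."""
    sv = first_surrogate_variable(expr, primary)
    X0 = [[1.0, x] for x in primary]
    X1 = [[1.0, x, s] for x, s in zip(primary, sv)]
    return [(t_stat_primary(X0, y), t_stat_primary(X1, y)) for y in expr], sv


# Simulated data: 50 genes x 12 samples; "batch" is hidden from the analysis.
rng = random.Random(1)
primary = [0.0] * 6 + [1.0] * 6
batch = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0] * 2
expr = []
for g in range(50):
    y = [rng.gauss(0.0, 0.3) for _ in range(12)]
    if g < 30:  # genes affected by the hidden factor
        y = [yi + 3.0 * b for yi, b in zip(y, batch)]
    if g == 0:  # the one gene truly associated with the primary variable
        y = [yi + 1.0 * x for yi, x in zip(y, primary)]
    expr.append(y)

stats, sv = pre_post_t_stats(expr, primary)
print(stats[0])  # (unadjusted t, SV-adjusted t) for gene 0
```

In this simulation the hidden factor is balanced across the two groups, so the estimated effect for gene 0 barely changes between the two models. What changes is the residual variance: the unmodeled factor inflates it and makes the t statistic look unremarkable. This is one concrete way an unmeasured source of heterogeneity can obscure a real association.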
