Accurate estimation of the signal baseline in DNA chromatograms
Abstract
Accurately estimating the varying baseline level in different parts of a DNA chromatogram is a challenging problem, and an important one for accurate base-calling. We formulate the problem in a statistical learning framework and propose an Expectation-Maximization (EM) algorithm for its solution. In addition, we present a faster, iterative, histogram-based method for estimating the signal background in small windows. Both methods can be combined with regression techniques to correct the baseline in all regions of the chromatogram, and they are shown to work well even in areas of low signal-to-noise ratio (SNR). By improving the separation of clusters, baseline correction reduces classification errors when using the BEM base-caller developed in our group.
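The abstract does not spell out either estimator, so the following is only a rough illustration of the second idea: estimate the background in small windows, then regress those estimates across the whole trace. It is a minimal sketch under stated assumptions, not the authors' algorithm, and it does not reproduce the EM formulation. Assumed here: the histogram mode is taken as a window's baseline; samples more than `k` lower-side standard deviations above it are treated as peak samples and dropped before the histogram is recomputed; and a low-order polynomial fit (`numpy.polyfit`) stands in for the regression step. The names `histogram_baseline` and `correct_baseline`, and the bin count, window length, `k`, and polynomial degree, are all hypothetical choices.

```python
import numpy as np


def histogram_baseline(window, n_bins=24, n_iter=5, k=3.0):
    """Hypothetical iterative histogram estimate of one window's background level.

    Assumption: the most populated histogram bin is the baseline, and samples
    far above it belong to peaks, so they are discarded before re-estimating.
    """
    samples = np.asarray(window, dtype=float)
    kept = samples
    level = float(np.median(samples))
    for _ in range(n_iter):
        counts, edges = np.histogram(kept, bins=n_bins)
        i = int(np.argmax(counts))
        level = float(0.5 * (edges[i] + edges[i + 1]))  # centre of the mode bin
        # Noise scale from samples at or below the mode, which peaks do not inflate.
        below = kept[kept <= level]
        if below.size == 0:
            break
        sigma = float(np.sqrt(np.mean((level - below) ** 2)))
        if sigma == 0.0:
            break
        # Re-select from the ORIGINAL samples so an early bad estimate can recover.
        new_kept = samples[samples <= level + k * sigma]
        if new_kept.size < 2:
            break
        kept = new_kept
    return level


def correct_baseline(signal, window=250, degree=2, **hist_kwargs):
    """Per-window baseline estimates, then a polynomial regression across the trace.

    Returns (corrected_signal, baseline). Window length and degree are illustrative.
    """
    y = np.asarray(signal, dtype=float)
    starts = range(0, y.size, window)
    # Index of the centre of each (possibly shorter, final) window.
    centres = np.array([s + (min(window, y.size - s) - 1) / 2.0 for s in starts])
    levels = np.array([histogram_baseline(y[s:s + window], **hist_kwargs) for s in starts])
    coeffs = np.polyfit(centres, levels, min(degree, levels.size - 1))
    baseline = np.polyval(coeffs, np.arange(y.size))
    return y - baseline, baseline


if __name__ == "__main__":
    # Synthetic trace: linearly drifting background, narrow peaks, Gaussian noise.
    rng = np.random.default_rng(0)
    t = np.arange(4000)
    true_baseline = 20.0 + 0.005 * t
    peaks = sum(80.0 * np.exp(-0.5 * ((t - c) / 2.0) ** 2) for c in range(50, 4000, 100))
    trace = true_baseline + peaks + rng.normal(0.0, 1.0, t.size)
    corrected, baseline = correct_baseline(trace)
    print("mean |baseline error|:", np.mean(np.abs(baseline - true_baseline)))
```

On a synthetic trace like this, the fitted baseline should follow the drift closely. For a real chromatogram, the window would presumably need to be long enough to contain several peak spacings' worth of background samples, and the polynomial degree low enough not to chase individual peaks.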
Keywords
#Statistical learning #Bayesian methods #Signal resolution #Signal to noise ratio #Data mining #Digital signal processing #DNA computing #Expectation-maximization algorithms #Iterative methods #Histograms