Accurate estimation of the signal baseline in DNA chromatograms

L. Andrade (1), E.S. Manolakos (1)
(1) Communications and Digital Signal Processing (CDSP), Center for Research and Graduate Studies, Electrical and Computer Engineering Department, Northeastern University, Boston, MA, USA

Abstract

Accurately estimating the varying baseline level in different parts of a DNA chromatogram is a challenging problem, and an important one for accurate base-calling. We formulate the problem in a statistical learning framework and propose an Expectation-Maximization algorithm for its solution. We also present a faster, iterative histogram-based method for estimating the signal background in small windows. Both methods can be combined with regression techniques to correct the baseline in all regions of the chromatogram, and both are shown to work well even in areas of low signal-to-noise ratio (SNR). By improving the separation of clusters, baseline correction reduces classification errors when using the BEM base-caller developed in our group.
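The abstract does not spell out the steps of the histogram-based method, so the following is only a minimal sketch of one way an iterative histogram estimate of the background in small windows could look. It is not the authors' algorithm. The function names (`histogram_mode`, `window_baseline`, `estimate_baseline`), the bin count, the iteration count, and the rule of keeping samples within one bin width of the current mode are all assumptions made for illustration.

```python
from typing import List, Sequence


def histogram_mode(values: Sequence[float], bins: int) -> float:
    """Return the center of the most populated histogram bin."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo  # all samples identical: that value is the mode
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so that v == hi falls into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    k = counts.index(max(counts))
    return lo + (k + 0.5) * width


def window_baseline(window: Sequence[float], bins: int = 16,
                    iterations: int = 3) -> float:
    """Assumed refinement rule: keep only samples within one bin width
    of the current mode, then re-histogram the survivors."""
    samples = list(window)
    estimate = histogram_mode(samples, bins)
    for _ in range(iterations):
        lo, hi = min(samples), max(samples)
        if hi == lo:
            break  # nothing left to refine
        width = (hi - lo) / bins
        kept = [v for v in samples if abs(v - estimate) <= width]
        if len(kept) < 2:
            break
        samples = kept
        estimate = histogram_mode(samples, bins)
    return estimate


def estimate_baseline(signal: Sequence[float],
                      window_size: int = 50) -> List[float]:
    """One baseline estimate per non-overlapping window (hypothetical layout)."""
    return [window_baseline(signal[i:i + window_size])
            for i in range(0, len(signal), window_size)]
```

The per-window estimates returned by `estimate_baseline` could then be passed to a regression step to obtain a smooth baseline over the whole trace, in the spirit of the combination with regression techniques mentioned above. The sketch relies on background samples dominating each window, so windows filled mostly by peaks would need a different treatment.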

Keywords

#Statistical learning #Bayesian methods #Signal resolution #Signal to noise ratio #Data mining #Digital signal processing #DNA computing #Expectation-maximization algorithms #Iterative methods #Histograms
