A probabilistic approach for long read-length DNA sequence analysis

C.G. Molina1, J. Mullikin1
1Informatics Division, The Sanger Centre, Cambridge, UK

Abstract

This paper introduces a new algorithm for DNA sequence analysis that uses a reference DNA sequence to estimate base positions and models trace peaks probabilistically. The algorithm has been applied to long read-length DNA sequences, and its performance has been compared with that of the base-calling program Phred. After cross-matching with a finished consensus, the results show a significant improvement by the new algorithm in both the final sequence read-length and the number of correct bases extracted from DNA traces.
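The abstract does not specify the peak model, so the following is only an illustrative sketch of what probabilistic scoring of a trace peak could look like, not the authors' algorithm. It assumes an idealised Gaussian peak shape, a known expected base position, and four dye channels; the names `gaussian_peak`, `base_weights`, and `call_base`, and the parameters `width` and `noise`, are hypothetical.

```python
import math

BASES = "ACGT"

def gaussian_peak(x, center, width, height):
    """Idealised trace peak as a Gaussian bump (an assumption, not the paper's model)."""
    return height * math.exp(-0.5 * ((x - center) / width) ** 2)

def base_weights(trace, center, width=2.0, noise=1e-6):
    """Weight each dye channel around an expected base position.

    trace  : dict mapping base -> list of intensities, one list per channel
    center : expected peak position (sample index), assumed known here
    Returns a dict base -> normalised weight, used as a crude probability.
    """
    n_samples = len(trace[BASES[0]])
    lo = max(0, int(center - 3 * width))
    hi = min(n_samples, int(center + 3 * width) + 1)
    scores = {}
    for base in BASES:
        # Correlate the channel with a unit-height Gaussian template;
        # the small noise floor avoids division by zero on empty traces.
        scores[base] = noise + sum(
            trace[base][i] * gaussian_peak(i, center, width, 1.0)
            for i in range(lo, hi)
        )
    total = sum(scores.values())
    return {base: score / total for base, score in scores.items()}

def call_base(trace, center, width=2.0):
    """Return the highest-weighted base and its weight at the expected position."""
    weights = base_weights(trace, center, width)
    best = max(weights, key=weights.get)
    return best, weights[best]

# Synthetic example: a single clean peak in the G channel at sample 10.
trace = {base: [0.0] * 21 for base in BASES}
trace["G"] = [gaussian_peak(i, 10, 2.0, 100.0) for i in range(21)]
base, weight = call_base(trace, center=10)
```

In this sketch the expected position `center` is simply given; in the approach described above it would be estimated with the help of the reference sequence, a step the abstract does not detail.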

Keywords

#DNA #Bioinformatics #Genomics #Signal processing algorithms #Phase estimation #Image sequence analysis #Signal analysis #Libraries #Algorithm design and analysis #Humans
