Base-Calling of Automated Sequencer Traces Using Phred. I. Accuracy Assessment

Genome Research - Volume 8, Issue 3 - Pages 175-185 - 1998
Brent Ewing1,2, LaDeana Hillier1,2, Michael C. Wendl1,2, Phil Green1,2
1 Department of Molecular Biotechnology, University of Washington, Seattle, Washington 98195-7730 USA
2 Genome Sequencing Center, Washington University School of Medicine, Saint Louis, Missouri 63108 USA

Abstract

The availability of massive amounts of DNA sequence information has begun to revolutionize the practice of biology. As a result, current large-scale sequencing output, while impressive, is not adequate to keep pace with growing demand and, in particular, is far short of what will be required to obtain the 3-billion-base human genome sequence by the target date of 2005. To reach this goal, improved automation will be essential, and it is particularly important that human involvement in sequence data processing be significantly reduced or eliminated. Progress in this respect will require both improved accuracy of the data processing software and reliable accuracy measures to reduce the need for human involvement in error correction and make human review more efficient. Here, we describe one step toward that goal: a base-calling program for automated sequencer traces, phred, with improved accuracy. phred appears to be the first base-calling program to achieve a lower error rate than the ABI software, averaging 40%–50% fewer errors in the data sets examined independent of position in read, machine running conditions, or sequencing chemistry.
