Comprehensive assessment of automatic structural alignment against a manual standard, the scop classification of proteins

Protein Science - Volume 7, Issue 2 - Pages 445-456 - 1998
Mark Gerstein1, Michael Levitt2
1Molecular Biophysics & Biochemistry Department, Yale University, New Haven, Connecticut 06520-8114, USA.
2Structural Biology Department, Stanford University, Stanford, California 94305

Abstract

We apply a simple method for aligning protein sequences on the basis of 3D structure, on a large scale, to the proteins in the scop classification of fold families. This allows us to assess, understand, and improve our automatic method against an objective, manually derived standard, a type of comprehensive evaluation that has not yet been possible for other structural alignment algorithms. Our basic approach directly matches the backbones of two structures, using repeated cycles of dynamic programming and least‐squares fitting to determine an alignment minimizing coordinate difference. Because of its simplicity, our method can be readily modified to take into account additional features of protein structure such as the orientation of side chains or the location‐dependent cost of opening a gap. Our basic method, augmented by such modifications, can find reasonable alignments for all but 1.5% of the known structural similarities in scop, i.e., all but 32 of the 2,107 superfamily pairs. We discuss the specific protein structural features that make these 32 pairs so difficult to align and show how our procedure effectively partitions the relationships in scop into different categories, depending on what aspects of protein structure are involved (e.g., depending on whether or not consideration of side‐chain orientation is necessary for proper alignment). We also show how our pairwise alignment procedure can be extended to generate a multiple alignment for a group of related structures. We have compared these alignments in detail with corresponding manual ones culled from the literature. We find good agreement (to within 95% for the core regions), and detailed comparison highlights how particular protein structural features (such as certain strands) are problematic to align, giving somewhat ambiguous results. With these improvements and systematic tests, our procedure should be useful for the development of scop and the future classification of protein folds.
Supplementary material is available at http://bioinfo.mbb.yale.edu/align.
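
The core idea described in the abstract, repeated cycles of dynamic programming and least‐squares fitting, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (`kabsch`, `dp_align`, `structural_align`), the similarity score `M / (1 + (d / d0)**2)`, the parameter values, the linear gap penalty, and the ungapped starting alignment are all assumptions chosen for illustration; the abstract itself specifies none of them. The inputs are assumed to be arrays of backbone (e.g., C-alpha) coordinates.

```python
import numpy as np

def kabsch(P, Q):
    """Least-squares fit: return R, t so that P @ R.T + t best matches Q."""
    pc, qc = P.mean(axis=0), Q.mean(axis=0)
    H = (P - pc).T @ (Q - qc)                 # 3x3 covariance of paired points
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # avoid a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, qc - R @ pc

def dp_align(S, gap):
    """Global dynamic programming over similarity matrix S with a linear
    gap penalty (an assumed simplification). Returns aligned index pairs."""
    n, m = S.shape
    F = np.zeros((n + 1, m + 1))
    F[:, 0] = -gap * np.arange(n + 1)
    F[0, :] = -gap * np.arange(m + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i, j] = max(F[i - 1, j - 1] + S[i - 1, j - 1],
                          F[i - 1, j] - gap,
                          F[i, j - 1] - gap)
    # Trace back from the corner, preferring a match over a gap on ties.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if F[i, j] == F[i - 1, j - 1] + S[i - 1, j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif F[i, j] == F[i - 1, j] - gap:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]

def structural_align(A, B, gap=10.0, M=20.0, d0=5.0, max_cycles=20):
    """Alternate least-squares fitting and dynamic programming until the
    alignment stops changing. All parameter values are illustrative."""
    k = min(len(A), len(B))
    pairs = [(i, i) for i in range(k)]        # assumed ungapped starting guess
    for _ in range(max_cycles):
        ia, ib = map(list, zip(*pairs))
        R, t = kabsch(A[ia], B[ib])           # fit on currently aligned pairs
        A_fit = A @ R.T + t
        # Inter-structure distance matrix after superposition.
        d = np.linalg.norm(A_fit[:, None, :] - B[None, :, :], axis=2)
        S = M / (1.0 + (d / d0) ** 2)         # close pairs score near M
        new_pairs = dp_align(S, gap)
        if new_pairs == pairs:                # converged
            break
        pairs = new_pairs
    ia, ib = map(list, zip(*pairs))
    R, t = kabsch(A[ia], B[ib])
    diff = A[ia] @ R.T + t - B[ib]
    rmsd = float(np.sqrt((diff ** 2).sum(axis=1).mean()))
    return pairs, R, t, rmsd
```

Calling `structural_align(A, B)` on two `(N, 3)` coordinate arrays returns the aligned residue index pairs, the fitted rotation and translation, and the RMSD over the aligned pairs. Because each cycle refines the previous alignment, a procedure of this kind can settle into a local optimum, so the result may depend on the starting alignment.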
