Knowledge-based analysis of microarray gene expression data by using support vector machines

Michael P. Brown, William Noble Grundy, David Lin, Nello Cristianini, Charles W. Sugnet, Terrence S. Furey, Manuel Ares, David Haussler
Department of Computer Science and Center for Molecular Biology of RNA, Department of Biology, University of California, Santa Cruz, Santa Cruz, CA 95064; Department of Computer Science, Columbia University, New York, NY 10025; Department of Engineering Mathematics, University of Bristol, Bristol BS8 1TR, United Kingdom

Abstract

We introduce a method of functionally classifying genes by using gene expression data from DNA microarray hybridization experiments. The method is based on the theory of support vector machines (SVMs). SVMs are considered a supervised computer learning method because they exploit prior knowledge of gene function to identify unknown genes of similar function from expression data. SVMs avoid several problems associated with unsupervised clustering methods, such as hierarchical clustering and self-organizing maps. SVMs have many mathematical features that make them attractive for gene expression analysis, including their flexibility in choosing a similarity function, sparseness of solution when dealing with large data sets, the ability to handle large feature spaces, and the ability to identify outliers. We test several SVMs that use different similarity metrics, as well as some other supervised learning methods, and find that the SVMs best identify sets of genes with a common function using expression data. Finally, we use SVMs to predict functional roles for uncharacterized yeast ORFs based on their expression data.
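The idea of swapping similarity functions while keeping the supervised classifier fixed can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the expression profiles are invented toy values, the two kernels (a polynomial kernel and a radial basis kernel) and their parameters are common choices rather than ones stated in this abstract, and a kernel perceptron stands in for SVM training, since it also learns from labeled examples through a kernel but does not solve the SVM margin optimization.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def poly_kernel(x, y, degree=2):
    """Polynomial similarity: (x . y + 1) ** degree (assumed form)."""
    return (dot(x, y) + 1.0) ** degree

def rbf_kernel(x, y, width=1.0):
    """Radial basis similarity: exp(-||x - y||^2 / (2 * width^2)) (assumed form)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * width ** 2))

def decision_score(x, profiles, labels, alpha, kernel):
    """Weighted sum of kernel similarities between x and the training profiles."""
    return sum(a * l * kernel(p, x) for a, l, p in zip(alpha, labels, profiles))

def train_kernel_perceptron(profiles, labels, kernel, epochs=50):
    """Simplified stand-in for SVM training (NOT the SVM optimization).

    Returns one weight per training example; examples that were never
    misclassified keep weight 0.
    """
    alpha = [0.0] * len(profiles)
    for _ in range(epochs):
        mistakes = 0
        for i, (x, y) in enumerate(zip(profiles, labels)):
            if y * decision_score(x, profiles, labels, alpha, kernel) <= 0:
                alpha[i] += 1.0
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return alpha

def predict(x, profiles, labels, alpha, kernel):
    return 1 if decision_score(x, profiles, labels, alpha, kernel) > 0 else -1

# Hypothetical toy data: expression log-ratios over four conditions.
# Label +1 = member of a known functional class, -1 = non-member.
profiles = [
    [0.1, 0.8, 1.5, 2.0],
    [0.0, 0.9, 1.4, 2.1],
    [0.2, 0.7, 1.6, 1.9],
    [0.1, -0.1, 0.0, 0.1],
    [-0.2, 0.0, 0.1, -0.1],
    [0.0, -0.5, -1.0, -1.2],
]
labels = [1, 1, 1, -1, -1, -1]

# Hypothetical uncharacterized genes to classify.
unknown = [[0.1, 0.85, 1.45, 2.0], [0.0, -0.1, 0.05, 0.0]]

predictions = {}
for name, kernel in [("polynomial", poly_kernel), ("rbf", rbf_kernel)]:
    alpha = train_kernel_perceptron(profiles, labels, kernel)
    predictions[name] = [predict(x, profiles, labels, alpha, kernel) for x in unknown]
    print(name, predictions[name])
```

Only the `kernel` argument changes between the two runs, which is the flexibility in choosing a similarity function that the abstract refers to. With a real SVM implementation the same pattern would apply, with the perceptron replaced by a proper margin-maximizing solver.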
