Efficient string matching

Communications of the ACM - Tập 18 Số 6 - Trang 333-340 - 1975
Alfred V. Aho1,2, Margaret J. Corasick1,2
1Bell Laboratories, Murray Hill, N.J. 07974.
2The MITRE Corporation, Bedford, Mass. 01730.

Tóm tắt

This paper describes a simple, efficient algorithm to locate all occurrences of any of a finite number of keywords in a string of text. The algorithm consists of constructing a finite state pattern matching machine from the keywords and then using the pattern matching machine to process the text string in a single pass. Construction of the pattern matching machine takes time proportional to the sum of the lengths of the keywords. The number of state transitions made by the pattern matching machine in processing the text string is independent of the number of keywords. The algorithm has been used to improve the speed of a library bibliographic search program by a factor of 5 to 10.

Từ khóa


Tài liệu tham khảo

Aho , A.V. Hopscroft , J,E . and Ullman , D. J. , the Design and Analysis of Computer Algorithms . Addison-Wesley , Reading, Mass ., 1974 . Aho, A.V.Hopscroft, J,E. and Ullman, D. J., the Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, Mass., 1974.

Booth T.U Sequential Machines and Automata Theory. Wiley New York 1967. Booth T.U Sequential Machines and Automata Theory. Wiley New York 1967.

10.1145/321239.321249

Bullen , R.H., Jr. , and Millen , J.K . Microtext - the design of a microprogrammed finite state search machine for full-text retrieval . Proc. Fall Joint Computer Conference , 1972 , pp. 479 - 488 . Bullen, R.H., Jr., and Millen, J.K. Microtext - the design of a microprogrammed finite state search machine for full-text retrieval. Proc. Fall Joint Computer Conference, 1972, pp. 479-488.

10.1145/361952.361960

10.1145/362919.362934

10.1145/364175.364185

10.1145/360680.360684

Kleene , S.C. Representation of events in nerve nets . In Automata Studies, C.E. Shannon and J. McCarthy (eds.), Princeton University Press , 1956 , pp. 3 - 40 . Kleene, S.C. Representation of events in nerve nets. In Automata Studies, C.E. Shannon and J. McCarthy (eds.), Princeton University Press, 1956, pp. 3-40.

Knuth , D.E. Fundamental Algorithms , second edition, The Art of Computer Programming 1, Addison-Wesley , Reading, Mass., 1973 . Knuth, D.E. Fundamental Algorithms, second edition, The Art of Computer Programming 1, Addison-Wesley, Reading, Mass., 1973.

Knuth , D.E. Sorting and Searching , The Art of Computer Prograining 3 , Addison-Wesley , Reading, Mass ., 1973 . Knuth, D.E. Sorting and Searching, The Art of Computer Prograining 3, Addison-Wesley, Reading, Mass., 1973.

Knuth , D.E. , Morris , J.H., Jr. , and Pratt , V.R . Fast pattern matching in strings. TR CS-74-440 , Stanford University , Stanford, California , 1974 . Knuth, D.E., Morris, J.H., Jr., and Pratt, V.R. Fast pattern matching in strings. TR CS-74-440, Stanford University, Stanford, California, 1974.

Kohavi , Z. Switching and Finite Automata Theory . McGraw- Hill , New York , 1970 . Kohavi, Z. Switching and Finite Automata Theory. McGraw- Hill, New York, 1970.

10.1109/TEC.1960.5221603

10.1147/rd.32.0114

10.1145/363347.363387