Improvements on a semi-automatic grammar induction framework

Chin-Chung Wong1, H. Meng1
1Human-Computer Communications Laboratory, Department of Systems Engineering and Engineering Management, Chinese University of Hong Kong, New Territories, Hong Kong, China

Abstract

This work extends the semi-automatic grammar induction approach previously proposed (see Meng, H. and Siu, K.C., IEEE Trans. on Knowledge and Data Engineering). The data-driven approach learns semantic and phrasal categories from a training corpus of unannotated natural language queries in a specific domain. The approach can be seeded with prespecified semantic categories to expedite the learning process. Grammar rules are automatically acquired by an agglomerative clustering procedure, and the resulting grammar can easily be hand-edited for refinement. This work attempts to improve the grammar induction framework by leveraging information in the SQL query that accompanies every training query. The SQL expression specifies the database access action that corresponds to the query, and hence provides information about meaningful natural language structures that should be captured in the induced grammar. We have also incorporated information gain in place of mutual information to capture phrasal structures, and determined an automatic stopping criterion for agglomerative clustering.
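To make the phrasal-structure step more concrete, the following is a minimal sketch of the mutual information baseline that the abstract says was replaced. It scores adjacent word pairs by pointwise mutual information, so that pairs occurring together more often than chance become candidates for merging into a phrase. The function name `bigram_pmi`, the toy queries, and the probability estimates are assumptions made for illustration only; the abstract gives neither the authors' exact formulation nor their information gain criterion, so this is not their method.

```python
import math
from collections import Counter


def bigram_pmi(queries):
    """Score each adjacent word pair by pointwise mutual information (log base 2).

    Hypothetical helper for illustration; probabilities are simple
    relative frequencies over the unigram and bigram counts.
    """
    unigrams = Counter()
    bigrams = Counter()
    for query in queries:
        words = query.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (left, right), count in bigrams.items():
        p_pair = count / n_bi
        p_left = unigrams[left] / n_uni
        p_right = unigrams[right] / n_uni
        scores[(left, right)] = math.log2(p_pair / (p_left * p_right))
    return scores


# Toy corpus of unannotated queries (invented for this sketch).
queries = [
    "show flights to new york",
    "flights to new york please",
    "show fares to boston",
]
scores = bigram_pmi(queries)
```

On this toy corpus, the pair ("new", "york") scores higher than ("to", "new"), because "to" also precedes other words. A clustering procedure could use such scores to decide which adjacent pair to merge next.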

Keywords

#Natural languages #Testing #Equations #Databases #Laboratories #Systems engineering and theory #Research and development management #Scalability #Humans #Speech recognition

References

Wu, 1993, Corpus-based Automatic Compound Extraction with Mutual Information and Relative Frequency Count, Proceedings of ROC Computational Linguistics Conference VI
Akiba, 2000, Semi-Automatic Language Model Acquisition without Large Corpora, Proceedings of the ICSLP
Potamianos, 2000, Statistical Recursive Finite State Machine Parsing for Speech Understanding, Proceedings of the ICSLP
Manning, 1999, Foundations of Statistical Natural Language Processing
van Rijsbergen, 1979, Information Retrieval
Thompson, 1999, Active Learning for Natural Language Parsing and Information Extraction, Proceedings of the ICML
McCandless, 1993, Empirical Acquisition of Word and Phrase Classes in the ATIS Domain, The 3rd European Conference on Speech Communication and Technology
Meng, n.d., Semi-Automatic Acquisition of Domain-Specific Semantic Structures, IEEE Trans. on Knowledge and Data Engineering