Learning Information Extraction Rules for Semi-Structured and Free Text

Machine Learning - Volume 34 - Pages 233-272 - 1999
Stephen Soderland¹
¹Department of Computer Science and Engineering, University of Washington, Seattle

Abstract

A wealth of on-line text information can be made available to automatic processing by information extraction (IE) systems. Each IE application needs a separate set of rules tuned to the domain and writing style. WHISK helps to overcome this knowledge-engineering bottleneck by learning text extraction rules automatically. WHISK is designed to handle text styles ranging from highly structured to free text, including text that is neither rigidly formatted nor composed of grammatical sentences. Such semi-structured text has largely been beyond the scope of previous systems. When used in conjunction with a syntactic analyzer and semantic tagging, WHISK can also handle extraction from free text such as news stories.

References