Semi-automatic rule-based domain terminology and software feature-relevant information extraction from natural language user manuals
Abstract
Mature software systems comprise a vast number of heterogeneous system capabilities, which are usually requested by different groups of stakeholders and evolve over time. Software features describe and logically bundle low-level capabilities at an abstract level and thus provide a structured and comprehensive overview of the capabilities of a software system. Software features are often not managed explicitly. Instead, feature-relevant information is often spread across several software engineering artifacts (e.g., user manuals, issue tracking systems). Identifying and extracting feature-relevant information from these artifacts in order to make feature knowledge explicit requires substantial manual effort. In this paper, we present a two-step approach to extract feature-relevant information from a user manual. First, we semi-automatically extract a domain terminology from a natural language user manual based on linguistic patterns. Then, we apply natural language processing techniques based on the extracted domain terminology and structural sentence information. Our approach extracts atomic feature-relevant information with an F1-score of at least 92.00%. We describe the implementation of the approach as well as evaluations based on example sections of a user manual taken from industry.
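To make the first step and the reported metric more concrete, the following is a minimal sketch, not the implementation described in the paper. It assumes the input sentence is already part-of-speech tagged with Penn Treebank tags, and it uses one hypothetical linguistic pattern (optional adjectives followed by one or more nouns) as a stand-in for the patterns actually used, which the abstract does not list. The names `extract_term_candidates` and `f1_score`, the example sentence, and the gold term set are all illustrative assumptions.

```python
from typing import List, Set, Tuple

# A token is a (word, Penn Treebank POS tag) pair. Tagging is assumed to
# have been done beforehand by any POS tagger.
Token = Tuple[str, str]


def extract_term_candidates(tagged: List[Token]) -> List[str]:
    """Collect maximal token runs matching the hypothetical pattern JJ* NN+."""
    candidates: List[str] = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        # Skip over leading adjectives (JJ).
        while j < n and tagged[j][1] == "JJ":
            j += 1
        k = j
        # Consume one or more nouns (NN, NNS, NNP, NNPS).
        while k < n and tagged[k][1].startswith("NN"):
            k += 1
        if k > j:
            # At least one noun was found: adjectives plus nouns form a candidate.
            candidates.append(" ".join(word.lower() for word, _ in tagged[i:k]))
            i = k
        else:
            # No noun followed: move past the adjectives, or one token at least.
            i = max(j, i + 1)
    return candidates


def f1_score(predicted: Set[str], gold: Set[str]) -> float:
    """Harmonic mean of precision and recall over two sets of items."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Illustrative, hand-tagged sentence (not taken from the evaluated manual).
sentence: List[Token] = [
    ("The", "DT"), ("user", "NN"), ("selects", "VBZ"),
    ("the", "DT"), ("patient", "NN"), ("record", "NN"),
    ("from", "IN"), ("the", "DT"), ("main", "JJ"), ("menu", "NN"),
]
terms = extract_term_candidates(sentence)

# Hypothetical gold terminology, used only to show how F1 is computed.
gold_terms = {"patient record", "main menu", "report"}
score = f1_score(set(terms), gold_terms)
```

In a semi-automatic setting, candidates such as these would typically be reviewed by a domain expert before entering the terminology; the sketch covers only the automatic candidate generation and the scoring.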