Empirical Software Engineering

Featured scientific publications

* Data are for reference only

How developers engage with static analysis tools in different contexts
Empirical Software Engineering - Volume 25 - Pages 1419-1457 - 2019
Carmine Vassallo, Sebastiano Panichella, Fabio Palomba, Sebastian Proksch, Harald C. Gall, Andy Zaidman
Automatic static analysis tools (ASATs) are instruments that support code quality assessment by automatically detecting defects and design issues. Despite their popularity, they are characterized by (i) a high false positive rate and (ii) the low comprehensibility of the generated warnings. However, no prior studies have investigated the usage of ASATs in different development contexts (e.g., code reviews, regular development), nor how open source projects integrate ASATs into their workflows. These perspectives are paramount to improve the prioritization of the identified warnings. To shed light on actual ASAT usage practices, in this paper we first survey 56 developers (66% from industry and 34% from open source projects) and interview 11 industrial experts leveraging ASATs in their workflow with the aim of understanding how they use ASATs in different contexts. Furthermore, to investigate how ASATs are being used in the workflows of open source projects, we manually inspect the contribution guidelines of 176 open-source systems and extract the ASATs’ configuration and build files from their corresponding GitHub repositories. Our study highlights that (i) 71% of developers do pay attention to different warning categories depending on the development context; (ii) 63% of our respondents rely on specific factors (e.g., team policies and composition) when prioritizing warnings to fix during their programming; and (iii) 66% of the projects define how to use specific ASATs, but only 37% enforce their usage for new contributions. The perceived relevance of ASATs varies between different projects and domains, which is a sign that ASAT use is still not a common practice. In conclusion, this study confirms previous findings on the unwillingness of developers to configure ASATs and it emphasizes the necessity to improve existing strategies for the selection and prioritization of the ASAT warnings that are shown to developers.
Using black-box performance models to detect performance regressions under varying workloads: an empirical study
Empirical Software Engineering - 2020
Lizhi Liao, Jinfu Chen, Heng Li, Yi Zeng, Weiyi Shang, Jianmei Guo, Catalin Sporea, Andrei Toma, Sarah Sajedi
Performance regressions of large-scale software systems often lead to both financial and reputational losses. In order to detect performance regressions, performance tests are typically conducted in an in-house (non-production) environment using test suites with predefined workloads. Then, performance analysis is performed to check whether a software version has a performance regression against an earlier version. However, the real workloads in the field are constantly changing, making it unrealistic to replicate the field workloads in predefined test suites. More importantly, performance testing is usually very expensive as it requires extensive resources and lasts for an extended period. In this work, we leverage black-box machine learning models to automatically detect performance regressions in the field operations of large-scale software systems. Practitioners can leverage our approaches to complement or replace resource-demanding performance tests that may not even be realistic in a fast-paced environment. Our approaches use black-box models to capture the relationship between the performance of a software system (e.g., CPU usage) under varying workloads and the runtime activities that are recorded in the readily-available logs. Then, our approaches compare the black-box models derived from the current software version with those of an earlier version to detect performance regressions between these two versions. We performed empirical experiments on two open-source systems and applied our approaches to a large-scale industrial system. Our results show that such black-box models can detect both real and injected performance regressions effectively and promptly, under varying workloads that were unseen when training these models. Our approaches have been adopted in practice to detect performance regressions of a large-scale industrial system on a daily basis.
Replicating studies on cross- vs single-company effort models using the ISBSG Database
Empirical Software Engineering - Volume 13, Issue 1 - Pages 3-37 - 2008
Emilia Mendes, Chris Lokan
Usage and attribution of Stack Overflow code snippets in GitHub projects
Empirical Software Engineering - Volume 24, Issue 3 - Pages 1259-1295 - 2019
Sebastian Baltes, Stephan Diehl
Stack Overflow (SO) is the most popular question-and-answer website for software developers, providing a large amount of copyable code snippets. Using those snippets raises maintenance and legal issues. SO’s license (CC BY-SA 3.0) requires attribution, i.e., referencing the original question or answer, and requires derived work to adopt a compatible license. While there is a heated debate on SO’s license model for code snippets and the required attribution, little is known about the extent to which snippets are copied from SO without proper attribution. We present results of a large-scale empirical study analyzing the usage and attribution of non-trivial Java code snippets from SO answers in public GitHub (GH) projects. We followed three different approaches to triangulate an estimate for the ratio of unattributed usages and conducted two online surveys with software developers to complement our results. For the different sets of projects that we analyzed, the ratio of projects containing files with a reference to SO varied between 3.3% and 11.9%. We found that at most 1.8% of all analyzed repositories containing code from SO used the code in a way compatible with CC BY-SA 3.0. Moreover, we estimate that at most a quarter of the copied code snippets from SO are attributed as required. Of the surveyed developers, almost one half admitted copying code from SO without attribution and about two thirds were not aware of the license of SO code snippets and its implications.
Conference Report: The Seventh Empirical Studies of Programmers Workshop
Empirical Software Engineering - Volume 3, Issue 2 - Pages 213-216 - 1998
Susan Wiedenbeck
Effort estimation of FLOSS projects: a study of the Linux kernel
Empirical Software Engineering - 2013
Andrea Capiluppi, Daniel Izquierdo-Cortázar
Empirical study of android repackaged applications
Empirical Software Engineering - Volume 24 - Pages 3587-3629 - 2019
Kobra Khanmohammadi, Neda Ebrahimi, Abdelwahab Hamou-Lhadj, Raphaël Khoury
The growing popularity of Android applications has generated increased concerns over the danger of piracy and the spread of malware, and particularly of adware: malware that seeks to present unwanted advertisements to the user. A popular way to distribute malware in the mobile world is through repackaging of legitimate apps. This process consists of downloading, unpacking, manipulating, and recompiling an application, and then publishing it again in an app store. In this paper, we conduct an empirical study of over 15,000 apps to gain insights into the factors that drive the spread of repackaged apps. We also examine the motivations of developers who publish repackaged apps and those of users who download them, as well as the factors that determine which apps are chosen for repackaging, and the ways in which the apps are modified during the repackaging process. Having observed that adware is particularly prevalent in repackaged apps, we focus on this type of malware and examine how an app is modified when adware is injected into its code. Our findings shed much-needed light on this class of malware, which can be useful to security experts, and allow us to make recommendations that could lead to the creation of more effective malware detection tools. Furthermore, on the basis of our results, we propose a novel app indexing scheme that minimizes the number of comparisons needed to detect repackaged apps.
Reply to "Comments to the Paper: Briand, El Emam, Morasca: On the Application of Measurement Theory in Software Engineering"
Empirical Software Engineering - Volume 2 - Pages 317-322 - 1997
Lionel Briand, Khaled El Emam, Sandro Morasca
Semi-automatic rule-based domain terminology and software feature-relevant information extraction from natural language user manuals
Empirical Software Engineering - Volume 23 - Pages 3630-3683 - 2018
Thomas Quirchmayr, Barbara Paech, Roland Kohl, Hannes Karey, Gunar Kasdepke
Mature software systems comprise a vast number of heterogeneous system capabilities which are usually requested by different groups of stakeholders and which evolve over time. Software features describe and bundle low-level capabilities logically on an abstract level and thus provide a structured and comprehensive overview of the entire capabilities of a software system. Software features are often not explicitly managed. Quite the contrary, feature-relevant information is often spread across several software engineering artifacts (e.g., user manual, issue tracking systems). Identifying and extracting feature-relevant information from these artifacts in order to make feature knowledge explicit requires huge manual effort. In this paper we present a two-step approach to extract feature-relevant information from a user manual: first, we semi-automatically extract a domain terminology from a natural language user manual based on linguistic patterns. Then, we apply natural language processing techniques based on the extracted domain terminology and structural sentence information. Our approach is able to extract atomic feature-relevant information with an F1-score of at least 92.00%. We describe the implementation of the approach as well as evaluations based on example sections of a user manual taken from industry.
Challenges in software model reuse: cross application domain vs. cross modeling paradigm
Empirical Software Engineering - Volume 29 - Pages 1-25 - 2023
Iris Reinhartz-Berger
Software reuse is a common practice that aims to reduce costs and effort, while improving quality and productivity. However, it also raises challenges of retrieving existing artifacts and adapting them to the given context. Sometimes, the most relevant artifacts are realized in a different application domain and/or in a different paradigm (e.g., object-oriented vs. data-driven). These challenges are extremely relevant to non-code artifacts, such as models, which are relatively rare and vary in level of detail and quality. In this paper, we aim to explore the challenges and opportunities of cross application domain and cross modeling paradigm model reuse. These types of reuse require different mapping mechanisms (analogy creation and transformation, respectively), but similar adaptation operations (use-as-is, modification, omission and addition). To explore the challenges of these reuse types, we present the design and the results of a series of controlled experiments, involving 64 participants, which analyzed the correctness of software model reuse across application domains and across two modeling paradigms: object-oriented, expressed in UML use case and class diagrams, and data-driven, expressed in entity-relationship and data flow diagrams. Our results show that, overall, cross-domain reuse is performed more correctly than cross-paradigm reuse, especially with respect to addition. We further analyzed the challenges in each reuse type and found that modification and addition in both reuse types are quite challenging and require careful support to meet new or differing requirements.
Total: 1,121