Task estimation for software company employees based on computer interaction logs
Tóm tắt
Digital tools and services collect a growing amount of log data. In the software development industry, such data are integral and boast valuable information on user and system behaviors with a significant potential of discovering various trends and patterns. In this study, we focus on one of those potential aspects, which is task estimation. In that regard, we perform a case study by analyzing computer recorded activities of employees from a software development company. Specifically, our purpose is to identify the task of each employee. To that end, we build a hierarchical framework with a 2-stage recognition and devise a method relying on Bayesian estimation which accounts for temporal correlation of tasks. After pre-processing, we run the proposed hierarchical scheme to initially distinguish infrequent and frequent tasks. At the second stage, infrequent tasks are discriminated between them such that the task is identified definitively. The higher performance rate of the proposed method makes it favorable against the association rule-based methods and conventional classification algorithms. Moreover, our method offers significant potential to be implemented on similar software engineering problems. Our contributions include a comprehensive evaluation of a Bayesian estimation scheme on real world data and offering reinforcements against several challenges in the data set (samples with different measurement scales, dependence characteristics, imbalance, and with insignificant pieces of information).
Tài liệu tham khảo
ABB Inc (2017) ABB Dev Interaction Data. https://abb-iss.github.io/DeveloperInteractionLogs/
Agrawal R, Imieliński T, Swami A (1993) Mining association rules between sets of items in large databases. ACM SIGMOD Record 22(2):207–216. https://doi.org/10.1145/170036.170072
Ahmed A (2016) Software project management: A process-driven approach. Auerbach Publications
Alemdar H, van Kasteren T, Ersoy C (2017) Active learning with uncertainty sampling for large scale activity recognition in smart homes. J Ambient Intell Smart Environ 9(2):209–223
Alpaydin E (2016) Machine learning: The new AI. MIT press
Amlekar R, Gamboa AFR, Gallaba K, McIntosh S (2018) Do software engineers use autocompletion features differently than other developers? In: International Conference on Mining Software Repositories. IEEE, pp 86–89
Anand K, Kumar J, Anand K (2017) Anomaly detection in online social network: A survey. In: Proceedings of International Conference on Inventive Communication and Computational Technologies. IEEE, pp 456–459
Bao L, Xing Z, Xia X, Lo D, Hassan AE (2018) Inference of development activities from interaction with uninstrumented applications. Empir Softw Eng 23(3):1313–1351
Beller M, Gousios G, Panichella A, Proksch S, Amann S, Zaidman A (2017) Developer testing in the IDE: patterns, beliefs, and behavior. IEEE Trans Softw Eng 45(3):261–284
Bernardi S, JL Domínguez, Gómez A, Joubert C, Merseguer J, Perez-Palacin D, Requeno J I, Romeu A (2018) A systematic approach for performance assessment using process mining. Empir Softw Eng 23 (6):3394–3441
Bogarín A, Cerezo R, Romero C (2018) A survey on educational process mining. Wiley Interdiscip Rev Data Min Knowl Discov 8(1):e1230
Brdiczka O (2010) From documents to tasks: Deriving user tasks from document usage patterns. In: Proceedings of International Conference on Intelligent User Interfaces. ACM, pp 285–288
Caballé S, Xhafa F (2013) Distributed-based massive processing of activity logs for efficient user modeling in a virtual campus. Clust Comput 16 (4):829–844
Caldeira J, e Abreu FB, Reis J, Cardoso J (2019) Assessing software development teams’ efficiency using process mining. In: Proceedings of International Conference on Process Mining. IEEE, pp 65–72
Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: Synthetic Minority over-sampling technique. J Artif Intell Res 16:321–357
Chen L, Nugent CD (2019) Sensor-based activity recognition review. In: Human Activity Recognition and Behaviour Analysis. Springer, pp 23–47
Chernov S (2008) Task detection for activity-based desktop search. In: Proceedings of International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, pp 894–894
Chernov S, Demartini G, Herder E, Kopycki M, Nejdl W (2008) Evaluating personal information management using an activity logs enriched desktop dataset. In: Proceedings of Personal Information Management Workshop, vol 155. Citeseer
Choi H, Lim J, Yu H, Lee E (2016) Task classification based energy-aware consolidation in clouds. Sci Program 2016
Coman ID (2007) An analysis of developers’ tasks using low-level, automatically collected data. In: Joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, pp 579–582
Damevski K, Shepherd DC, Schneider J, Pollock L (2016) Mining sequences of developer interactions in visual studio for usage smells. IEEE Trans Softw Eng 43(4):359–371
Deisenroth MP, Faisal AA, Ong CS (2020) Mathematics for machine learning. Cambridge University Press
Delias P, Doumpos M, Grigoroudis E, Manolitzas P, Matsatsinis N (2015) Supporting healthcare management decisions via robust clustering of event logs. Knowl-Based Syst 84:203–213
Devaurs D, Rath AS, Lindstaedt SN (2012) Exploiting the user interaction context for automatic task detection. Appl Artif Intell 26(1-2):58–80
Dingsøyr T, Fægri TE, Dybå T, Haugset B, Lindsjørn Y (2016) Team performance in software development: Research results versus agile principles. IEEE Softw 33(4):106–110
Dragunov AN, Dietterich TG, Johnsrude K, McLaughlin M, Li L, Herlocker JL (2005) TaskTracer: A desktop environment to support multi-tasking knowledge workers. In: Proceedings of International Conference on Intelligent User Interfaces. ACM, pp 75–82
Eclipse Foundation (2010) Filtered UDC Data. http://archive.eclipse.org/projects/usagedata/
Embrechts P, Hofert M (2013) A note on generalized inverses. Math Methods Oper Res 77(3):423–432
Fernández A, Garcia S, Herrera F, Chawla NV (2018) Smote for learning from imbalanced data: progress and challenges, marking the 15-year anniversary. J Artif Intell Res 61:863–905
Forsati R, Moayedikia A, Shamsfard M (2015) An effective web page recommender using binary data clustering. Inf Retriev J 18(3):167–214
Gatta R, Vallati M, Lenkowicz J, Casà C, Cellini F, Damiani A, Valentini V (2018) A framework for event log generation and knowledge representation for process mining in healthcare. In: Proceedings of International Conference on Tools with Artificial Intelligence. IEEE, pp 647–654
Hakim A, Hasibuan M, Andreswari R (2019) E-learning process analysis to determining student learning patterns using process mining approach 1193:1–8
Harris D, Harris S (2010) Digital design and computer architecture. Morgan Kaufmann
Hochstein L, Basili VR, Zelkowitz MV, Hollingsworth JK, Carver J (2005) Combining self-reported and automatic data to improve programming effort measurement. ACM SIGSOFT Softw Eng Notes 30(5):356–365
Jalali A (2016) Supporting social network analysis using chord diagram in process mining. In: Proceedings of International Conference on Business Informatics Research. Springer, pp 16–32
Jalote P, Kamma D (2019) Studying task processes for improving programmer productivity. IEEE Transactions on Software Engineering
Johnson PM (2007) Requirement and design trade-offs in Hackystat: An in-process software engineering measurement and analysis system. In: Proceedings of International Symposium on Empirical Software Engineering and Measurement. IEEE, pp 81–90
Johnson PM, Kou H, Agustin J, Chan C, Moore C, Miglani J, Zhen S, Doane WE (2003) Beyond the personal software process: Metrics collection and analysis for the differently disciplined. In: Proceedings of the International Conference on Software Engineering. IEEE, pp 641–646
Kalenkova AA, van der Aalst WM, Lomazova IA, Rubin VA (2017) Process mining using BPMN: relating event logs and process models. Softw Syst Model 16(4):1019–1048
Karahasanović A, Heim J (2015) Understanding the behaviour of online TV users. Pers Ubiquit Comput 19(5-6):839–852
KaVe Project (2018) Datasets. https://www.kave.cc/datasets
Ko AJ, DeLine R, Venolia G (2007) Information needs in collocated software development teams. In: Proceedings of International Conference on Software Engineering. IEEE, pp 344–353
Koldijk S, Van Staalduinen M, Neerincx M, Kraaij W (2012) Real-time task recognition based on knowledge workers’ computer activities. In: Proceedings of European Conference on Cognitive Ergonomics, pp 152–159
Langhnoja S, Barot M, Mehta D (2012) Pre-processing: Procedure on web log file for web usage mining. Int J Emerging Technol Adv Eng 2(12):419–423
Leemans M, van der Aalst WM, van den Brand MG (2018) The Statechart workbench: Enabling scalable software event log analysis using process mining. In: Proceedings of International Conference on Software Analysis, Evolution and Reengineering. IEEE, pp 502–506
Maalej W, Ellmann M, Robbes R (2017) Using contexts similarity to predict relationships between tasks. J Syst Softw 128:267–284
MacKay DJ (2003) Information Theory, Inference and Learning Algorithms. Cambridge University Press
Martin N, Solti A, Mendling J, Depaire B, Caris A (2019) Mining batch activation rules from event logs. IEEE Trans Serv Comput:1–1. https://doi.org/10.1109/TSC.2019.2912163
Mazza R, Bettoni M, Faré M, Mazzola L (2012) MOCLog - monitoring online courses with log data. In: Proceedings of the Moodle Research Conference, pp 132–139
McLeod L, MacDonell SG (2011) Factors that affect software systems development project outcomes: a survey of research. ACM Comput Surv (CSUR) 43 (4):24
Meyer AN, Barton LE, Murphy GC, Zimmermann T, Fritz T (2017) The work life of developers: activities, switches and perceived productivity. IEEE Trans Softw Eng 43(12):1178–1193
Meyer AN, Satterfield C, Züger M, Kevic K, Murphy GC, Zimmermann T, Fritz T (2020) Detecting developers’ task switches and types. IEEE Trans Softw Eng:1–16
Mirza HT, Chen L, Hussain I, Majid A, Chen G (2015) A study on automatic classification of users’ desktop interactions. Cybern Syst 46(5):320–341
Monden A, Matsumura T, Barker M, Torii K, Basili VR (2012) Customizing GQM models for software project monitoring. IEICE Trans Inf Syst 95(9):2169–2182
Montgomery DC, Runger GC (2010) Applied statistics and probability for engineers. Wiley
Obregon J, Song M, Jung JY (2019) Infoflow: Mining information flow based on user community in social networking services. IEEE Access 7:48024–48036
Oram A, Wilson G (2010) Making software: What really works, and why we believe it. O’Reilly Media Inc
Parsons HM (1974) What Happened at Hawthorne?: New evidence suggests the Hawthorne effect resulted from operant reinforcement contingencies. Science 183(4128):922–932
Partington A, Wynn M, Suriadi S, Ouyang C, Karnon J (2015) Process mining for clinical processes: a comparative analysis of four australian hospitals. ACM Trans Manag Inf Syst 5(4):19
Perry DE, Staudenmayer NA, Votta LG (1995) Understanding and improving time usage in software development. Softw Process 5:111–135
Proksch S, Nadi S, Amann S, Mezini M (2017) Enriching in-ide process information with fine-grained source code history. In: Proceedings of International Conference on Software Analysis, Evolution and Reengineering. IEEE, pp 250–260
Ramachandran KM, Tsokos CP (2014) Mathematical Statistics with Applications in R. Elsevier
Rashid T, Agrafiotis I, Nurse J (2016) A new take on detecting insider threats: Exploring the use of hidden markov models. In: Proceedings of ACM CCS International Workshop on Managing Insider Security Threats, pp 47–56. https://doi.org/10.1145/2995959.2995964
Rojas E, Munoz-Gama J, Sepúlveda M, Capurro D (2016) Process mining in healthcare: a literature review. J Biomed Inform 61:224–236
Rovani M, Maggi FM, de Leoni M, van der Aalst WM (2015) Declarative process mining in healthcare. Expert Syst Appl 42(23):9236–9251
Rovetta S, Cabri A, Masulli F, Suchacka G (2017) Bot or not? A case study on bot recognition from Web session logs. In: Italian Workshop on Neural Nets. Springer, pp 197–206
Russo B, Succi G, Pedrycz W (2015) Mining system logs to learn error predictors: a case study of a telemetry system. Empir Softw Eng 20(4):879–927
Schönig S, Cabanillas C, Jablonski S, Mendling J (2015) Mining the organisational perspective in agile business processes. In: Enterprise, Business-Process and Information Systems Modeling. Springer, pp 37–52
Shen J, Li L, Dietterich TG, Herlocker JL (2006) A hybrid learning system for recognizing user tasks from desktop activities and email messages. In: Proceedings of International Conference on Intelligent User Interfaces. ACM, pp 86–92
Shen J, Li L, Dietterich T G (2007) Real-time detection of task switches of desktop users. In: Proceedings of International Joint Conferences on Artificial Intelligence, vol 7, pp 2868–2873
Shimizu R, Monden A, Yücel Z, Uwano H (2018) Automatic estimation of software development tasks. In: Proceedings of IPSJ/SIGSE Winter Workshop, vol 2018, pp 30–31
Singh V, Pollock LL, Snipes W, Kraft NA (2016) A case study of program comprehension effort and technical debt estimations. In: International Conference on Program Comprehension. IEEE, pp 1–9
Soto-Valero C, Bourcier J, Baudry B (2018) Detection and analysis of behavioral t-patterns in debugging activities. In: Proceedings of International Conference on Mining Software Repositories, pp 110–113
Suthipornopas P, Leelaprute P, Monden A, Uwano H, Kamei Y, Ubayashi N, Araki K, Yamada K, Matsumoto K (2017) Industry application of software development task measurement system: Taskpit. IEICE Transactions on Information and Systems (3):462–472
Tax N, Sidorova N, Haakma R, van der Aalst WM (2016) Event abstraction for process mining using supervised learning techniques. In: Proceedings of SAI Intelligent Systems Conference. Springer, pp 251–269
van der Aalst WM (2015) Extracting event data from databases to unleash process mining. In: BPM-Driving Innovation in a Digital World, Springer, pp 105–128
Vialardi C, Bravo agapito J, Ortigosa A (2008) Improving AEH courses through log analysis. Journal of Universal Computer Science
Viertel FP, Karras O, Schneider K (2017) Vulnerability recognition by execution trace differentiation. Softwaretechnik-Trends 37(3), http://pi.informatik.uni-siegen.de/stt/37_3/01_Fachgruppenberichte/SSP2017_proceedings/01_Vulnerability_Recognition_by_Execution_Trace_Differentiation.pdf
Vijayasarathy LR, Butler CW (2015) Choice of software development methodologies: Do organizational, project, and team characteristics matter? IEEE Softw 33(5):86–94
Vuong T, Jacucci G, Ruotsalo T (2017) Watching inside the screen: Digital activity monitoring for task recognition and proactive information retrieval. Proceedings of the ACM on Interactive, Mobile. Wear Ubiquit Technol 1(3):1–23
Wagner S, Ruhe M (2018) A systematic review of productivity factors in software development. arXiv:180106475
Wickramasinghe V, Nandula S (2015) Diversity in team composition, relationship conflict and team leader support on globally distributed virtual software development team performance. Strategic Outsourcing Int J 8(2/3):138–155
Yücel Z (2020a) Software applications and custom codes. https://github.com/yucelzeynep/Task-estimation-from-activity-logs, 2020-08-09
Yücel Z (2020b) Supplemental material on detailed results of alternative methods. https://yucelzeynep.github.io/pub/2020_supp_mat_std_clsf.pdf, 2020-07-09
Yücel Z (2020c) Supplemental material on detailed results of the proposed method. https://yucelzeynep.github.io/pub/2020_supp_mat_proposed.pdf, 2020-07-09
Yücel Z (2021) Interaction logs of sofware company employees for task estimation. https://doi.org/10.5281/zenodo.4500028
Zou L, Godfrey MW (2012) An industrial case study of Coman’s automated task detection algorithm: What worked, what didn’t, and why. In: Proceedings of IEEE International Conference on Software Maintenance. IEEE, pp 6–14
