LADRA: Log-based abnormal task detection and root-cause analysis in big data processing with Spark
References
Dean, 2008, MapReduce: simplified data processing on large clusters, Commun. ACM, 51, 107, 10.1145/1327452.1327492
Apache Spark website, http://Spark.apache.org/.
Apache Hadoop website, http://hadoop.apache.org/.
Zaharia, 2012, Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing
Zhang, 2017, MRapid: An efficient short job optimizer on Hadoop, 459
Wang, 2009, Atomicity and provenance support for pipelined scientific workflows, Future Gener. Comput. Syst., 25, 568, 10.1016/j.future.2008.06.007
Subramanian, 2010, Rapid processing of synthetic seismograms using Windows Azure cloud
Subramanian, 2011, Rapid 3D seismic source inversion using Windows Azure and Amazon EC2
M. Zaharia, A. Konwinski, A.D. Joseph, R.H. Katz, I. Stoica, Improving MapReduce performance in heterogeneous environments, in: OSDI, vol. 8, 2008, p. 7.
Lu, 2017, Log-based abnormal task detection and root cause analysis for Spark, 389
G. Ananthanarayanan, S. Kandula, A.G. Greenberg, I. Stoica, Y. Lu, B. Saha, E. Harris, Reining in the outliers in map-reduce clusters using Mantri, in: OSDI, vol. 10, 2010, p. 24.
Ibidunmoye, 2015, Performance anomaly detection and bottleneck identification, ACM Comput. Surv., 48, 4, 10.1145/2791120
P. Garraghan, X. Ouyang, R. Yang, D. McKee, J. Xu, Straggler root-cause and impact analysis for massive-scale virtualized cloud datacenters, IEEE Transactions on Services Computing.
Jayathilaka, 2017, Performance monitoring and root cause analysis for cloud-hosted web applications, 469
Chen, 2002, Pinpoint: Problem determination in large, dynamic internet services, 595
Gu, 2009, Online anomaly prediction for robust cluster systems, 1000
Oliner, 2007, What supercomputers say: A study of five system logs
Ryza, 2015
Tan, 2008, SALSA: Analyzing logs as state machines, WASL, 8
Tan, 2010, Visual, log-based causal tracing for performance debugging of MapReduce systems, 795
Chen, 2010, SAMR: A self-adaptive MapReduce scheduling algorithm in heterogeneous environment, 2736
Xu, 2009, Detecting large-scale system problems by mining console logs
Qi, 2017, Data mining based root-cause analysis of performance bottleneck for big data workload, 254
Fulp, 2008, Predicting computer system failures using support vector machines, WASL, 8
Yadwadkar, 2014, Wrangler: Predictable and faster jobs using fewer resources, 1
Massie, 2004, The Ganglia distributed monitoring system: design, implementation, and experience, Parallel Comput., 30, 817, 10.1016/j.parco.2004.04.001
Aguilera, 2003, Performance debugging for distributed systems of black boxes, Oper. Syst. Rev., 37, 74, 10.1145/1165389.945454
H. Zhou, Y. Li, H. Yang, J. Jia, W. Li, BigRoots: An effective approach for root-cause analysis of stragglers in big data system, arXiv preprint arXiv:1801.03314.
Shi, 2015, Clash of the titans: MapReduce vs. Spark for large scale data analytics, Proc. VLDB Endow., 8, 2110, 10.14778/2831360.2831365
Huang, 2010, The HiBench benchmark suite: Characterization of the MapReduce-based data analysis, 41