QoS-driven scheduling in the cloud

Giovanni Farias da Silva1, Francisco Brasileiro1, Raquel Lopes1, Fábio Henrique de Siqueira Morais2, Marcus Carvalho2, Daniel Turull3
1Federal University of Campina Grande, Department of Computing and Systems, Av. Aprígio Veloso, 882 – Bloco CO, Campina Grande – PB, 58.429-900, Brazil
2Federal University of Paraíba, Department of Exact Sciences, Av. Santa Elisabete, 160, Rio Tinto – PB, 58.297-000, Brazil
3Ericsson Research, Torshamnsgatan 21, Stockholm, 164 83, Sweden

Abstract

Priority-based scheduling policies are commonly used to guarantee that requests submitted to the different service classes offered by cloud providers achieve the desired Quality of Service (QoS). However, the QoS delivered during periods of resource contention may be unfair to certain requests. In particular, lower-priority requests may have their resources preempted to accommodate higher-priority ones, even when the QoS actually delivered to the latter is above the desired level while the former are underserved. Also, competing requests with the same priority may experience quite different QoS, since some of them may have their resources preempted while others do not. In this paper, we present a new scheduling policy that is driven by the QoS promised to individual requests. The benefits of the QoS-driven policy are twofold: it keeps the QoS of each request as high as possible, given its QoS target and the available resources; and it minimizes the variance of the QoS delivered to requests of the same class, promoting fairness. We used simulation experiments fed with traces from a production system to compare the QoS-driven policy with a state-of-the-practice priority-based one. In general, the QoS-driven policy delivers better service than the priority-based one. Moreover, the equity of the QoS delivered to requests of the same class is much higher when the QoS-driven policy is used, particularly when not all requests get the promised QoS, which is the most important scenario. Finally, based on the current practice of large public cloud providers, our results show that the penalties incurred by the priority-based scheduler in the scenarios studied can be, on average, as much as 193% higher than those incurred by the QoS-driven one.
