Adjusting ECN marking threshold in multi-queue DCNs with deep learning
Abstract
Explicit Congestion Notification (ECN) was designed for single queues, but today's data center networks (DCNs) need multiple queues on each switch port. In multi-queue scenarios, if some of the switches exceed the ECN marking threshold, all packets on the same port can receive the ECN mark. To solve this problem, we propose mapping-ECN as a systematic answer to the wrong-marking problem. First, we differentiate mice and elephant flows with a learning algorithm, then prioritize mice flows while keeping the deadlines of other flows in mind so that they are not sacrificed. Second, a marked packet is given the privilege of using a faster path than other packets, so that network status is notified early. This gives a complete picture of the instantaneous requests from all senders. In the worst case, if the buffer has no capacity to transmit the packets that exceed its threshold, mapping-ECN uses Cut Payload (CP): when a queue reaches the threshold, CP drops the payloads of packets rather than their metadata. Consequently, just one bit carrying the information of the packet is transmitted, and the sender immediately retransmits that packet without waiting for a timeout as in TCP. This retransmission can arrive within a millisecond, which suits an extremely low-latency network. Last but not least, mapping-ECN explores different kinds of neural network techniques to avoid mis-marking in the output port buffer: any packet already marked within the queue buffer is not considered again for marking in the output port buffer. Compared with MQ-ECN, mapping-ECN improves the overall Flow-Completion Time (FCT) for short flows by around 7%, the 99th percentile by around 52%, and the FCT for short flows by around 8%. Moreover, compared with MQ-ECN, mapping-ECN improves the FCT for large flows, cache flows, and mice (web search) flows by 4%, 15%, and 6%, respectively. This improvement is also evident in comparison with DemePro and Priority-ECN.
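As a rough illustration of the Cut Payload idea described in the abstract, the Python sketch below models a single queue that trims a packet's payload once a threshold is reached and forwards only the header, so that the sender side can learn which packets to resend without a timeout. The names `Packet`, `CutPayloadQueue`, and `seqs_to_retransmit`, the byte-based threshold, and the choice to ignore header size are all assumptions made for illustration; they are not taken from the mapping-ECN implementation.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Packet:
    flow_id: int
    seq: int
    payload: bytes
    trimmed: bool = False  # True once the payload has been cut


class CutPayloadQueue:
    """Toy FIFO queue that trims payloads instead of dropping whole packets."""

    def __init__(self, threshold_bytes: int):
        # Hypothetical trimming threshold, counted in payload bytes only.
        self.threshold_bytes = threshold_bytes
        self.occupancy = 0
        self.packets = deque()

    def enqueue(self, pkt: Packet) -> Packet:
        # If the payload would push the queue past the threshold,
        # keep the metadata (flow_id, seq) and discard the payload.
        if self.occupancy + len(pkt.payload) > self.threshold_bytes:
            pkt = Packet(pkt.flow_id, pkt.seq, b"", trimmed=True)
        self.occupancy += len(pkt.payload)
        self.packets.append(pkt)
        return pkt


def seqs_to_retransmit(received):
    """Trimmed headers identify exactly which packets must be resent."""
    return [(p.flow_id, p.seq) for p in received if p.trimmed]


if __name__ == "__main__":
    q = CutPayloadQueue(threshold_bytes=3000)
    for seq in range(4):
        q.enqueue(Packet(flow_id=1, seq=seq, payload=b"x" * 1000))
    print(seqs_to_retransmit(q.packets))  # prints [(1, 3)]
```

In this toy run the first three 1000-byte packets fill the queue to its threshold, so the fourth is forwarded as a header only and is the single packet reported for retransmission.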
