Versionized process based on non-volatile random-access memory for fine-grained fault tolerance

Zhejiang University Press - Volume 19 - Pages 192-205 - 2018
Wen-zhe Zhang¹, Kai Lu¹, Xiao-ping Wang¹
¹ Science and Technology on Parallel and Distributed Processing Laboratory, College of Computer, National University of Defense Technology, Changsha, China

Abstract

Non-volatile random-access memory (NVRAM) technology is maturing rapidly and its byte-persistence feature allows the design of new and efficient fault tolerance mechanisms. In this paper we propose the versionized process (VerP), a new process model based on NVRAM that is natively non-volatile and fault tolerant. We introduce an intermediate software layer that allows us to run a process directly on NVRAM and to put all the process states into NVRAM, and then propose a mechanism to versionize all the process data. Each piece of the process data is given a special version number, which increases with the modification of that piece of data. The version number can effectively help us trace the modification of any data and recover it to a consistent state after a system crash. Compared with traditional checkpoint methods, our work can achieve fine-grained fault tolerance at very little cost.
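
The version-number idea in the abstract can be illustrated with a toy model. The sketch below is not the VerP implementation: it uses an ordinary Python dictionary in place of NVRAM, and the names `VersionedCell`, `VersionedStore`, `commit`, and `recover` are hypothetical. It assumes that a consistent state is marked by recording the current version of every piece of data, and that recovery discards any version newer than that record.

```python
# Illustrative sketch only: a toy model of per-datum version numbers,
# not the VerP mechanism itself. All names here are hypothetical.
from dataclasses import dataclass, field


@dataclass
class VersionedCell:
    # History of (version, value) pairs; the newest entry is last.
    history: list = field(default_factory=list)

    def write(self, value):
        # Each modification increases this cell's own version number.
        version = self.history[-1][0] + 1 if self.history else 1
        self.history.append((version, value))
        return version

    def read(self):
        return self.history[-1][1]


class VersionedStore:
    """Stand-in for process data kept in NVRAM (assumption: a plain dict)."""

    def __init__(self):
        self.cells = {}
        # Assumed: versions recorded at the last consistent point.
        self.committed = {}

    def write(self, name, value):
        return self.cells.setdefault(name, VersionedCell()).write(value)

    def read(self, name):
        return self.cells[name].read()

    def commit(self):
        # Record the current version of every cell as the consistent state.
        self.committed = {n: c.history[-1][0] for n, c in self.cells.items()}

    def recover(self):
        # After a crash, drop every version newer than the committed one.
        for name in list(self.cells):
            keep = self.committed.get(name, 0)
            cell = self.cells[name]
            cell.history = [(v, x) for v, x in cell.history if v <= keep]
            if not cell.history:
                del self.cells[name]


store = VersionedStore()
store.write("balance", 100)   # version 1
store.commit()                # consistent point
store.write("balance", 40)    # version 2, never committed
store.recover()               # simulated crash recovery
print(store.read("balance"))  # prints 100
```

Because the version travels with each piece of data, recovery in this sketch only has to compare numbers per cell rather than restore a full snapshot, which is the kind of fine-grained rollback the abstract contrasts with traditional checkpointing.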
