The complexity of adding failsafe fault-tolerance

S.S. Kulkarni1, A. Ebnenasir1
1Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, USA

Tóm tắt

In this paper, we focus our attention on the problem of automating the addition of failsafe fault-tolerance where fault-tolerance is added to an existing (fault-intolerant) program. A failsafe fault-tolerant program satisfies its specification (including safety and liveness) in the absence of faults. And, in the presence of faults, it satisfies its safety specification. We present a somewhat unexpected result that, in general, the problem of adding failsafe fault-tolerance in distributed programs is NP-hard. Towards this end, we reduce the 3-SAT problem to the problem of adding failsafe fault-tolerance. We also identify a class of specifications, monotonic specifications and a class of programs, monotonic programs. Given a (positive) monotonic specification and a (negative) monotonic program, we show that failsafe fault-tolerance can be added in polynomial time. We note that the monotonicity restrictions are met for commonly encountered problems such as Byzantine agreement, distributed consensus, and atomic commitment. Finally, we argue that the restrictions on the specifications and programs are necessary to add failsafe fault-tolerance in polynomial time; we prove that if only one of these conditions is satisfied, the addition of failsafe fault-tolerance is still NP-hard.

Từ khóa

#Fault tolerance #Safety #Polynomials #Algorithm design and analysis #Automation #Computer science #Fault tolerant systems #Engineering profession #Contracts #Fault diagnosis

Tài liệu tham khảo

kulkarni, 2002, The complexity of adding failsafe fault-tolerance. Technical Report MSU-CSE.02–10, Computer Science and Engineering 10.1145/277697.277729 kupferman, 1997, Synthesis with incomplete information, ICTL 10.1145/75277.75293 10.1007/3-540-58179-0_51 10.1145/152610.152612 10.1109/RELDIS.2001.969767 gong, 1995, Byzantine agreement with authentication: Observations and applications in tolerating hybrid and link faults, Dependable Computing and Fault Tolerant Systems IEEE Comnuter Society, 10, 139 10.1145/357172.357176 10.1109/32.256850 10.1016/0020-0190(85)90056-0 kulkarni, 2000, Automating the addition of fault-tolerance, Formal Techniques in Real-Time and Fault-Tolerant Systems, 10.1007/3-540-45352-0_9 kulkarni, 1999, Component-based design of fault-tolerance 10.1145/383043.383044