Architecture and dependability of large-scale internet services

IEEE Internet Computing - Volume 6, Issue 5 - Pages 41-49 - 2002
D. Oppenheimer, D.A. Patterson
University of California, Berkeley, USA

Abstract

The popularity of large-scale Internet infrastructure services such as AOL, Google, and Hotmail has grown enormously. The scalability and availability requirements of these services have led to system architectures that diverge significantly from those of traditional systems like desktops, enterprise servers, or databases. Given the need for thousands of nodes, cost necessitates the use of inexpensive personal computers wherever possible, and efficiency often requires customized service software. Likewise, addressing the goal of zero downtime requires human operator involvement and pervasive redundancy within clusters and between globally distributed data centers. Despite these services' success, their architectures (hardware, software, and operational) have developed in an ad hoc manner that few have surveyed or analyzed. Moreover, the public knows little about why these services fail or about the operational practices used in an attempt to keep them running 24/7. As a first step toward formalizing the principles for building highly available and maintainable large-scale Internet services, we are surveying existing services' architectures and dependability. This article describes our observations to date.

Keywords

#Large-scale systems #Web and internet services #Scalability #Availability #Computer architecture #Web server #Databases #Costs #Microcomputers #Humans
