Predictable quality of service atop degradable distributed systems

Springer Science and Business Media LLC - Volume 16, Issue 2 - Pages 321-334 - 2013
Lavanya Ramakrishnan (1), Daniel A. Reed (2)
(1) Indiana University, Bloomington, IN, USA
(2) Microsoft Research, Redmond, WA, USA

Abstract

Keywords


References

Alonso, G., Hagen, C., Agrawal, D., Abbadi, A.E., Mohan, C.: Enhancing the fault tolerance of workflow management systems. In: IEEE Concurrency (2000)

Availability prediction service. http://nws.cs.ucsb.edu/ewiki/nws.php?id=Availability+Prediction+Service%

Blythe, J., Jain, S., Deelman, E., Gil, Y., Vahi, K., Mandal, A., Kennedy, K.: Task scheduling strategies for workflow-based applications in grids. In: CCGRID, pp. 759–767 (2005)

Braun, T.D., Siegel, H.J., Beck, N.: A comparison of eleven static heuristics for mapping a class of independent tasks onto heterogeneous distributed computing systems. J. Parallel Distrib. Comput. (2001)

da Lu, C., Reed, D.A.: Assessing fault sensitivity in MPI applications. In: Proc. of Supercomputing (2004)

Droegemeier, K.K., et al.: Service-oriented environments for dynamically interacting with mesoscale weather. Comput. Sci. Eng. (2005)

Haverkort, B.R., Marie, R., Rubino, G., Trivedi, K.: Performability Modelling. Wiley, New York (2001)

Hwang, S., Kesselman, C.: A flexible framework for fault tolerance in the grid. J. Grid Comput. (2003)

Inca real time monitoring suite. http://inca.sdsc.edu/

Kennedy, K., et al.: Toward a framework for preparing and executing adaptive grid programs. In: Proceedings of NSF Next Generation Systems Program Workshop (International Parallel and Distributed Processing Symposium) (2002)

Khalili, O., He, J., Olschanowsky, C., Snavely, A., Casanova, H.: Measuring the performance and reliability of production computational grids. In: The 7th IEEE/ACM International Conference on Grid Computing (2006)

Kramer, W., Ryan, C.: Performance variability of highly parallel architectures. In: International Conference on Computational Science (2003)

Los Alamos reliability data. http://institutes.lanl.gov/data/fdata/

Malewicz, G.: Parallel scheduling of complex dags under uncertainty. In: Proceedings of the 17th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pp. 66–75 (2005)

Meyer, J.F.: On evaluating the performability of degradable computing systems. IEEE Trans. Comput. (1980)

Nurmi, D., Brevik, J., Wolski, R.: Minimizing the network overhead of checkpointing in cycle harvesting cluster environments. Future Gener. Comput. Syst. (2006)

Ramakrishnan, L., Reed, D.A.: Performability modeling for scheduling and fault tolerance strategies for grid workflows. In: ACM/IEEE International Symposium on High Performance Distributed Computing (2008)

Reed, D.A., da Lu, C., Mendes, C.L.: Reliability challenges in large systems. Future Gener. Comput. Syst. (2006)

Sahner, R.A., Trivedi, K.S., Puliafito, A.: Performance and Reliability Analysis of Computer Systems: An Example-Based Approach Using the SHARPE Software Package. Kluwer Academic, Dordrecht (1996)

Sakellariou, R., Zhao, H., Tsiakkouri, E., Dikaiakos, M.: Scheduling workflows with budget constraints. In: Gorlatch, S., Danelutto, M. (eds.) Integrated Research in GRID Computing, CoreGRID, pp. 189–202. Springer, New York (2007)

Schopf, J., Berman, F.: Performance prediction in production environments. In: Proceedings of IPPS/SPDP (1998)

Schroeder, B., Gibson, G.: A large-scale study of failures in high-performance computing systems. In: Proc. of the International Conference on Dependable Systems (2006)

Weissman, J.B.: Fault tolerant computing on the grid: what are my options? In: HPDC (1999)

Yu, J., Buyya, R.: Scheduling scientific workflow applications with deadline and budget constraints using genetic algorithms. Sci. Program. 14(3–4), 217–230 (2006)

Zhang, Y., Mandal, A., Casanova, H., Chien, A., Kee, Y., Kennedy, K., Koelbel, C.: Scalable grid application scheduling via decoupled resource selection and scheduling. In: CCGrid (2006)