Dynamic monitoring of high-performance distributed applications

D. Gunter, B. Tierney, K. Jackson, J. Lee, M. Stoufer
Computing Sciences Directorate, Lawrence Berkeley National Laboratory, University of California, Berkeley, CA, USA

Abstract

Developers and users of high-performance distributed systems often observe performance problems such as unexpectedly low throughput or high latency. Determining the source of these problems requires detailed end-to-end instrumentation of all components, including the applications, operating systems, hosts, and networks. However, the instrumentation must be designed carefully so that its overhead is extremely low and it does not perturb the system being monitored. In this paper we present a very lightweight instrumentation system that can be dynamically activated to unobtrusively collect and aggregate detailed end-to-end monitoring information from distributed applications. We also show how emerging "web services" can be used to facilitate remote interaction with this system.
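The abstract does not give the system's interface, so the following is only a minimal Python sketch, under stated assumptions, of the two ideas it names: instrumentation that costs almost nothing until it is dynamically activated, and aggregation of per-component events into one end-to-end timeline. All names (`Event`, `DynamicLogger`, `set_enabled`, `aggregate`) are hypothetical. The sketch assumes host clocks are synchronized and that each component records its events in timestamp order; remote activation, which the abstract says web services can facilitate, is reduced here to a local `set_enabled` call.

```python
import heapq
import threading
import time
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    # Only the timestamp takes part in ordering/equality, so events from
    # different components can be merged into one end-to-end timeline.
    ts: float
    host: str = field(compare=False)
    name: str = field(compare=False)
    fields: dict = field(default_factory=dict, compare=False)


class DynamicLogger:
    """Hypothetical low-overhead event logger that can be switched on and
    off at run time (assumption: a remote request would call set_enabled)."""

    def __init__(self, host, clock=time.time):
        self.host = host
        self.enabled = False          # off by default: near-zero cost
        self._clock = clock
        self._buf = []                # in-memory buffer, handed off in bulk
        self._lock = threading.Lock()

    def set_enabled(self, on):
        # Stand-in for dynamic (e.g. remote) activation.
        self.enabled = bool(on)

    def log(self, name, **fields):
        # Fast path: when disabled, return before reading the clock or
        # allocating an event.
        if not self.enabled:
            return
        ev = Event(self._clock(), self.host, name, fields)
        with self._lock:
            self._buf.append(ev)

    def flush(self):
        # Return the buffered events and start a fresh, empty buffer.
        with self._lock:
            out, self._buf = self._buf, []
        return out


def aggregate(*streams):
    """Merge per-component event lists (each already sorted by ts) into a
    single timeline without re-sorting everything."""
    return list(heapq.merge(*streams))


if __name__ == "__main__":
    # Deterministic fake clocks stand in for synchronized host clocks.
    c_ticks = iter([1.0, 3.0])
    s_ticks = iter([2.0])
    client = DynamicLogger("client", clock=lambda: next(c_ticks))
    server = DynamicLogger("server", clock=lambda: next(s_ticks))

    client.log("ignored")             # disabled: nothing recorded
    client.set_enabled(True)
    server.set_enabled(True)
    client.log("request.start", size=1024)
    server.log("request.recv")
    client.log("request.end")

    timeline = aggregate(client.flush(), server.flush())
    print([(e.ts, e.host, e.name) for e in timeline])
    # [(1.0, 'client', 'request.start'), (2.0, 'server', 'request.recv'), (3.0, 'client', 'request.end')]
```

Testing the flag before touching the clock keeps the disabled path to a single attribute check, which is the low-overhead property the abstract stresses; `heapq.merge` interleaves the already-ordered streams lazily instead of sorting the combined list.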

Keywords

Distributed computing; Instruments; Pipelines; XML; Condition monitoring; Computer buffers; Grid computing; High performance computing; Laboratories; Libraries