Workflow-Based Data Parallel Applications on the EGEE Production Grid Infrastructure
Abstract
Setting up and deploying complex applications on a Grid infrastructure is still challenging, and the programming models are rapidly evolving. Efficiently exploiting Grid parallelism is often not straightforward. In this paper, we report on the techniques used for deploying applications on the EGEE production Grid through four experiments from very different scientific areas: nuclear fusion, astrophysics, and medical imaging. These applications all manipulate huge amounts of data and are computationally intensive. All the cases studied show that deploying data-intensive applications requires developing more or less elaborate application-level workload management systems on top of the gLite middleware to exploit the EGEE Grid resources efficiently. In particular, adopting high-level workflow management systems eases the integration of large-scale applications while exploiting Grid parallelism transparently. Different approaches to scientific workflow management are discussed. The strategy used by the MOTEUR workflow manager to deal efficiently with complex data flows is described in more detail. Without requiring specific application development, it leads to very significant speed-ups.