LIGO Document P070017-x0

Optimizing Workflow Data Footprint

Document type: P - Publications
In this paper we examine the issue of optimizing disk usage and scheduling large-scale scientific workflows onto distributed resources, where the workflows are data-intensive and require large amounts of storage, and the resources have limited storage capacity. Our approach is two-fold: we minimize the amount of space a workflow requires during execution by removing data files at runtime once they are no longer needed, and we demonstrate that workflows may have to be restructured to reduce their overall data footprint. We show the results of our data management and workflow restructuring solutions using a Laser Interferometer Gravitational-Wave Observatory (LIGO) application and an astronomy application, Montage, running on a large-scale production grid, the Open Science Grid. We show that although dynamic data cleanup techniques can reduce the data footprint of Montage by 48%, LIGO Scientific Collaboration workflows require additional restructuring to achieve a 56% reduction in data space usage. We also examine the cost of the workflow restructuring in terms of the application's runtime.
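The runtime cleanup idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `peak_footprint`, the job tuple layout, and the example file names and sizes are all hypothetical, chosen only to show how deleting a file after its last consumer finishes lowers the peak storage a workflow needs.

```python
def peak_footprint(jobs, sizes, cleanup=True):
    """Return the peak storage used when jobs run in the given order.

    jobs:    list of (name, inputs, outputs) tuples in a valid execution order
             (hypothetical layout, for illustration only).
    sizes:   dict mapping file name -> size in arbitrary units.
    cleanup: if True, delete each file once its last consumer has finished.
    """
    # Record the index of the last job that reads each file.
    last_use = {}
    for i, (_, inputs, _) in enumerate(jobs):
        for f in inputs:
            last_use[f] = i

    on_disk = set()
    current = peak = 0
    for i, (_, inputs, outputs) in enumerate(jobs):
        # Stage in inputs that are not yet present, then write the outputs.
        for f in list(inputs) + list(outputs):
            if f not in on_disk:
                on_disk.add(f)
                current += sizes[f]
        peak = max(peak, current)

        if cleanup:
            # Remove inputs that no later job will read.
            # Files that are never consumed (final outputs) are kept.
            for f in inputs:
                if last_use[f] == i and f in on_disk:
                    on_disk.remove(f)
                    current -= sizes[f]
    return peak


# Hypothetical three-job chain: raw -> x -> y -> out, every file of size 10.
EXAMPLE_JOBS = [
    ("a", ["raw"], ["x"]),
    ("b", ["x"], ["y"]),
    ("c", ["y"], ["out"]),
]
EXAMPLE_SIZES = {"raw": 10, "x": 10, "y": 10, "out": 10}
```

For this toy chain, the peak is 40 units without cleanup and 20 units with it. The sketch assumes a fixed sequential job order; how jobs are ordered or grouped also affects the peak, which is where workflow restructuring comes in.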
Notes and Changes:

Rev P070017-00-Z:
- Full document number: LIGO-P070017-00-Z
- Author(s): G. B. Berriman; Kent Blackburn; Duncan Brown; Ewa Deelman; Steve Fairhurst; John Good; Daniel S. Katz; Gaurang Mehta; David Meyers; Arun Ramakrishnan; Rizos Sakellariou; Gurmeet Singh; Karan Vahi; Henan Zhao
- Document date: 2007-04-17
- Document received date: 2007-04-18
- Document entry date: 2007-04-18
