WiDE '24: Proceedings of the 2nd Workshop on Workflows in Distributed Environments

Workflows' applications in computational environmental science: a survey

This survey paper explores the different applications of workflows in computational environmental science. Workflows are crucial in streamlining complex computational processes, enabling researchers to manage and analyze large-scale environmental data effectively. The paper reviews existing literature, methodologies, and tools associated with workflow applications in environmental science, highlighting their impact on research efficiency, reproducibility, and collaboration. By examining case studies and emerging trends, this survey aims to provide insights into the current landscape of workflow applications within the computational environmental science domain.

Secure Generic Remote Workflow Execution with TEEs

In scientific environments, the need to process large volumes of data is a common challenge. The people running these computations often lack sufficient local computational resources and therefore turn to a Cloud Service Provider (CSP) for data processing. However, the data involved may be subject to confidentiality constraints. This paper introduces a proof-of-concept framework that leverages Gramine LibOS and Intel SGX to protect generic remote workflow computations inside SGX enclaves used as Trusted Execution Environments (TEEs). The framework defines the behavior of both the user and the CSP and has been implemented using Bash scripts. In addition, an infrastructure has been designed for the Data Center Attestation Primitives (DCAP) remote attestation mechanism, through which the user gains trust that the enclave has been properly instantiated at the CSP. To assess the framework's efficacy, it has been tested on two distinct workflows: a trivial one and one involving real-world bioinformatics applications that process DNA data. The performance study revealed that the framework incurs an acceptable overhead, ranging from 1.4x to 1.8x compared to unprotected execution.
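The following is a minimal conceptual sketch, in Python rather than the paper's Bash scripts, of the trust decision a user makes in a DCAP-style remote attestation before releasing confidential inputs: the CSP returns a quote carrying the enclave measurement, and the user compares it against the measurement expected for the Gramine manifest and workflow binaries. The quote fields and helper names here are simplified assumptions, not the paper's interfaces; signature and TCB verification are omitted.

```python
# Conceptual sketch of the user-side attestation check (simplified assumptions).
from dataclasses import dataclass
import hashlib
import hmac
import os

@dataclass
class Quote:
    """Stand-in for an SGX DCAP quote: enclave measurement plus user data."""
    mrenclave: bytes    # hash identifying the enclave's initial code and data
    report_data: bytes  # typically binds a key-exchange public key to the quote

def expected_measurement(manifest: bytes) -> bytes:
    """Hypothetical stand-in for deriving the measurement the user expects
    from the Gramine manifest and workflow binaries to be run remotely."""
    return hashlib.sha256(manifest).digest()

def release_data(quote: Quote, expected: bytes) -> bool:
    """Release confidential workflow inputs only if the attested measurement
    matches the expected one (quote signature/TCB checks omitted here)."""
    return hmac.compare_digest(quote.mrenclave, expected)

if __name__ == "__main__":
    expected = expected_measurement(b"gramine manifest + workflow binaries")
    # A real CSP would return a DCAP-signed quote; here we fabricate a matching one.
    quote = Quote(mrenclave=expected, report_data=os.urandom(32))
    if release_data(quote, expected):
        print("Enclave trusted: send encrypted inputs and start the workflow.")
    else:
        print("Attestation failed: abort.")
```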

Advanced Resource Allocation in the Context of Heterogeneous Workflows Management

In High-Performance Computing (HPC), workflows are used to define and manage sets of interdependent computations that allow users to extract insights from (scientific) numerical simulations or data analytics. HPC platforms can run extreme-scale simulations combined with Artificial Intelligence (AI) training and inference and data analytics (which we refer to as heterogeneous workflows), providing tools and computing resources that serve use cases spanning very diverse application domains (e.g., weather forecasting, quantum mechanics). Executing such workflows at scale requires handling dependencies, automating job submission, and managing I/O. Although state-of-the-art batch schedulers can be configured and integrated with tools that provide this automation, there remain cases where resource allocation leads to inefficiencies. In this paper, to overcome these limitations, we present WARP (Workflow-aware Advanced Resource Planner), a tool that integrates with workflow management tools and batch schedulers to reserve resources in advance for an optimal execution of jobs, based on their duration, dependencies, and machine load. WARP is designed to minimize the overall workflow execution time without violating the priority policies imposed on cluster users by the system administrators.
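As a toy illustration of what workflow-aware advance reservation means in practice (this is not WARP itself), the sketch below plans start times for jobs given their estimated durations, dependencies, and node requests on a machine with a fixed node count: each job starts as early as its predecessors and the projected machine load allow. All names and the greedy policy are illustrative assumptions.

```python
# Toy greedy reservation planner: respects precedence and node capacity.
from dataclasses import dataclass, field

@dataclass
class Job:
    name: str
    nodes: int                      # nodes requested
    duration: float                 # estimated runtime (e.g., minutes)
    deps: list = field(default_factory=list)

def plan_reservations(jobs, total_nodes):
    """Return {job: (start, end, nodes)} so that dependencies finish first and
    concurrent reservations never exceed the machine's node count."""
    plan, pending = {}, {j.name: j for j in jobs}
    while pending:
        ready = [j for j in pending.values() if all(d in plan for d in j.deps)]
        for job in ready:
            start = max((plan[d][1] for d in job.deps), default=0.0)
            while True:
                # Conservative load estimate over the candidate interval.
                busy = sum(n for (s, e, n) in plan.values()
                           if s < start + job.duration and e > start)
                if busy + job.nodes <= total_nodes:
                    break
                # Not enough free nodes: retry at the next job completion time.
                start = min(e for (s, e, n) in plan.values() if e > start)
            plan[job.name] = (start, start + job.duration, job.nodes)
            del pending[job.name]
    return plan

if __name__ == "__main__":
    jobs = [Job("preproc", 2, 10), Job("simulate", 8, 60, ["preproc"]),
            Job("train", 4, 30, ["preproc"]),
            Job("analyze", 2, 15, ["simulate", "train"])]
    for name, (s, e, n) in plan_reservations(jobs, total_nodes=10).items():
        print(f"{name:8s} start={s:5.1f} end={e:5.1f} nodes={n}")
```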

An ad-hoc file system accelerated workflow application for accidental fire fast response

Accidental fires present a significant and growing threat to ecosystems, communities, and economies worldwide. This paper presents the design and implementation of Smoketracer, a novel on-demand application built on advanced computational workflows for fast response to accidental fires and mitigation of their effects. The DAGonStar workflow engine orchestrates the application by leveraging a new ad-hoc file system that turns the workflow:// schema used to define the tasks' data dependencies into an actual user-space, distributed, in-RAM storage. Although the evaluation of the DAGon File System (DAGonFS) is still preliminary, file copy operations are 31% to 63% faster than the baseline (a shared scratch directory hosted on mechanical hard drives), and the wall-clock time of the accidental-fire application is reduced by 57% when using DAGonFS on four data nodes.
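To make the idea concrete, here is a simplified sketch, not the DAGonFS implementation, of how a workflow:// data dependency might be resolved against a user-space in-RAM store instead of a shared scratch directory. The URI layout, class, and task names are assumptions introduced for illustration only.

```python
# Illustrative resolution of workflow:// dependencies against an in-memory store.
from urllib.parse import urlparse

class InRamStore:
    """Toy stand-in for a distributed in-RAM file system: task outputs are
    kept in memory, keyed by (producer task, relative path)."""
    def __init__(self):
        self._blobs = {}

    def put(self, task: str, path: str, data: bytes) -> None:
        self._blobs[(task, path)] = data

    def get(self, task: str, path: str) -> bytes:
        return self._blobs[(task, path)]

def resolve(uri: str, store: InRamStore) -> bytes:
    """Resolve a workflow://<producer-task>/<path> dependency to its bytes."""
    parsed = urlparse(uri)
    assert parsed.scheme == "workflow", f"unexpected scheme in {uri}"
    return store.get(parsed.netloc, parsed.path.lstrip("/"))

if __name__ == "__main__":
    store = InRamStore()
    # A hypothetical preprocessing task publishes its output into RAM...
    store.put("preprocess", "terrain/fuel_map.bin", b"\x00\x01\x02")
    # ...and a downstream simulation task fetches it via its workflow:// URI.
    data = resolve("workflow://preprocess/terrain/fuel_map.bin", store)
    print(len(data), "bytes resolved from the in-RAM store")
```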