Database Systems and Information Management

FONDA: Debugging Distributed Data Analysis Workflows

The DFG Collaborative Research Center FONDA (Foundations of Workflows for the Analysis of Big Data in the Natural Sciences) is dedicated to the optimization of data analysis workflows (DAWs). The goal is to explore techniques, procedures, and tools that increase the productivity of scientists in creating and applying DAWs on large natural science datasets.

Like other software, DAWs may show unexpected behavior or even crash for various reasons. Debugging aims to establish a cause-effect relationship between the observable problem and the actual error. Identifying the error is the first step toward a reliable resolution, so debugging is indispensable for increasing the dependability of DAWs. However, debugging DAWs is particularly challenging due to the heterogeneous nature of the involved tasks and the distributed nature of the execution engine. The central research question addressed in this subproject is how to enable domain scientists to efficiently formulate, test, and refine a debugging hypothesis in the context of scientific software engineering. The subproject will primarily work together with A3 on the adaptation of software test technologies to distributed DAWs and with B6 on the distributed monitoring of DAW executions. It will be coordinated by Prof. Kehrer, an expert in model-based software development, and Prof. Markl, an expert in large-scale distributed data analytics.

For more information, visit https://gepris.dfg.de/gepris/projekt/444757960.

Project Partners:

  • Humboldt-Universität zu Berlin
  • Freie Universität Berlin
  • Technische Universität Berlin
  • Universität Osnabrück
  • Hasso-Plattner-Institut für Digital Engineering gGmbH
  • Max-Delbrück-Centrum für Molekulare Medizin (MDC)

Funding Agency: German Research Foundation (DFG)

Project Duration: 07/2022 - 06/2024