Open Distributed Systems

Research Projects

This page gives an overview of ODS research projects.

Further ODS research projects in the context of DCAITI can be found here.


Berlin Institute for the Foundations of Learning and Data - BIFOLD

BIFOLD conducts foundational research in big data management and machine learning, as well as their intersection, to educate future talent and create high-impact knowledge exchange.

The Berlin Institute for the Foundations of Learning and Data (BIFOLD) evolved in 2019 from the merger of two national Artificial Intelligence Competence Centers: the Berlin Big Data Center (BBDC) and the Berlin Center for Machine Learning (BZML). Embedded in the vibrant Berlin metropolitan area, BIFOLD provides an outstanding scientific environment and numerous collaboration opportunities for national and international researchers. BIFOLD offers a broad range of research topics as well as a platform for interdisciplinary research and knowledge exchange with the sciences and humanities, industry, startups and society.

Data management (DM) and machine learning (ML) are the scientific and technical pillars powering the current wave of innovation in artificial intelligence (AI); it is the efficient processing and intelligent analysis of very large, complex, heterogeneous data that has the potential to revolutionize and substantially improve our lives and societies. BIFOLD conducts scalable yet agile foundational AI research. Furthermore, it addresses the emerging challenges and requirements created by the rapidly growing importance of data management and machine learning in practically all areas, from medicine, industry, natural sciences, humanities, e-commerce, and media, to government and society.

More information is available here:

Computing Foundations For Semantic Stream Processing (COSMO)

The ability to process stream data is ubiquitous in modern information systems. The grand challenge in establishing a processing framework for powering such systems is how to strike the right balance between expressivity and computability in a highly dynamic setting. The expressivity of the framework reflects what kind of input data and what types of processing operations it enables. The computability corresponds to its ability to process a certain workload (e.g., processing workflow and data size) under an execution setting (e.g., CPU, RAM and network bandwidth).
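The expressivity–computability trade-off can be made concrete with a minimal sketch (purely illustrative, not part of any COSMO implementation): a sliding-window aggregation is a highly computable but not very expressive stream operator, needing only bounded memory and constant work per arriving element, whereas richer operators such as joins or reasoning grow in cost with the data they must retain.

```python
from collections import deque

class SlidingWindowAverage:
    """Bounded-memory stream operator: average over the last `size` elements.

    Memory is O(size) and per-element work is O(1) -- an example of an
    operator at the cheap end of the expressivity/computability spectrum.
    """

    def __init__(self, size):
        self.size = size
        self.window = deque()
        self.total = 0.0

    def push(self, value):
        """Consume one stream element and emit the current window average."""
        self.window.append(value)
        self.total += value
        if len(self.window) > self.size:
            self.total -= self.window.popleft()
        return self.total / len(self.window)

op = SlidingWindowAverage(size=3)
results = [op.push(v) for v in [1, 2, 3, 4, 5]]
# windows seen: [1], [1,2], [1,2,3], [2,3,4], [3,4,5]
```

Running the sketch on the stream `1..5` yields the averages `[1.0, 1.5, 2.0, 3.0, 4.0]`; the framework challenge described above is to retain this kind of bounded-resource behaviour while admitting far more expressive operators.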

So far, various research communities have independently addressed this challenge by imposing application-specific trade-offs and assumptions on their underlying processing models. Such trade-offs and assumptions are driven by prior knowledge of data characteristics (e.g., format, modality, schema and distribution), processing workload and computation settings. However, recent developments in the Internet of Things and AI have brought completely new levels of expressivity to processing pipelines, as well as dynamicity to computation settings. For instance, a typical processing pipeline of a connected vehicle includes not only multimodal stream elements generated in real time by unprecedented types of sensors but also very complex processing workflows involving logic reasoning and statistical inference. Furthermore, such a pipeline can be executed in a highly dynamic distributed setting, e.g., combining in-car processing units with cloud/edge computing infrastructures. Processing pipelines and setups of this kind hence require a radical overhaul of the state of the art in several areas.

To this end, this project aims to establish computing foundations that enable a unified processing framework to address this grand challenge. The targeted framework will propose a semantics-based processing model with a standard-oriented graph data model and query language fragments, called Semantic Stream Processing. Therefore, the project will carry out a systematic study of tractable classes of a wide range of processing operators, e.g., graph query patterns, logic reasoning, and statistical inference on stream data. The newly identified tractable classes of processing operations will pave the way for designing efficient classes of incremental evaluation algorithms. To address scalability, the project will also study how to elastically and robustly scale a highly expressive stream processing pipeline in a dynamic and distributed computing environment. Moreover, the project will investigate a novel optimisation mechanism that combines logic optimisation algorithms, which exploit rewriting rules and pruning constraints, with adaptive optimisation algorithms, which continuously optimise execution plans based on runtime statistics. The proposed algorithms and framework will be extensively and systematically evaluated in two application domains: connected vehicles and the Web of Things.
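The incremental-evaluation idea can be sketched as follows (a hypothetical illustration; the class, predicates, and pattern are invented for this example and are not COSMO APIs). Rather than re-evaluating a two-triple graph pattern over the whole stream each time a triple arrives, the matcher indexes partial matches on the join variable and extends only those, so each arrival costs work proportional to the new matches it produces:

```python
class IncrementalJoin:
    """Incrementally match the two-triple graph pattern
        (?x, drives, ?c) AND (?c, locatedIn, ?z)
    over a stream of (subject, predicate, object) triples.

    Partial matches are indexed on the join variable ?c, so each
    arriving triple is joined only against compatible partials
    instead of the entire history of the stream.
    """

    def __init__(self):
        self.drives = {}    # car -> set of drivers seen so far
        self.located = {}   # car -> set of zones seen so far

    def push(self, s, p, o):
        """Consume one triple; return the *new* complete matches (?x, ?z)."""
        out = []
        if p == "drives":                       # s = driver, o = car
            self.drives.setdefault(o, set()).add(s)
            for zone in self.located.get(o, ()):
                out.append((s, zone))
        elif p == "locatedIn":                  # s = car, o = zone
            self.located.setdefault(s, set()).add(o)
            for driver in self.drives.get(s, ()):
                out.append((driver, o))
        return out

m = IncrementalJoin()
m.push("alice", "drives", "car1")             # partial match only, nothing emitted
new = m.push("car1", "locatedIn", "berlin")   # completes the pattern
# new == [("alice", "berlin")]
```

A production framework must additionally bound the indexes (e.g., with windows or expiration), which is exactly where the tractability and elasticity questions studied in the project arise.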


NFDI for Catalysis-Related Sciences - NFDI4Cat

As an interdisciplinary scientific technology field, catalysis is of great strategic importance for the economy and society as a whole. It is one of the most important core technologies for simultaneously solving the pressing challenges of climate change, the supply of sustainable energy and sustainable materials. Concrete examples are the reduction or complete avoidance of CO2 emissions, the recycling of plastic waste and CO2 in chemical production, sustainable hydrogen production, fuel cell technology and the sustainable nutrition of more than seven billion people on earth. They all require groundbreaking advances in catalysis science and technology.

This requires a fundamental change in catalysis research, chemical engineering and process technology. A key challenge is to bring together the different disciplines in catalysis research and technology with the support of data scientists and mathematicians. The aim is to redefine catalysis research in the digital age. This so-called “digital catalysis” is to be realized along the data value chain, which follows the path from molecules to chemical processes.

The NFDI4Cat consortium, coordinated by DECHEMA (Society for Chemical Engineering and Biotechnology), consists of experts from the fields of homogeneous, heterogeneous, photo-, bio- and electrocatalysis and is supplemented by experts from the engineering, data and mathematical sciences. Partner institutions are:

  • Leibniz Institute for Catalysis e.V. (LIKAT)
  • Friedrich-Alexander-Universität Erlangen
  • RWTH Aachen
  • Universität Greifswald
  • Universität Leipzig
  • Universität Rostock
  • TU Berlin
  • TU Braunschweig
  • TU Dortmund
  • TU München
  • Fraunhofer Institute for Open Communication Systems (FOKUS)
  • High Performance Computing Center Stuttgart (HLRS)
  • Karlsruhe Institute of Technology (KIT)
  • Max Planck Institute for Chemical Energy Conversion
  • Max Planck Institute for Dynamics of Complex Technical Systems

The consortium is complemented by the TU Darmstadt as an associated partner. A unique selling point of NFDI4Cat is the role of industry, which supports NFDI4Cat in an advisory capacity. In addition to hte GmbH, which will play a leading role, the industry representatives include BASF SE, Clariant Produkte GmbH (Catalysts), Covestro Deutschland AG, Evonik Industries AG, Linde AG (Engineering Division) and thyssenkrupp Industrial Solutions AG.

In order to achieve the overall objectives of NFDI in an interdisciplinary way, NFDI4Cat will cooperate particularly closely with other funded and emerging consortia such as NFDI4Ing and NFDI4Chem due to overlapping areas of interest.

About the National Research Data Infrastructure:

The National Research Data Infrastructure (NFDI) aims to systematically develop, sustainably secure, and make accessible the data sets of science and research, as well as to network them (inter)nationally. It is currently being set up, in a science-driven process, as a networked structure of individually acting consortia. The NFDI will be established in three stages over a period of three years (2019 to 2021). In each of the three stages, new consortia can be admitted to the NFDI in a science-led application process. The Federal Government and the federal states intend to fund up to 30 consortia in total. In the final stage, up to 85 million euros per year will be available for funding.

More information is available here:

NFDI4DataScience - NFDI4DS

The vision of NFDI4DataScience (NFDI4DS) is to support all steps of the complex and interdisciplinary research data lifecycle, including collecting/creating, processing, analyzing, publishing, archiving, and reusing resources in Data Science and Artificial Intelligence.

The past years have seen a paradigm shift, with computational methods increasingly relying on data-driven and often deep learning-based approaches, leading to the establishment and ubiquity of Data Science as a discipline driven by advances in the field of Computer Science. Transparency, reproducibility and fairness have become crucial challenges for Data Science and Artificial Intelligence due to the complexity of contemporary Data Science methods, which often rely on a combination of code, models and training data. NFDI4DS will promote FAIR and open research data infrastructures supporting all involved resources, such as code, models, data, and publications, through an integrated approach.

The overarching objective of NFDI4DS is the development, establishment, and sustainment of a national research data infrastructure for the Data Science and Artificial Intelligence community in Germany. This will also deliver benefits for a wider community requiring data analytics solutions, within the NFDI and beyond. The key idea is to work towards increasing the transparency, reproducibility and fairness of Data Science and Artificial Intelligence projects by making all digital artifacts available, interlinking them, and offering innovative tools and services. The reuse of these digital objects will in turn enable new and innovative research.

NFDI4DS intends to represent the Data Science and Artificial Intelligence community in academia, which is an interdisciplinary field rooted in Computer Science. We aim to reuse existing solutions and to collaborate closely with the other NFDI consortia and beyond. In the initial phase, NFDI4DS will focus on four Data Science-intensive application areas: language technology, biomedical sciences, information sciences and social sciences. The expertise available in NFDI4DS ensures that metadata standards are interoperable across domains and that new ways of dealing with digital objects arise.

More information is available here:

Basic Services for NFDI - Base4NFDI

Base4NFDI presents a unique chance for the German science system. Through broad cooperation between scientific domains and infrastructure providers, we set out to identify and exploit synergies in the scientific data infrastructure. NFDI-wide basic services have the potential to serve most or all consortia and thus to have a significant impact on the efficiency of the German research community. To this end, Base4NFDI will support services in a three-phase process: initialization, integration and ramping up for service operation. In Base4NFDI, a “service” is understood as a technical-organisational solution, which typically includes storage and computing services, software, processes and workflows, as well as the necessary personnel support for different service desks.

More information is available here:

Berlin Open Science Platform

The Berlin Open Science Platform (BOP) is a curation platform for research data developed for the Berlin University Alliance (BUA). BOP provides sharing and processing services for research data and supports openness, transparency and participation in research. BOP shall enable users to

  • find and access research data (publications, datasets) of the BUA partners through a single point of access,
  • combine and evaluate research data in experiments (data curation: visualization, data clustering, text summarization, text translation), and
  • connect researchers from different disciplines, simplify collaborations between them, and support sustainable research within the Berlin University Alliance.

The project is embedded in an initiative of Objective 5 to build a SOURCE centre, which will provide a portal of services for electronic research data. The platform is planned to be sustained long-term through the university libraries.

The software development is accompanied by co-creation workshops that systematically assess the requirements and the resulting prototypes. The goal is to design the development collaboratively with its users in a value-oriented fashion, ensuring that the use of the platform is in line with the value-oriented governance of the BUA. This subproject shall also evaluate the acceptance, design and use of the co-creation workshops themselves, to identify aspects of co-creation workshops relevant for further use in the BUA context. The subproject accompanies the complete project life cycle and aims at a software development process that addresses all requirements and expectations of the users, facilitating collaborative prototype development and testing.

Berlin Big Data Center Phase II - BBDC II

Funded by the German Federal Ministry of Education and Research (BMBF) and established in 2014, the Berlin Big Data Center (BBDC) is a national big data competence center led by the Technische Universität Berlin (TUB). In Phase I, besides TUB, BBDC consortium partners included the Beuth University of Applied Sciences Berlin, the German Research Center for Artificial Intelligence (DFKI), the Fritz-Haber-Institute of the Max Planck Society, and the Zuse Institute Berlin (ZIB). Over its initial four-year period, the BBDC sought to prepare German/European industry, science and society for the global big data revolution. Its key objectives included:

  1. conducting fundamental research to enable scalable big data analysis,
  2. developing an integrated, declarative, and highly scalable open-source system for advanced data analysis,
  3. transferring technology and know-how to support innovation in industry, and
  4. educating future data scientists at leading academic programs.

In 2018, the BBDC entered into a subsequent three-year period following an additional funding award by the BMBF. In Phase II, besides TUB, the consortium partners include Charité Universitätsmedizin Berlin, DFKI, Technische Universität Braunschweig, and ZIB. In this phase, research will be carried out at the intersection of scalable data management and machine learning, in support of big data and data science. In particular, the BBDC will continue to explore scalability issues surrounding the real-time processing of data streams and declarative machine learning on massive datasets. In addition, varying application areas will be addressed, including the analysis of distributed biomedical data and heterogeneous morphomolecular data arising in cancer research, learning on compressed data streams, real-time speech technology for interactive user interfaces, as well as security and privacy issues concerning the handling of sensitive personal information in big data systems. Moreover, the BBDC will closely collaborate with the newly established Berlin Center for Machine Learning (BZML).

For more information, please visit: