Database Systems and Information Management


The Technische Universität Berlin, Humboldt Universität Berlin, and the Hasso-Plattner-Institut in Potsdam are jointly researching "Information Management on the Cloud" through the "Stratosphere" Collaborative Research Unit funded by the Deutsche Forschungsgemeinschaft (DFG). Stratosphere aims to considerably advance the state of the art in data processing on parallel, adaptive architectures. Stratosphere (named after the layer of the atmosphere above the clouds) explores the power of massively parallel computing for complex information management applications. Building on the expertise of the participating researchers, we aim to develop a novel, database-inspired approach to analyze, aggregate, and query very large collections of either textual or (semi-)structured data on a virtualized, massively parallel cluster architecture.

Stratosphere will conduct research in the areas of massively parallel data processing engines, a programming model for parallel data programming, robust optimization of declarative data flow programs, continuous re-optimization and adaptation of the execution, data cleansing, and text mining. The unit will validate its work through a benchmark of the overall system performance and by demonstrators in the areas of climate research, the biosciences and linked open data.

The goal of Stratosphere is to jointly research and build a large-scale data processor based on concepts of robust and adaptive execution. We will research a programming model that extends the functional map/reduce programming model with additional second-order functions. As the execution platform, we use a Dryad-like massively parallel data flow engine that will also be researched and developed in the project. We will examine real-world use cases in climate research, in information extraction and integration of unstructured data in the life sciences, and in linked open data and social network graph data. The massively parallel data processing system developed in the Stratosphere project is publicly available under the Apache license at the project's website.
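To make the idea of "additional second-order functions" more concrete, the following minimal Python sketch shows how a map/reduce-style model might be extended with one further second-order function that takes two inputs. It is an illustration only and not Stratosphere's actual API: the names map_so, reduce_so and match_so, their signatures, and the sample data are all assumptions made for this example, and everything runs sequentially in a single process instead of on a parallel engine.

```python
from collections import defaultdict

# Hypothetical names throughout; this is not the Stratosphere API.

def map_so(udf, records):
    """Second-order Map: call the user function once per input record."""
    for record in records:
        yield from udf(record)

def reduce_so(udf, key, records):
    """Second-order Reduce: call the user function once per key group."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)
    for k, group in groups.items():
        yield from udf(k, group)

def match_so(udf, key_left, key_right, left, right):
    """Assumed extra second-order function with two inputs: call the
    user function once per pair of records whose keys are equal."""
    index = defaultdict(list)
    for r in right:
        index[key_right(r)].append(r)
    for l in left:
        for r in index.get(key_left(l), []):
            yield from udf(l, r)

# Classic map/reduce word count, expressed with the two basic functions.
lines = ["a b a", "b c"]
words = map_so(lambda line: [(w, 1) for w in line.split()], lines)
counts = dict(reduce_so(lambda k, g: [(k, sum(c for _, c in g))],
                        lambda rec: rec[0], words))

# Two-input example: pair made-up station names with made-up readings.
stations = [(1, "Berlin"), (2, "Potsdam")]
readings = [(1, 3.5), (1, 4.0), (3, 9.9)]
joined = list(match_so(lambda s, r: [(s[1], r[1])],
                       lambda s: s[0], lambda r: r[0],
                       stations, readings))
```

In this sketch each second-order function only fixes how records are grouped before the user function is called. The user function itself stays an ordinary first-order function, and a parallel engine would be free to choose how each grouping is carried out.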

The group will provide the opportunity to perform high-quality, cutting-edge research in an international context and in close cooperation with the participating groups. Suitable postdoctoral candidates will have the chance to perform research on the topics of the group, which qualifies them for an academic career and allows them to establish themselves as independent researchers in the research community.

The project will be carried out jointly by Prof. Volker Markl (TU Berlin, Database Systems and Information Management Group), who will act as speaker of the unit, as well as Prof. Odej Kao (TU Berlin, Distributed Systems Group), Prof. Johann-Christoph Freytag (HU Berlin, Database and Information Systems Group), Prof. Ulf Leser (HU Berlin, Knowledge Management in Bioinformatics), and Prof. Felix Naumann (HPI Potsdam, Database and Information Systems Group).

More information about Stratosphere is available at the project's website: