Researchers in the Database Systems and Information Management (DIMA) Group at TU Berlin presented two workshop papers and a demo paper at SIGMOD 2023 [1], the International Conference on Management of Data, which took place June 18-23 in Seattle, Washington (USA).
Haralampos Gavriilidis presented the paper “P2D: A Transpiler Framework for Optimizing Data Science Pipelines” [2] at the Data Management for End-to-End Machine Learning (DEEM) workshop. The paper addresses the inefficiency of pre-processing, a crucial step in data science pipelines that currently does not fully leverage the capabilities of database management systems (DBMSes) as backends. To optimize this step, the authors propose a transpilation-based approach that uses static code analysis to detect operations and "push them down" to DBMS backends.
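The push-down idea can be illustrated with a minimal Python sketch. It is not the P2D implementation. It assumes a hypothetical helper, `filter_to_sql`, that uses Python's standard `ast` module to statically recognize a single pandas-style filter pattern, `df[df["col"] <op> number]`, and rewrite it as a SQL query that a DBMS backend could execute. The function name, the supported pattern, and the assumption that the DataFrame variable name is also the table name are all illustrative.

```python
import ast

# Comparison operators this sketch knows how to translate to SQL.
_SQL_OPS = {
    ast.Gt: ">", ast.GtE: ">=", ast.Lt: "<",
    ast.LtE: "<=", ast.Eq: "=", ast.NotEq: "<>",
}


def filter_to_sql(source):
    """Return SQL for an expression shaped like df[df["col"] > 5], else None.

    Hypothetical helper for illustration only. Assumes Python 3.9 or newer
    (where Subscript.slice holds the expression directly) and that the
    DataFrame variable name doubles as the table name in the DBMS.
    """
    try:
        node = ast.parse(source, mode="eval").body
    except SyntaxError:
        return None
    # Outer node must be a subscript on a plain name: df[...]
    if not (isinstance(node, ast.Subscript) and isinstance(node.value, ast.Name)):
        return None
    cond = node.slice
    # Inner node must be a single comparison: <left> <op> <right>
    if not (isinstance(cond, ast.Compare) and len(cond.ops) == 1):
        return None
    left, op, right = cond.left, cond.ops[0], cond.comparators[0]
    # Left side must be a column lookup on the same frame: df["col"]
    if not (isinstance(left, ast.Subscript)
            and isinstance(left.value, ast.Name)
            and left.value.id == node.value.id
            and isinstance(left.slice, ast.Constant)
            and isinstance(left.slice.value, str)
            and left.slice.value.isidentifier()):
        return None
    # Right side must be a plain int or float literal; operator must be known.
    if not (isinstance(right, ast.Constant) and type(right.value) in (int, float)):
        return None
    if type(op) not in _SQL_OPS:
        return None
    table, column = node.value.id, left.slice.value
    return f'SELECT * FROM {table} WHERE "{column}" {_SQL_OPS[type(op)]} {right.value}'
```

For example, `filter_to_sql('orders[orders["price"] > 100]')` yields `SELECT * FROM orders WHERE "price" > 100`, while code the sketch does not recognize returns `None` and would simply be left to run in Python unchanged.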
The paper “Exploiting Access Pattern Characteristics for Join Reordering” [3], authored by Nils L. Schubert, Philipp M. Grulich, Steffen Zeuch, and Volker Markl, examines the memory access pattern of intermediate join state, an often-neglected performance factor. Based on this analysis, the authors propose a novel join reordering algorithm that detects the memory access pattern and adapts the join order accordingly at runtime.
Kajetan Maliszewski showcased TeeBench [4], a unified benchmarking framework for relational operators across Trusted Execution Environments (TEEs). The framework enables researchers to seamlessly benchmark and evaluate custom implementations of relational operators. TeeBench comes with a user-friendly GUI and a novel TEE-Analyzer that gives the user hints about performance bottlenecks and suggests possible code improvements.
The conference program offered a variety of talks, panels, and workshops, as well as a wonderful opportunity to exchange and discuss ideas with researchers and industry professionals from the database community, enabling the researchers to learn about current trends in database research and its community.
One highlight of the conference was the announcement of the 2023 ACM SIGMOD Systems Award, which was given to Apache Flink. The award recognizes “an individual or set of individuals who developed a software or hardware system whose technical contributions have had significant impact on the theory or practice of large-scale data management systems.” These systems usually have large-scale real-world applications and have influenced the design of future data processing systems.
Apache Flink is an open-source big data stream analytics platform. Its origins can be traced back to 2008, when BIFOLD Director Prof. Dr. Volker Markl founded the DIMA research group at the Technische Universität (TU) Berlin. In 2014, the team at TU Berlin decided to donate the code base to the Apache Software Foundation under the name “Flink.” Today, Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams, developed by the Apache Software Foundation.
References
[1] SIGMOD 2023, https://2023.sigmod.org/
[2] Yordan Grigorov, Haralampos Gavriilidis, Sergey Redyuk, Kaustubh Beedkar, and Volker Markl. “P2D: A Transpiler Framework for Optimizing Data Science Pipelines.” In the Proceedings of the Seventh Workshop on Data Management for End-to-End Machine Learning (DEEM ’23). 2023.
[3] Nils L. Schubert, Philipp M. Grulich, Steffen Zeuch, and Volker Markl. “Exploiting Access Pattern Characteristics for Join Reordering.” In the Proceedings of the 19th International Workshop on Data Management on New Hardware (DaMoN ’23), pp. 10-18. 2023.
[4] Kajetan Maliszewski, Tilman Dietzel, Jorge-Arnulfo Quiané-Ruiz, and Volker Markl. “TeeBench: Seamless Benchmarking in Trusted Execution Environments.” In the Proceedings of SIGMOD 2023, Companion of the 2023 International Conference on Management of Data, pp. 163-166. 2023.