Database Systems and Information Management



The lightning talk "Towards Efficient and Secure UDF Execution with BabelfishLib" has been accepted at the CDMS workshop 2023.

We would like to congratulate the authors, DIMA member Philipp Grulich, Dr. Steffen Zeuch, and Prof. Dr. Volker Markl, on the acceptance of the lightning talk "Towards Efficient and Secure UDF Execution with BabelfishLib" at the CDMS workshop 2023, which is co-located with VLDB 2023.


Towards Efficient and Secure UDF Execution with BabelfishLib

Philipp Grulich, Volker Markl

Today, data scientists, web developers, and application developers build complex data processing pipelines that combine different tools and programming languages. To support this, most data processing systems offer user-defined functions (UDFs) in common languages such as Java, Python, or JavaScript. UDFs let users express arbitrary business logic in their preferred programming language, leverage third-party libraries, and increase the modularity and testability of their data processing pipelines.

Although UDFs provide a large degree of freedom, this flexibility comes at a high performance cost compared to traditional relational queries. As a result, most experts recommend avoiding UDFs whenever possible.

To cope with these inefficiencies, researchers have proposed several strategies: translating UDFs into semantically equivalent SQL statements, extending query optimizers to account for the unique properties of UDFs, and devising execution strategies that mitigate UDF bottlenecks. While these approaches deliver performance benefits, they require substantial engineering effort and increase system complexity, which hinders their widespread adoption.

To improve this situation, in this talk we propose BabelfishLib, which packages our Babelfish engine [1] as an extensible component for the efficient and secure execution of UDFs. Since virtually every data management system requires UDF support, BabelfishLib can centralize these efforts and provide a unified UDF runtime that can be used across different systems.

In particular, BabelfishLib targets three major design goals. First, it provides efficient execution strategies for UDFs written in different programming languages. Second, it isolates the execution of untrusted UDF code from the data processing system to guarantee system security. Third, it analyzes UDFs and provides information for further query optimizations. As a result, BabelfishLib mitigates the performance overhead of UDFs in state-of-the-art systems while ensuring security and isolation. We currently use BabelfishLib to accelerate UDFs in our data processing platform NebulaStream.

We believe that BabelfishLib can be a first step towards a unified UDF accelerator that can be integrated into different data processing systems. Furthermore, it provides a playground for further research on specific aspects of UDF acceleration. Finally, with this presentation, we intend to spark a discussion in the community to consolidate requirements for efficient UDF execution and to combine different efforts in this direction.

[1] Philipp M. Grulich, Steffen Zeuch, and Volker Markl. 2021. Babelfish: efficient execution of polyglot queries. Proc. VLDB Endow. 15, 2 (October 2021), 196–210.