Big Data Engineering

Publications

List of publications

This publication list covers the last nine years. For a full list see DBLP and Google Scholar.

2023

  • Saeed Fathollahzadeh, Matthias Boehm: GIO: Generating Efficient Matrix and Frame Readers for Custom Data Formats by Example, SIGMOD 2023.
  • Matthias Boehm, Matteo Interlandi, Chris Jermaine: Optimizing Tensor Computations: From Applications to Compilation and Runtime Techniques (Tutorial), SIGMOD 2023.
  • Sebastian Baunsgaard, Matthias Boehm: AWARE: Workload-aware, Redundancy-exploiting Linear Algebra, SIGMOD 2023.
  • Manisha Luthra, Andreas Kipf, Matthias Boehm: A Tutorial Workshop on ML for Systems and Systems for ML, WoLS@BTW 2023.
  • Patrick Damme, Matthias Boehm: Enabling Integrated Data Analysis Pipelines on Heterogeneous Hardware through Holistic Extensibility, NoDMC@BTW 2023. [paper]

2022

  • Sebastian Baunsgaard, Matthias Boehm, Kevin Innerebner, Mito Kehayov, Florian Lackner, Olga Ovcharenko, Arnab Phani, Tobias Rieger, David Weissteiner, Sebastian Benjamin Wrede: Federated Data Preparation, Learning, and Debugging in Apache SystemDS (Demo), CIKM 2022. [paper, poster, ACM DL (OpenAccess)]
  • Arnab Phani, Lukas Erlbacher, Matthias Boehm: UPLIFT: Parallelization Strategies for Feature Transformations in Machine Learning Workloads, PVLDB 2022 15(11). [paper]
  • Matthias Boehm, Paroma Varma, Doris Xin: DEEM'22: Data Management for End-to-End Machine Learning, DEEM@SIGMOD 2022. [paper]
  • Patrick Damme, Marius Birkenbach, Constantinos Bitsakos, Matthias Boehm, Philippe Bonnet, Florina Ciorba, Mark Dokter, Pawel Dowgiallo, Ahmed Eleliemy, Christian Faerber, Georgios Goumas, Dirk Habich, Niclas Hedam, Marlies Hofer, Wenjun Huang, Kevin Innerebner, Vasileios Karakostas, Roman Kern, Tomaž Kosar, Alexander Krause, Daniel Krems, Andreas Laber, Wolfgang Lehner, Eric Mier, Marcus Paradies, Bernhard Peischl, Gabrielle Poerwawinata, Stratos Psomadakis, Tilmann Rabl, Piotr Ratuszniak, Pedro Silva, Nikolai Skuppin, Andreas Starzacher, Benjamin Steinwender, Ilin Tolovski, Pınar Tözün, Wojciech Ulatowski, Yuanyuan Wang, Izajasz Wrosz, Aleš Zamuda, Ce Zhang, Xiao Xiang Zhu: DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines, CIDR 2022. [paper, slides, video]

2021

  • Svetlana Sagadeeva, Matthias Boehm: SliceLine: Fast, Linear-Algebra-based Slice Finding for ML Model Debugging, SIGMOD 2021. [paper, repro]
  • Sebastian Baunsgaard, Matthias Boehm, Ankit Chaudhary, Behrouz Derakhshan, Stefan Geißelsöder, Philipp Marian Grulich, Michael Hildebrand, Kevin Innerebner, Volker Markl, Claus Neubauer, Sarah Osterburg, Olga Ovcharenko, Sergey Redyuk, Tobias Rieger, Alireza Rezaei Mahdiraji, Sebastian Benjamin Wrede, Steffen Zeuch: ExDRa: Exploratory Data Science on Federated Raw Data, SIGMOD 2021. [paper, slides, ACM DL (OpenAccess), repro]
  • Arnab Phani, Benjamin Rath, Matthias Boehm: LIMA: Fine-grained Lineage Tracing and Reuse in Machine Learning Systems, SIGMOD 2021. [paper, repro]

2020

  • Prithviraj Sen, Marina Danilevsky, Yunyao Li, Siddhartha Brahma, Matthias Boehm, Laura Chiticariu, Rajasekar Krishnamurthy: Learning Explainable Linguistic Expressions with Neural Inductive Logic Programming for Sentence Classification, EMNLP 2020.
  • Matthias Boehm: Technical Perspective: Declarative Recursive Computation on an RDBMS, SIGMOD Record 2020 49(1). [paper]
  • Matthias Boehm, Iulian Antonov, Sebastian Baunsgaard, Mark Dokter, Robert Ginthör, Kevin Innerebner, Florijan Klezin, Stefanie Lindstaedt, Arnab Phani, Benjamin Rath, Berthold Reinwald, Shafaq Siddiqi, Sebastian Benjamin Wrede: SystemDS: A Declarative Machine Learning System for the End-to-End Data Science Lifecycle, CIDR 2020. [paper, slides]

2019

  • Johanna Sommer, Matthias Boehm, Alexandre V. Evfimievski, Berthold Reinwald, Peter J. Haas: MNC: Structure-Exploiting Sparsity Estimation for Matrix Expressions, SIGMOD 2019. [paper, slides, poster]
  • Ahmed Elgohary, Matthias Boehm, Peter J. Haas, Frederick R. Reiss, Berthold Reinwald: Compressed Linear Algebra for Large-Scale Machine Learning, Commun. ACM 2019 62(5). [paper, link]
  • Matthias Boehm, Arun Kumar, Jun Yang: Data Management in Machine Learning Systems. Synthesis Lectures on Data Management 11 (1), Morgan & Claypool Publishers 2019. [book]
  • Matthias Boehm, Alexandre V. Evfimievski, Berthold Reinwald: Efficient Data-Parallel Cumulative Aggregates for Large-Scale Machine Learning, BTW 2019. [paper, slides]

2018

  • Matthias Boehm, Berthold Reinwald, Dylan Hutchison, Prithviraj Sen, Alexandre V. Evfimievski, Niketan Pansare: On Optimizing Operator Fusion Plans for Large-Scale Machine Learning in SystemML, PVLDB 2018 11(12). [paper, slides, poster]
  • Ahmed Elgohary, Matthias Boehm, Peter J. Haas, Frederick R. Reiss, Berthold Reinwald: Compressed Linear Algebra for Large-Scale Machine Learning, VLDB Journal 2018 27(5). [paper, link]
  • Matthias Boehm: Apache SystemML – Declarative Large-Scale Machine Learning, Encyclopedia of Big Data Technologies 2018. [paper]
  • Niketan Pansare, Michael Dusenberry, Nakul Jindal, Matthias Boehm, Berthold Reinwald, Prithviraj Sen: Deep Learning with Apache SystemML, SysML 2018. [paper]

2017

  • Arun Kumar, Matthias Boehm, Jun Yang: Data Management in Machine Learning: Challenges, Techniques, and Systems (Tutorial), SIGMOD 2017. [paper, slides, video]
  • Ahmed Elgohary, Matthias Boehm, Peter J. Haas, Frederick R. Reiss, Berthold Reinwald: Scaling Machine Learning via Compressed Linear Algebra, SIGMOD Record 2017 46(1). [paper]
  • Tarek Elgamal, Shangyu Luo, Matthias Boehm, Alexandre V. Evfimievski, Shirish Tatikonda, Berthold Reinwald, Prithviraj Sen: SPOOF: Sum-Product Optimization and Operator Fusion for Large-Scale Machine Learning, CIDR 2017. [paper, slides]

2016

  • Ahmed Elgohary, Matthias Boehm, Peter J. Haas, Frederick R. Reiss, Berthold Reinwald: Compressed Linear Algebra for Large-Scale Machine Learning, PVLDB 2016 9(12). [paper, slides, poster]
  • Matthias Boehm, Michael Dusenberry, Deron Eriksson, Alexandre V. Evfimievski, Faraz Makari Manshadi, Niketan Pansare, Berthold Reinwald, Frederick Reiss, Prithviraj Sen, Arvind Surve, Shirish Tatikonda: SystemML: Declarative Machine Learning on Spark, PVLDB 2016 9(13). [paper, slides]
  • Matthias Boehm, Alexandre V. Evfimievski, Niketan Pansare, Berthold Reinwald: Declarative Machine Learning - A Classification of Basic Properties and Types, CoRR 2016 abs/1605.05826. [paper]

2015

  • Arash Ashari, Shirish Tatikonda, Matthias Boehm, Berthold Reinwald, Keith Campbell, John Keenleyside, P. Sadayappan: On Optimizing Machine Learning Workloads via Kernel Fusion, PPoPP 2015. [paper]
  • Botong Huang, Matthias Boehm, Yuanyuan Tian, Berthold Reinwald, Shirish Tatikonda, Frederick R. Reiss: Resource Elasticity for Large-Scale Machine Learning, SIGMOD 2015. [paper, slides, poster]
  • Matthias Boehm: Costing Generated Runtime Execution Plans for Large-Scale Machine Learning Programs, CoRR 2015 abs/1503.06384. [paper]

2014

  • Matthias Boehm, Douglas R. Burdick, Alexandre V. Evfimievski, Berthold Reinwald, Frederick R. Reiss, Prithviraj Sen, Shirish Tatikonda, Yuanyuan Tian: SystemML's Optimizer: Plan Generation for Large-Scale Machine Learning Programs, IEEE Data Eng. Bull. 2014 37(3). [paper]
  • Matthias Boehm, Dirk Habich, Wolfgang Lehner: On-Demand Re-Optimization of Integration Flows. Inf. Syst. 2014 45. [paper]
  • Peter D. Kirchner, Matthias Boehm, Berthold Reinwald, Daby M. Sow, J. Michael Schmidt, Deepak S. Turaga, Alain Biem: Large Scale Discriminative Metric Learning, IPDPS Workshop ParLearning 2014. [paper, slides]
  • Matthias Boehm, Shirish Tatikonda, Berthold Reinwald, Prithviraj Sen, Yuanyuan Tian, Douglas Burdick, Shivakumar Vaithyanathan: Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML, PVLDB 2014 7(7). [paper, slides, poster]