Database Systems and Information Management

An Interview with Dr. Sebastian Schelter

"The DIMA staff members I met were very approachable. They took the time to explain their research, particularly, the system they were building in detail."

Sebastian Schelter was a PhD student at DIMA from 2011 to 2015 and held a part-time senior researcher position until 2017. Afterwards, he joined Amazon as an Applied Scientist.

How did you learn about DIMA? What was your first impression?

I initially learned about DIMA from a friend and collaborator on an open-source project. My first impression was quite favorable. The DIMA staff members I met were very approachable. They took the time to explain their research in detail, particularly the system they were building.

What were the goals of your PhD research?

The core of my PhD research focused on scaling data mining algorithms and systems to process very large datasets.

What was your doctoral student experience like at DIMA?

I think it was the same experience that every doctoral student has: there are ups and downs, and you learn a lot about yourself. However, in my case, I left a well-paid industry job to become an academic, and I have never regretted this decision.

What makes DIMA special?

I particularly enjoyed the opportunity to participate in internships, in both industrial research labs and companies located in Silicon Valley.

What were your experiences with respect to peer collaborations (e.g., writing papers) and student supervision (e.g., Bachelor’s and Master’s theses) at DIMA?

Since I was unable to tackle many of the ideas that came to mind, I enjoyed handing these off to capable students (e.g., as thesis topics). Most of the students I supervised produced theses that I am very proud of. My three favorite Master’s theses dealt with the development of a compiler for the distributed execution of MATLAB programs, a serendipitous recommender system for food recipes, and a recommender system for controversial news.

How did your experience at DIMA help advance your career?

I learned a lot about the fundamentals of data processing. I met many like-minded researchers from all over the world. I greatly improved my writing and presentation skills. Today, I am a member of a machine learning team at a world-leading Internet company. From my perspective, I would never have been offered that job without the experience I gained while working at DIMA.

Would you recommend students pursue their doctoral studies at DIMA?

I would recommend it to anyone who is passionate about research, willing to work hard, and self-motivated.


Sebastian Schelter is currently an Applied Scientist at Amazon’s Core Machine Learning Team in Berlin, and a guest lecturer at the Database Systems and Information Management Group of TU Berlin. His research focuses on the intersection of data management and machine learning, and incorporates a wide variety of aspects, such as metadata management for end-to-end ML applications, systems design for parallel data processing, scalable algorithms, and lately also the application of data mining to domains such as the web and social networks. In the last two years, he established the workshop "Data Management for End-to-End Machine Learning (DEEM)" at ACM SIGMOD. Sebastian received his Ph.D. from TU Berlin, advised by Volker Markl. During his studies, he interned at IBM Research Almaden and Twitter in California. Furthermore, he is engaged in open source as a member of the Apache Software Foundation, where he has been a committer and PMC member in the Mahout, Giraph and Flink projects, and currently serves as a mentor for the Apache MXNet project during its incubation. For latest information, visit


[1] BlockJoin: Efficient Matrix Partitioning Through Joins. Andreas Kunft, Asterios Katsifodimos, Sebastian Schelter, Volker Markl. International Conference on Very Large Databases (VLDB). 2018.

[2] Automatically Tracking Metadata and Provenance of Machine Learning Experiments. Sebastian Schelter, Joos-Hendrik Böse, Johannes Kirschnick, Thoralf Klein, Stephan Seufert. Machine Learning Systems workshop at the conference on Neural Information Processing Systems (NIPS). 2017.

[3] Probabilistic Demand Forecasting at Scale. Joos-Hendrik Böse, Valentin Flunkert, Jan Gasthaus, Tim Januschowski, Dustin Lange, David Salinas, Sebastian Schelter, Matthias Seeger, Yuyang Wang. International Conference on Very Large Databases (VLDB). 2017.

[4] The Stratosphere platform for big data analytics. Alexander Alexandrov, Rico Bergmann, Stephan Ewen, Johann-Christoph Freytag, Fabian Hueske, Arvid Heise, Odej Kao, Marcus Leich, Ulf Leser, Volker Markl, Felix Naumann, Mathias Peters, Astrid Rheinländer, Matthias Sax, Sebastian Schelter, Mareike Höger, Kostas Tzoumas, Daniel Warneke. VLDB Journal. 2014.

[5] All Roads Lead to Rome: Optimistic Recovery for Distributed Iterative Data Processing. Sebastian Schelter, Stephan Ewen, Kostas Tzoumas, Volker Markl. ACM Conference on Information and Knowledge Management (CIKM). 2013.

[6] Scalable Similarity-Based Neighborhood Methods with MapReduce. Sebastian Schelter, Christoph Boden, Volker Markl. ACM Conference on Recommender Systems (RecSys). 2012.