Big Data Engineering

Course Overview Summer Term 2023

  1. Architecture of ML Systems

    AMLS is a 6 ECTS module, applicable to the master study courses computer science, computer engineering, information systems management, and electrical engineering, as well as the study areas data and software engineering, cognitive systems, and distributed systems and networks. Machine learning (ML) applications are profoundly transforming our lives and many domains, such as health care, finance, media, transportation, production, and information technology itself. In a narrow sense, ML systems are the software systems underpinning these ML applications. In a broad sense, however, ML systems comprise the entire stack, from ML applications, through the compiler/runtime stack, to the underlying heterogeneous hardware devices.

    This module covers the architecture and essential concepts of modern ML systems for both local and large-scale machine learning. These architectures include systems for data-parallel execution, parameter servers, ML lifecycle systems, and the integration of ML into database systems. The covered topics take both a microscopic view of internal compilation, execution, and data management techniques and a macroscopic view of end-to-end ML pipelines. To learn the details of the individual topics and the lecture calendar, click the name of the module above.

  2. Large-scale Data Engineering

    Conducting research in the areas of data engineering, data management, and machine learning systems requires the ability to work with the scientific literature in these areas as well as to design, implement, and evaluate prototypes. To build these skills, we offer a seminar and a programming project on Large-scale Data Engineering (LDE) as a combined 12 ECTS module, which can be taken by bachelor and master students. Alternatively, bachelor students (but not master students) may take the seminar (3 ECTS) and the project (9 ECTS) as separate modules. Taking both the seminar and the project is the ideal preparation for a bachelor/master thesis with our group.

    This semester, the seminar focuses on the umbrella topic of extensible data systems, taking a tour through some of the most important works at various levels of the database and ML systems architecture. The projects will be conducted in the context of Apache SystemDS and DAPHNE, two free, open-source systems developed in our group as part of our research. For more details, please click the name of the module.

  3. Joint Seminar on Machine Learning and Data Management Systems

    This is a joint, research-oriented seminar of the Machine Learning Group and the Data Management Group. Throughout the seminar, students will have the opportunity to learn about recent advances at the intersection of Machine Learning and Data Management Systems.

    Interested students are required to attend the kick-off meeting, after which each student will select, read, understand, and (if possible) programmatically evaluate one of the eligible papers (TBA), before giving a final 10-15 min presentation in English at the end of the semester. More details will be discussed during the kick-off meeting.

    Example topics include:
    - Federated/Deep Ensemble Learning and Data Management Systems.
    - Carbon-aware data management systems and Machine Learning.
    - Compression of ML Systems.
    - Continual, Lifelong, and Online Learning.
    - Data management systems for Lifelong Learning.
    - Hashing and sketches.
    - Building ML pipelines for large-scale data preparation, model training, and model debugging, versioning, and monitoring.
    - AutoML.