Robotics and Biology Laboratory

Protein Structure Prediction

Proteins are among the most abundant molecules in living organisms. They carry out a variety of crucial functions, such as transporting molecules (e.g. hemoglobin), catalyzing reactions (e.g. enzymes), replicating DNA, and identifying and neutralizing foreign bacteria and viruses (e.g. antibodies). Predicting protein structures can help us understand how proteins function, create new drugs that target them, and understand how mutations affect them.

We continuously developed novel structure prediction methods and integrated them into a fully automated protein structure prediction pipeline, RBO Aleph.


The web interfaces to RBO Aleph, EPC-map, and EPSILON-CP have been offline since 2022.

Predicting protein contacts by combining information from sequence and physicochemistry (EPSILON-CP)

Contact Persons

Kolja Stahl

Project description

Contact prediction is an intermediate step towards solving the protein structure prediction problem. Contact prediction methods identify residue pairs that are close in space in the native structure. Knowledge of the contact map can then be used to guide ab initio methods and to reconstruct the 3D structure of a protein. Due to the size of the search space, contact prediction remains a hard problem. To make the problem tractable, information is given to the model as priors that constrain the search space. Many different information sources are currently used in contact prediction. We want to combine these sources so that the strengths of one can compensate for the weaknesses of another.
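To make the notion of a contact map concrete, the following minimal sketch derives one from 3D coordinates. The 8 angstrom threshold, the use of one representative point per residue, and the toy coordinates are assumptions chosen for illustration; they are not taken from the methods described on this page.

```python
import math

def contact_map(coords, threshold=8.0, min_separation=1):
    """Return the set of residue index pairs (i, j), i < j, whose
    representative atoms lie within `threshold` angstroms of each other.

    `threshold` and `min_separation` are illustrative assumptions.
    """
    contacts = set()
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) <= threshold:
                contacts.add((i, j))
    return contacts

# Hypothetical toy coordinates (in angstroms), not a real protein.
toy_coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (3.8, 5.0, 0.0),
    (20.0, 0.0, 0.0),
]

# Ignore direct sequence neighbours, which are trivially close.
toy_contacts = contact_map(toy_coords, threshold=8.0, min_separation=2)
print(sorted(toy_contacts))
```

A contact predictor tries to estimate such a set from sequence-derived information alone, without access to the coordinates.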

We developed a novel contact prediction method (EPSILON-CP) that combines evolutionary, sequence-based and physicochemical information. The physicochemical information stems from EPC-map (see compbio.robotics.tu-berlin.de/epc-map/ ), a method developed by Michael Schneider as part of his PhD, which ranked 2nd for long+medium range contacts and 5th for long-range contacts in CASP11. EPSILON-CP uses a deep neural network to combine these information sources effectively. A key contribution is the refined feature set with drastically reduced dimensionality. EPSILON-CP ranked 5th in the final ranking of the CASP12 contact prediction assessment (group name RBO-EPSILON).
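Rankings such as "long+medium range contacts" refer to how far apart two residues are in the sequence. The sketch below shows one way such an evaluation might look: predicted contacts are binned by sequence separation and the precision of the top-scoring predictions is computed. The separation bins (6-11 short, 12-23 medium, 24 or more long) are the bins commonly used in contact assessment and are stated here as an assumption, as are all function names and the toy data. This illustrates the evaluation only, not EPSILON-CP itself.

```python
def contact_range(i, j):
    """Classify a residue pair by sequence separation (assumed bins)."""
    separation = abs(i - j)
    if separation >= 24:
        return "long"
    if separation >= 12:
        return "medium"
    if separation >= 6:
        return "short"
    return "local"

def precision_top_k(predicted, native, k, ranges=("medium", "long")):
    """Fraction of the k highest-scoring predictions in `ranges`
    that are true contacts.

    predicted: list of (i, j, score) with i < j
    native:    set of (i, j) true contacts with i < j
    """
    by_score = sorted(predicted, key=lambda p: p[2], reverse=True)
    selected = [(i, j) for i, j, _ in by_score
                if contact_range(i, j) in ranges][:k]
    if not selected:
        return 0.0
    hits = sum(1 for pair in selected if pair in native)
    return hits / len(selected)

# Hypothetical predictions and native contacts for illustration.
toy_predicted = [(1, 30, 0.9), (2, 20, 0.8), (5, 8, 0.95),
                 (10, 40, 0.4), (3, 50, 0.7)]
toy_native = {(1, 30), (3, 50), (5, 8)}

toy_precision = precision_top_k(toy_predicted, toy_native, k=3)
print(round(toy_precision, 3))
```

In the toy data, the highest-scoring pair (5, 8) is excluded because it is local in sequence, so the precision is computed over the next three predictions.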

Topology-Based Search for Protein Structure Prediction

Contact Persons

Mahmoud Mabrouk

Project description

The major challenge of ab initio protein structure prediction is the huge conformational space of large proteins, which has to be sampled in order to find the native structure. Due to the size of this space, the probability of sampling from the vicinity of the native conformation is low. But is it really necessary to consider all possible conformations while searching?

Despite having diverse shapes and functions, proteins only populate a tiny part of the space of possible conformations. Our goal is to leverage our knowledge about these populated topologies to guide the search. We strongly believe that using this information during sampling will alleviate many of the problems arising from the size of the conformational space. This in turn should allow us to predict the structures of many proteins that ab initio methods have traditionally been unable to solve.

Model-Based Search for Protein Structure Prediction

Contact Persons

Mahmoud Mabrouk

Kolja Stahl

Project description

Model-based search (MBS) is our basic method for efficient conformational search in structure prediction. Typically, conformational search in structure prediction is uninformed and proceeds by executing many Monte Carlo simulations, pooling the results and selecting the best solutions. In contrast, MBS tries to gain information about the underlying energy landscape during search. Each conformation that is sampled by MBS is considered as a sample on the energy landscape. MBS analyzes the quality and the distribution of the samples to identify "funnels" in the energy landscape, regions that are likely to contain the native state. As MBS progresses, it gradually refines its model of the energy landscape and allocates computational resources to regions that are promising.
MBS forms the basic algorithm of all our structure prediction efforts. We used a new implementation of the algorithm that was first introduced in CASP8. The algorithm is most suitable for "free modeling", which is the modeling of protein structures that cannot be modeled by exploiting the sequence-structure similarities to other proteins.
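The idea of reallocating resources toward promising regions can be illustrated with a deliberately simplified sketch. Everything in it is an assumption made for illustration: the one-dimensional energy function, the fixed equal-width regions standing in for funnels (MBS as described above identifies funnels from the samples rather than from fixed bins), and the halving allocation rule. It is not the RBO implementation.

```python
import random

def toy_energy(x):
    # Hypothetical 1D landscape: a deep funnel near x = -3 (energy 0)
    # and a shallower decoy funnel near x = 2 (energy 1).
    return min(0.5 * (x + 3.0) ** 2, (x - 2.0) ** 2 + 1.0)

def model_based_search(energy, low, high, n_regions=6, rounds=5,
                       budget=60, seed=0):
    """Sample a 1D landscape, shifting samples toward promising regions."""
    rng = random.Random(seed)
    width = (high - low) / n_regions
    # The "model" of the landscape: best energy seen so far per region.
    best_energy = [float("inf")] * n_regions
    best_x = [None] * n_regions
    # Start uninformed: the same number of samples everywhere.
    allocation = [budget // n_regions] * n_regions

    for _round in range(rounds):
        for region in range(n_regions):
            for _sample in range(allocation[region]):
                x = low + (region + rng.random()) * width
                e = energy(x)
                if e < best_energy[region]:
                    best_energy[region], best_x[region] = e, x
        # Refine the allocation: the best region gets about half the
        # budget, the next about a quarter, and so on. Every region
        # keeps at least one sample so that none is ruled out for good.
        ranked = sorted(range(n_regions), key=lambda r: best_energy[r])
        for rank, region in enumerate(ranked):
            allocation[region] = max(1, budget >> (rank + 1))

    winner = min(range(n_regions), key=lambda r: best_energy[r])
    return best_x[winner], best_energy[winner]

found_x, found_energy = model_based_search(toy_energy, -6.0, 6.0)
print(found_x, found_energy)
```

An uninformed search would keep spending the same effort on every region in every round; here, most samples after the first round go to the region around the deeper funnel.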

In CASP10, our server RBO-MBS ranked 10th out of 68 automatic servers that participated in the free-modeling category. 

In CASP11, our server RBO-Aleph ranked 3rd out of 44 automatic servers that participated in the free-modeling category.

Protein Structure Determination using Cross-linking/Mass Spectrometry and Computational Biology

Contact Persons

Mahmoud Mabrouk

Kolja Stahl

Project description

We are developing novel ways to determine protein structure using a combination of chemistry and computation. In this project, we developed a highly reactive photochemistry that increases the number of cross-links 17-fold over earlier, low-resolution cross-linking approaches. We combine this data with conformational space search algorithms in a "hybrid" approach to determine protein structure.
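One common way to combine cross-link data with a conformational search is to add a distance-restraint term to the energy being minimized. The sketch below shows such a term under stated assumptions: a flat-bottomed quadratic penalty, a hypothetical maximum cross-link distance of 25 angstroms, and illustrative function names. The scoring actually used in our hybrid approach is not specified on this page.

```python
import math

def crosslink_penalty(coords, crosslinks, max_distance=25.0):
    """Sum of squared distance excesses over all cross-linked pairs.

    A pair closer than `max_distance` (an assumed cutoff) adds nothing.
    """
    penalty = 0.0
    for i, j in crosslinks:
        excess = math.dist(coords[i], coords[j]) - max_distance
        if excess > 0:
            penalty += excess ** 2
    return penalty

def hybrid_score(energy, coords, crosslinks, weight=1.0):
    """Physical energy plus a weighted cross-link restraint term."""
    return energy + weight * crosslink_penalty(coords, crosslinks)

# Hypothetical conformation and cross-links for illustration.
toy_coords = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
toy_links = [(0, 1), (0, 2)]

toy_score = hybrid_score(-100.0, toy_coords, toy_links)
print(toy_score)
```

Conformations that violate many cross-links receive a worse score, which steers the search toward structures consistent with the experimental data.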

We demonstrated the potential of this method by determining the structure of human serum albumin domains in the context of human blood serum. This shows that it is possible to determine the structure of proteins in the complex biological contexts in which they function and which they may require for correct folding.

Photo Cross-linking/mass spectrometry (CLMS)

Contact Persons

Kolja Stahl

Oliver Brock

Project description

Many protein systems are elusive to structure analysis with established methods. This project aims to develop novel methods for protein structure determination to target this problem class of proteins. The proposed method is based on high-density cross-link/mass spectrometry (CLMS) data and custom-tailored computational algorithms to interpret them. Specifically, this project targets three critical and interdependent endeavors for advancing cross-linking for structure determination: 1) Increasing the density of CLMS data, 2) improving the distribution of CLMS data, and 3) combining high-density CLMS data with customized conformational space search algorithms.

To improve the density of CLMS data, we will test and evaluate different fragmentation methods in combination with high-density cross-linking reagents, such as the photoactivatable diazirine-based cross-linker sulfo-SDA. This requires adjustments in mass spectrometric measurements and settings, as well as further developments in the computational interpretation of peptide and fragmentation spectra. We further propose a graph-based analysis that incorporates corroborating information between cross-links to boost the accuracy and density of the links.

To improve the distribution of CLMS data, we target a current limitation of trypsin-based digestion protocols. Trypsin-based digestion might create peptides that are too small or too large for mass spectrometric analysis because trypsin cleavage sites are unevenly distributed over the target protein. We will test alternative proteases whose cleavage sites differ from those of trypsin or are unspecific. We aim to develop multi-digestion protocols to ensure an even distribution of links over the sequence. This includes modifying mass spectrometric acquisition protocols, which are currently optimized for detecting peptides from trypsin-digested proteins.
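The effect of uneven cleavage site distribution can be illustrated with a small in-silico digestion. The sketch below applies the commonly cited trypsin rule (cleave after K or R unless the next residue is P) and flags peptides by length. The length limits and the sequence are hypothetical values chosen for illustration; they are not the detection limits of any particular instrument or protocol.

```python
def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides

def classify_peptides(peptides, min_len=3, max_len=10):
    """Label each peptide by length; the limits are assumed values."""
    labels = {}
    for peptide in peptides:
        if len(peptide) < min_len:
            labels[peptide] = "too_short"
        elif len(peptide) > max_len:
            labels[peptide] = "too_long"
        else:
            labels[peptide] = "ok"
    return labels

# Hypothetical sequence with unevenly spaced cleavage sites.
toy_sequence = "AKRPG" + "L" * 8 + "KDER"

toy_peptides = tryptic_digest(toy_sequence)
toy_labels = classify_peptides(toy_peptides)
print(toy_labels)
```

Running the same classification with a second, hypothetical protease rule would show which sequence regions a multi-digestion protocol could recover.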

Furthermore, we will develop custom-tailored computational methods to leverage the cross-linking data in structure modeling. To accomplish this, we will develop algorithms that are able to retrieve structure information from protein structure databases using the CLMS data. In addition, we will develop noise-robust structure modeling algorithms that compensate for the noisy nature of high-density CLMS data. This will be accomplished by a conformational space search algorithm that automatically updates its belief in the cross-links it uses during the search. This algorithm will reject noise in CLMS data and therefore improve the quality of the resulting structure models.
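One possible reading of "updating the belief in a cross-link" is a Bayesian update: each time a low-energy conformation satisfies or violates a link, the probability that the link is genuine is revised. The sketch below follows that reading. The likelihood values, the 25 angstrom cutoff, and all names are hypothetical; the proposed algorithm may work differently.

```python
import math

def update_belief(belief, satisfied,
                  p_sat_if_true=0.9, p_sat_if_false=0.2):
    """Bayes update of the probability that a cross-link is genuine.

    The two likelihoods are assumed values for illustration.
    """
    if satisfied:
        numerator = belief * p_sat_if_true
        denominator = numerator + (1 - belief) * p_sat_if_false
    else:
        numerator = belief * (1 - p_sat_if_true)
        denominator = numerator + (1 - belief) * (1 - p_sat_if_false)
    return numerator / denominator

def update_link_beliefs(beliefs, conformations, max_distance=25.0):
    """Update every link's belief once per sampled conformation."""
    updated = dict(beliefs)
    for coords in conformations:
        for (i, j) in updated:
            satisfied = math.dist(coords[i], coords[j]) <= max_distance
            updated[(i, j)] = update_belief(updated[(i, j)], satisfied)
    return updated

# Two hypothetical low-energy conformations of a three-residue toy chain.
toy_conformations = [
    [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (40.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (12.0, 0.0, 0.0), (45.0, 0.0, 0.0)],
]
toy_beliefs = {(0, 1): 0.5, (0, 2): 0.5}

toy_updated = update_link_beliefs(toy_beliefs, toy_conformations)
print(toy_updated)
```

Links whose belief drops low enough could then be down-weighted or ignored by the search, which is one way to keep noisy links from distorting the model.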

This project will rigorously evaluate the proposed method by a blind test in the context of the 13th community-wide Critical Assessment of protein Structure Prediction (CASP) experiment. We will cross-link proteins with unknown structure and test our computational algorithms using this data. In addition, the CLMS data will be disseminated in CASP to other structure prediction groups to maximize the impact of the proposed method.

Funding

Alexander von Humboldt

National Institutes of Health

Predicting Protein Structure with Guided Conformation Space Search, funded by the National Institutes of Health (NIH), award number 5R01 GM076706, August 2006 - May 2013

Publications

2023

Stahl, Kolja; Brock, Oliver; Rappsilber, Juri; Dau, Therese; Graziadei, Andrea
Protein structure prediction with in-cell photo-crosslinking mass spectrometry and deep learning
Nature Biotechnology
March 2023

2017

Stahl, Kolja; Schneider, Michael; Brock, Oliver
EPSILON-CP: using deep learning to combine information from multiple sources for protein contact prediction
BMC Bioinformatics, 18 (1) :303
2017

2016

Mabrouk, Mahmoud; Werner, Tim; Schneider, Michael; Putz, Ines; Brock, Oliver
Analysis of free modeling predictions by RBO aleph in CASP11
Proteins: Structure, Function, and Bioinformatics, 84(Suppl 1) :87–104
2016
Schneider, Michael; Belsom, Adam; Rappsilber, Juri; Brock, Oliver
Blind Testing of Crosslinking/Mass Spectrometry Hybrid Methods in CASP11
Proteins: Structure, Function, and Bioinformatics :152-163
2016

2015

Mabrouk, Mahmoud; Putz, Ines; Werner, Tim; Schneider, Michael; Neeb, Moritz; Bartels, Philipp; Brock, Oliver
RBO Aleph: Leveraging Novel Information Sources for Protein Structure Prediction
Nucleic Acids Research, 43 (W1) :W343-W348
April 2015