Proteins are among the most abundant molecules in living organisms. They carry out a variety of crucial functions, such as transporting molecules (e.g. hemoglobin), catalyzing reactions (enzymes), replicating DNA, identifying and neutralizing foreign bacteria and viruses (e.g. antibodies), and many more. Predicting protein structures can help us understand how proteins function, create new drugs that target them, and understand how mutations affect them.
We are continuously developing novel structure prediction methods and integrating them into a fully automated protein structure prediction pipeline, RBO Aleph.
A web interface to RBO Aleph is available here: http://compbio.robotics.tu-berlin.de/rbo_aleph/
Contact prediction is an intermediate step towards solving the protein structure prediction problem. Contact prediction methods identify residue pairs that are close in space in the native structure. Knowledge of the contact map can then be used to guide ab initio methods and to reconstruct the 3D structure of a protein. Due to the size of the search space, contact prediction remains a hard problem. To make the problem tractable, prior information is given to the model to constrain the search space. Currently, many different information sources are used in contact prediction. We want to exploit the complementary strengths of these sources to compensate for their individual weaknesses.
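To make the notion of a contact map concrete, the following minimal Python sketch derives contacts from 3D coordinates. It is an illustration only and not part of RBO Aleph: the function name contact_map, the 8 Å distance threshold, the minimum sequence separation of 6 residues, and the toy hairpin coordinates are all assumptions chosen for the example.

```python
import math


def contact_map(coords, threshold=8.0, min_separation=6):
    """Return the set of residue index pairs (i, j), i < j, that are in contact.

    coords: one (x, y, z) tuple per residue, in Angstrom (e.g. one
    representative atom per residue).
    threshold and min_separation are assumed values for this sketch.
    """
    contacts = set()
    n = len(coords)
    for i in range(n):
        # Skip pairs that are close in sequence; they are trivially close in space.
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= threshold:
                contacts.add((i, j))
    return contacts


# Toy hairpin (hypothetical coordinates): residues 0-4 run along one strand,
# residues 5-9 run back along a parallel strand 5 Angstrom away.
hairpin = [(3.8 * i, 0.0, 0.0) for i in range(5)] + \
          [(3.8 * (9 - i), 5.0, 0.0) for i in range(5, 10)]

for pair in sorted(contact_map(hairpin)):
    print(pair)
```

A contact predictor tackles the inverse problem: it has to estimate this set of pairs from sequence-derived information alone, without access to the coordinates.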
We developed a novel contact prediction method (EPSILON-CP) that combines evolutionary, sequence-based and physicochemical information. The physicochemical information stems from EPC-map (see compbio.robotics.tu-berlin.de/epc-map/ ), a method developed by Michael Schneider as part of his PhD, which ranked 2nd for long+medium range contacts and 5th for long-range contacts in CASP11. EPSILON-CP utilizes a deep neural network to effectively combine the aforementioned information sources. A key contribution is the refined feature set with drastically reduced dimensionality. EPSILON-CP ranked 5th in the final ranking of the CASP12 contact prediction assessment (group name RBO-EPSILON).
The major challenge of ab initio protein structure prediction is the huge conformational space of large proteins, which has to be sampled in order to find the native structure. Due to the size of this space, the probability of sampling from the vicinity of the native conformation is low. But is it really necessary to consider all possible conformations while searching?
Despite having diverse shapes and functions, proteins only populate a tiny part of the space of possible conformations. Our goal is to leverage our knowledge about these populated topologies to guide the search. We strongly believe that using this information during sampling will alleviate many of the problems arising from the size of the conformational space. This in turn should allow us to predict the structures of many proteins that have traditionally been out of reach for ab initio methods.
Model-based search (MBS) is our basic method for efficient conformational search in structure prediction. Typically, conformational search in structure prediction is uninformed and proceeds by executing many Monte Carlo simulations, pooling the results and selecting the best solutions. In contrast, MBS tries to gain information about the underlying energy landscape during search. Each conformation that is sampled by MBS is considered as a sample on the energy landscape. MBS analyzes the quality and the distribution of the samples to identify "funnels" in the energy landscape, regions that are likely to contain the native state. As MBS progresses, it gradually refines its model of the energy landscape and allocates computational resources to regions that are promising.
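The idea of learning about the energy landscape during search can be illustrated with a toy example. The sketch below is not the MBS implementation: it assumes a hypothetical one-dimensional energy function, uses fixed-width bins as a crude stand-in for identifying funnels, and all names and parameter values (energy, monte_carlo, model_based_search, budget, bin_width) are invented for illustration. It only shows the general loop of sampling, building a model of promising regions, and reallocating the search budget to them.

```python
import math
import random


def energy(x):
    """Toy 1D energy landscape (hypothetical): a deep funnel near x = 2,
    a shallower funnel near x = -3, and a weak quadratic confinement."""
    return (-2.0 * math.exp(-(x - 2.0) ** 2)
            - 1.0 * math.exp(-(x + 3.0) ** 2)
            + 0.01 * x * x)


def monte_carlo(x, steps, rng, step_size=0.3, temperature=0.2):
    """Short Metropolis run; returns the lowest-energy (x, energy) visited."""
    e = energy(x)
    best_x, best_e = x, e
    for _ in range(steps):
        cand = x + rng.gauss(0.0, step_size)
        cand_e = energy(cand)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if cand_e <= e or rng.random() < math.exp(-(cand_e - e) / temperature):
            x, e = cand, cand_e
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


def model_based_search(rounds=5, budget=40, steps=50, bin_width=1.0, seed=0):
    """Alternate between sampling and re-focusing the search on promising regions."""
    rng = random.Random(seed)
    starts = [rng.uniform(-6.0, 6.0) for _ in range(budget)]
    best = None
    for _ in range(rounds):
        # 1. Sample: every start point gets a short Monte Carlo run.
        samples = [monte_carlo(x, steps, rng) for x in starts]
        round_best = min(samples, key=lambda s: s[1])
        if best is None or round_best[1] < best[1]:
            best = round_best
        # 2. Model: group samples into regions (bins) and rank the regions
        #    by the lowest energy found inside each one.
        regions = {}
        for x, e in samples:
            regions.setdefault(math.floor(x / bin_width), []).append((x, e))
        ranked = sorted(regions.values(),
                        key=lambda members: min(e for _, e in members))
        # 3. Allocate: spend the next round's budget only on the better half.
        promising = ranked[:max(1, len(ranked) // 2)]
        pool = [x for members in promising for x, _ in members]
        starts = [rng.choice(pool) for _ in range(budget)]
    return best


if __name__ == "__main__":
    x, e = model_based_search()
    print(f"best x = {x:.2f}, energy = {e:.3f}")
```

In contrast to pooling independent runs, each round here uses the samples gathered so far to decide where the next runs start, so regions with poor samples stop consuming budget.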
MBS forms the basic algorithm of all our structure prediction efforts. We used a new implementation of the algorithm that was first introduced in CASP8. The algorithm is most suitable for "free modeling", which is the modeling of protein structures that cannot be modeled by exploiting the sequence-structure similarities to other proteins.
In CASP10, our server RBO-MBS ranked 10th out of 68 automatic servers that participated in the free-modeling category.
In CASP11, our server RBO-Aleph ranked 3rd out of 44 automatic servers that participated in the free-modeling category.
We are developing novel ways to determine protein structure using a combination of chemistry and computation. In this project, we developed a highly reactive photochemistry that increases the number of cross-links 17-fold over earlier, low-resolution cross-linking approaches. We combine these data with conformational space search algorithms in a "hybrid" approach to determine protein structure.
We demonstrated the potential of this method by determining the structure of human serum albumin domains in the context of human blood serum. This demonstrates the possibility of determining the structure of proteins in the complex biological contexts in which they function and which they may require for correct folding.
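One common way to use cross-link data in a conformational search is to treat each cross-link as an upper bound on the distance between the two linked residues and to penalize conformations that violate it. The sketch below illustrates that general idea only; it is not the scoring used in this project. The function names, the 25 Å bound, the quadratic penalty, and the weight are all assumptions.

```python
import math


def crosslink_penalty(coords, crosslinks, max_distance=25.0):
    """Sum of squared violations of the cross-link distance bounds.

    coords: one (x, y, z) tuple per residue, in Angstrom.
    crosslinks: iterable of (i, j) residue index pairs observed as linked.
    max_distance: assumed upper bound on the distance a cross-link can span.
    """
    penalty = 0.0
    for i, j in crosslinks:
        excess = math.dist(coords[i], coords[j]) - max_distance
        if excess > 0.0:
            penalty += excess ** 2
    return penalty


def hybrid_score(coords, crosslinks, physical_energy, weight=1.0):
    """Combine an energy value (computed elsewhere) with the restraint penalty."""
    return physical_energy + weight * crosslink_penalty(coords, crosslinks)


# Hypothetical three-residue example: the (0, 1) link is satisfied,
# the (0, 2) link spans 40 Angstrom and is penalized.
toy_coords = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (40.0, 0.0, 0.0)]
print(crosslink_penalty(toy_coords, [(0, 1)]))
print(crosslink_penalty(toy_coords, [(0, 2)]))
```

A search algorithm can then minimize the combined score, so that the experimental data steer sampling towards conformations consistent with the observed cross-links.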
Predicting Protein Structure with Guided Conformation Space Search - funded by the National Institutes of Health (NIH), award number 5R01 GM076706, August 2006 - May 2013