Leveraging Novel Information for Coarse-Grained Prediction of Protein Motion
Proteins are involved in almost all functions in our cells due to their ability to combine conformational motion with chemical specificity. Hence, information about the motions of a protein provides insights into its function. Proteins move on a rugged energy landscape with many local minima, defined over their high-dimensional conformational space. Exhaustive sampling of this space exceeds the available computational resources for all but the smallest proteins. Computational approaches thus have to simplify the potential energy function and/or the resolution of the model, using information about what is relevant and what can be ignored. The accuracy of the approximation depends on the accuracy of the information used. Information that is specific to the problem domain, i.e., protein motion in our case, usually results in better models.
In this thesis, I propose a novel elastic network model of learned maintained contacts, lmcENM. It expands the range of motions that can be captured by such simplified models by leveraging novel information about a protein’s structure. This improves the general applicability of elastic network models.
Elastic network models (ENMs) are a highly popular coarse-grained method to study protein motions. They assume that protein motions are harmonic around an equilibrium conformation and largely governed by the protein’s structural connectivity. This leads to the simplified representation of a protein as an elastic mass-spring network based on residue interactions. Despite their simplicity, ENMs predict intrinsic protein motions with surprising biological relevance. Accurate ENM predictions, however, require the initial contact topology to be maintained during a protein’s motion. This requirement is naturally fulfilled for highly collective motions, which is why such motions are predicted successfully. Localized functional transitions that involve substantial changes in the contact topology, in contrast, are often poorly explained. This limits the practical relevance of ENMs: the motion type of a protein is unknown a priori, and thus it is unknown whether ENMs can capture it.
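To make the mass-spring picture concrete, the following is a minimal sketch of one common ENM variant, a Gaussian-network-style model, and not necessarily the formulation used in this thesis. It assumes one node per residue (e.g. the C-alpha atom), a uniform spring between all residue pairs within a distance cutoff, and an illustrative cutoff of 7 Å. The function names and the toy coordinates are hypothetical.

```python
import numpy as np

def contact_map(coords, cutoff=7.0):
    """Boolean residue-residue contact matrix from an (N, 3) coordinate array.

    Assumption: two residues are in contact if their nodes lie within
    `cutoff` (in Angstrom); the value 7.0 is illustrative only.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    contacts = dist <= cutoff
    np.fill_diagonal(contacts, False)  # a residue is not in contact with itself
    return contacts

def kirchhoff(contacts):
    """Connectivity (graph Laplacian) matrix of the spring network."""
    adj = contacts.astype(float)
    return np.diag(adj.sum(axis=1)) - adj

def normal_modes(contacts):
    """Eigenvalues (ascending) and eigenvectors of the connectivity matrix.

    For a connected network the first eigenvalue is zero (rigid-body mode);
    the following low-frequency modes describe the most collective motions.
    """
    return np.linalg.eigh(kirchhoff(contacts))

# Toy example: five pseudo-residues on a line, spaced 3.8 Angstrom apart.
coords = np.array([[3.8 * i, 0.0, 0.0] for i in range(5)])
contacts = contact_map(coords)
evals, evecs = normal_modes(contacts)
```

Because the modes depend only on the contact matrix, any contact that breaks during the real motion still acts as a spring in the model, which is the limitation described above.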
lmcENM overcomes this limitation by leveraging information about the dynamic behavior of contacts, i.e. whether they break or are maintained when the protein moves. The maintained contacts remain after predicted breaking contacts have been removed from the initial network. In contrast to existing ENM variants, lmcENM is able to accurately predict protein motions even for localized and uncorrelated functional transitions with changing contact topology.
In the first part of my thesis, I show that the absence of observed breaking contacts enables ENMs to accurately explain localized functional transitions. The resulting network of observed maintained contacts, mcENM, can be built when the start and end conformations of a functional transition are known. Of course, to apply this strategy in the standard case when only a single protein conformation is available, we need to be able to predict these breaking contacts.
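The construction of a network of observed maintained contacts can be sketched as follows. This is an illustration under stated assumptions, not the thesis's exact procedure: a contact is taken to be a residue pair within an illustrative 7 Å cutoff, and a contact is called breaking if it is present in the start conformation but absent in the end conformation. The function names and toy coordinates are hypothetical.

```python
import numpy as np

def contact_map(coords, cutoff=7.0):
    """Boolean contact matrix from an (N, 3) coordinate array (assumed cutoff)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    contacts = dist <= cutoff
    np.fill_diagonal(contacts, False)
    return contacts

def split_contacts(start, end, cutoff=7.0):
    """Split the start-conformation contacts into maintained and breaking ones.

    Assumption: a contact 'breaks' if it exists in `start` but not in `end`.
    The maintained network is the initial network minus the breaking contacts.
    """
    c_start = contact_map(start, cutoff)
    c_end = contact_map(end, cutoff)
    breaking = c_start & ~c_end
    maintained = c_start & ~breaking
    return maintained, breaking

# Toy example: four pseudo-residues on a square; the last one moves away.
start = np.array([[0.0, 0.0, 0.0],
                  [3.8, 0.0, 0.0],
                  [3.8, 3.8, 0.0],
                  [0.0, 3.8, 0.0]])
end = start.copy()
end[3] = [0.0, 10.0, 0.0]

maintained, breaking = split_contacts(start, end)
```

In this toy case all three contacts of the displaced residue are flagged as breaking, and only the contacts among the remaining three residues stay in the network.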
In the second part of my thesis, I show how the breaking contacts can be predicted. To do so, I developed a machine-learning-based classifier that differentiates breaking from maintained contacts based on a graph-based encoding of their structural context. The physicochemical characteristics of a contact’s structural context capture how tightly different parts of the protein are bound to each other, how this affects their movements, and ultimately how it affects their contact topology. To build lmcENM, the predicted breaking contacts are removed from the initial network. Using a large set of proteins covering different motion types, I demonstrate the effectiveness of lmcENM.
My thesis unlocks breaking contacts, or more generally dynamic contact changes, as a novel source of information that has proven valuable in coarse-grained prediction of protein motion. Because they are defined on a simplified model of the structural connectivity of a protein, they are insensitive to structural details that would otherwise make their identification and prediction more difficult. The existence and usefulness of breaking contacts demonstrated in my thesis open up future research opportunities to study the conditions under which they occur and to examine the features that contribute the most to their accurate prediction. The framework for predicting breaking contacts can be easily extended to further advance our understanding of protein motion.