Robotics and Biology Laboratory

Identifying near-native multi-fragment sequence alignments in protein structure prediction

Motivation

To date, commonly used fragment libraries contain only relatively small, independent fragments. Consequently, these libraries can model only the sequentially local context and cannot capture structurally conserved regions that are sequentially discontiguous. We therefore developed a library of so-called “building blocks”. A building block is a set of structurally contiguous but sequentially discontiguous fragments found in two or more proteins.
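To make the definition concrete, the following is a minimal Python sketch of how a building block could be represented. The class names `Fragment` and `BuildingBlock` and the validity rules (same number of fragments in every protein, at least two fragments, at least one residue between consecutive fragments) are illustrative assumptions, not the laboratory's actual data model. Structural contiguity is assumed to have been established beforehand, e.g. by structural superposition, and is not checked here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Fragment:
    """A contiguous stretch of residues [start, end) in one protein chain."""
    start: int
    end: int

    def __post_init__(self) -> None:
        if not 0 <= self.start < self.end:
            raise ValueError(f"invalid fragment bounds: {self.start}..{self.end}")


@dataclass
class BuildingBlock:
    """Hypothetical representation of a building block.

    `occurrences` maps a protein identifier to the fragments that form the
    block in that protein. Structural contiguity is assumed, not checked.
    """
    occurrences: dict[str, list[Fragment]] = field(default_factory=dict)

    def is_valid(self) -> bool:
        # A building block must be found in at least two proteins.
        if len(self.occurrences) < 2:
            return False
        # Assumption: every occurrence has the same number of fragments (>= 2).
        sizes = {len(frags) for frags in self.occurrences.values()}
        if len(sizes) != 1 or sizes.pop() < 2:
            return False
        for frags in self.occurrences.values():
            ordered = sorted(frags, key=lambda f: f.start)
            # Sequentially discontiguous: consecutive fragments must be
            # separated by at least one residue outside the block.
            for left, right in zip(ordered, ordered[1:]):
                if right.start <= left.end:
                    return False
        return True
```

For example, a block with fragments `Fragment(10, 18)` and `Fragment(40, 47)` in one protein and `Fragment(5, 13)` and `Fragment(60, 67)` in another is valid, whereas a block whose fragments abut (e.g. `Fragment(10, 18)` followed by `Fragment(18, 25)`) is rejected as sequentially contiguous.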
Existing methods for scoring a sequence alignment can score the fragments only independently of each other or as one contiguous sequence that includes the intervening residues. They therefore do not make optimal use of the additional information that building blocks provide.
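The two existing scoring modes can be contrasted with a toy example. This sketch assumes ungapped fragment matches given as hypothetical `(query_start, template_start, length)` triples and a toy +1/-1 identity score in place of a real substitution matrix; the function names are illustrative only.

```python
from typing import List, Tuple

# Hypothetical match format: (query_start, template_start, length).
Match = Tuple[int, int, int]


def pair_score(a: str, b: str) -> int:
    # Toy identity score standing in for a real substitution matrix.
    return 1 if a == b else -1


def segment_score(query: str, template: str) -> int:
    """Ungapped score of two equal-length segments."""
    if len(query) != len(template):
        raise ValueError("segments must have equal length")
    return sum(pair_score(a, b) for a, b in zip(query, template))


def score_independently(query: str, template: str, matches: List[Match]) -> List[int]:
    """Score each fragment on its own. The scores carry no information about
    whether the fragments are placed consistently relative to each other."""
    return [segment_score(query[q:q + n], template[t:t + n]) for q, t, n in matches]


def score_contiguously(query: str, template: str, matches: List[Match]) -> int:
    """Score the region from the first to the last fragment as one ungapped
    segment, including the residues in between. This implicitly assumes the
    same spacing between fragments in query and template."""
    q0, t0, _ = matches[0]
    q_last, _, n_last = matches[-1]
    span = q_last + n_last - q0
    return segment_score(query[q0:q0 + span], template[t0:t0 + span])
```

With `template = "ACDEFLLLLGHIK"`, `query = "ACDEFPPPPGHIK"` and matches `[(0, 0, 5), (9, 9, 4)]`, independent scoring yields `[5, 4]`, while contiguous scoring yields `5` because the unconserved in-between residues pull the score down. Neither mode checks whether the fragments are positioned consistently with respect to each other.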

Description of Work

We explore how knowledge about the dependencies between the fragments of a building block can be exploited for a more specific scoring scheme. To this end, we examine a number of features that allow a coarse distinction between structural matches and false positives. Finally, we evaluate the combined discriminative power of these features using three different machine learning algorithms.
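The general idea of combining such features in a classifier can be sketched as follows. The three features (weakest fragment score, mean fragment score, and the deviation in inter-fragment spacing between query and template), the synthetic training vectors, and the use of logistic regression are all assumptions chosen for illustration; the section does not name the actual features or the three learning algorithms.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical match format: (query_start, template_start, length).
Match = Tuple[int, int, int]


def dependency_features(fragment_scores: Sequence[float],
                        matches: Sequence[Match]) -> List[float]:
    """Hypothetical features for one candidate building block match.

    - weakest fragment score: one poorly matching fragment is suspicious
    - mean fragment score
    - spacing deviation: summed difference between the query and template
      distances of consecutive fragments (0 = identical spacing)
    """
    spacing_dev = 0.0
    for (q1, t1, _), (q2, t2, _) in zip(matches, matches[1:]):
        spacing_dev += abs((q2 - q1) - (t2 - t1))
    mean_score = sum(fragment_scores) / len(fragment_scores)
    return [min(fragment_scores), mean_score, spacing_dev]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Logistic regression fitted by plain batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x) -> float:
    """Estimated probability that x is a structural (near-native) match."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Synthetic, illustrative feature vectors; NOT data from the study.
# Label 1 = structural match, 0 = false positive.
X_train = [[3, 4, 0], [5, 5, 0], [4, 5, 1],
           [-2, 1.5, 2], [0, 2.5, 4], [1, 3, 6]]
y_train = [1, 1, 1, 0, 0, 0]
w, b = train_logistic(X_train, y_train)
```

For a candidate with fragment scores `[5, 4]` and identical spacing in query and template, `dependency_features` returns `[4, 4.5, 0.0]` and the trained model assigns a probability above 0.5; a candidate with scores `[5, -1]` whose second fragment is shifted by three residues yields `[-1, 2.0, 3.0]` and a probability below 0.5. Logistic regression is used here only because it fits in a few lines of standard-library Python.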

Results

Judging from its performance on CASP9 targets, the proposed setup works well for template-based modeling targets. It covered 61 targets (15 more than the control) with building block matches that were 100 % near-native. The proposed setup and the control covered roughly the same number of residues with near-native matches.