Many protein systems remain elusive to structure determination with established methods. This project aims to develop novel methods for determining the structures of this problematic class of proteins. The proposed approach is based on high-density cross-linking/mass spectrometry (CLMS) data and custom-tailored computational algorithms to interpret them. Specifically, this project targets three critical and interdependent objectives for advancing cross-linking for structure determination: 1) increasing the density of CLMS data, 2) improving the distribution of CLMS data along the protein sequence, and 3) combining high-density CLMS data with customized conformational space search algorithms.
To increase the density of CLMS data, we will test and evaluate different fragmentation methods in combination with high-density cross-linking reagents, such as the photoactivatable diazirine-based cross-linker sulfo-SDA. This requires adjustments to mass spectrometric acquisition settings, as well as further development of the computational interpretation of peptide and fragmentation spectra. We further propose a graph-based analysis that incorporates corroborating information between cross-links to boost both the accuracy and the density of the identified links.
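The graph construction is not specified here. As a minimal sketch, assuming that a cross-link between residues i and j is corroborated when other identified links connect residues within a small sequence window of both i and j, links can be treated as graph nodes, mutually supporting links joined by edges, and each link scored by its node degree. The functions `corroboration_graph`, `support_scores`, and `filter_links` and the `window` and `min_support` parameters are hypothetical illustrations; a real pipeline would combine such support with spectral match scores rather than use it alone.

```python
from itertools import combinations


def corroboration_graph(links, window=3):
    """Undirected graph: nodes are cross-links (i, j) with i < j; edges join
    links whose two ends each lie within `window` residues of one another.
    `window` is a hypothetical tuning parameter."""
    nodes = sorted({(min(i, j), max(i, j)) for i, j in links})
    adj = {n: set() for n in nodes}
    for a, b in combinations(nodes, 2):
        if abs(a[0] - b[0]) <= window and abs(a[1] - b[1]) <= window:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def support_scores(links, window=3):
    """Number of neighbouring links that corroborate each link (node degree)."""
    adj = corroboration_graph(links, window)
    return {n: len(nbrs) for n, nbrs in adj.items()}


def filter_links(links, min_support=1, window=3):
    """Keep only links supported by at least `min_support` neighbours."""
    scores = support_scores(links, window)
    return [n for n, s in scores.items() if s >= min_support]


links = [(10, 50), (12, 52), (49, 11), (80, 200)]
print(support_scores(links))  # isolated link (80, 200) gets support 0
print(filter_links(links))
```

Here the three clustered links support one another, while the isolated link (80, 200) receives no support and would be flagged as a candidate false positive under this heuristic.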
To improve the distribution of CLMS data, we target a current limitation of trypsin-based digestion protocols. Because trypsin cleavage sites (lysine and arginine) can be unevenly distributed along the target protein's sequence, tryptic digestion may create peptides that are too short or too long for mass spectrometric analysis. We will test alternative proteases whose cleavage sites are complementary to those of trypsin, as well as proteases with unspecific cleavage. We aim to develop multi-digestion protocols that ensure an even distribution of links over the sequence. This includes modifying mass spectrometric acquisition protocols, which are currently optimized for detecting peptides from trypsin-digested proteins.
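The effect of uneven cleavage-site distribution, and the benefit of combining proteases, can be illustrated with an in silico digestion. This is a minimal sketch under simplifying assumptions: complete digestion with no missed cleavages, simplified specificity rules (trypsin after K/R unless followed by P, GluC after E, AspN before D), and a hypothetical MS-compatible peptide length range of 6 to 30 residues. The function names and the length bounds are illustrative, not part of the proposal.

```python
import re

# Simplified cleavage rules as zero-width regex split points.
RULES = {
    "trypsin": r"(?<=[KR])(?!P)",  # after K or R, unless the next residue is P
    "gluc": r"(?<=E)",             # after E (simplified GluC specificity)
    "aspn": r"(?=D)",              # before D
}


def digest(seq, protease):
    """Half-open (start, end) peptide intervals after complete digestion."""
    cuts = {m.start() for m in re.finditer(RULES[protease], seq)}
    bounds = sorted(cuts | {0, len(seq)})
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if b > a]


def covered_residues(seq, protease, min_len=6, max_len=30):
    """Residues lying in peptides of (assumed) MS-compatible length."""
    covered = set()
    for a, b in digest(seq, protease):
        if min_len <= b - a <= max_len:
            covered.update(range(a, b))
    return covered


def coverage(seq, proteases, **kw):
    """Fraction of residues covered by the union of single-protease digests."""
    covered = set()
    for p in proteases:
        covered |= covered_residues(seq, p, **kw)
    return len(covered) / len(seq)


# Toy sequence: a single tryptic peptide of 41 residues is too long,
# but GluC splits it into two detectable peptides.
seq = "G" * 20 + "E" + "G" * 19 + "K"
print(coverage(seq, ["trypsin"]), coverage(seq, ["trypsin", "gluc"]))
```

On this toy sequence trypsin alone yields no detectable peptides, whereas the trypsin plus GluC combination covers every residue, which is the kind of gap a multi-digestion protocol is meant to close.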
Furthermore, we will develop custom-tailored computational methods that leverage the cross-linking data in structure modeling. First, we will develop algorithms that use the CLMS data to retrieve structural information from protein structure databases. Second, we will develop noise-robust structure modeling algorithms that compensate for the noisy nature of high-density CLMS data. This will be accomplished by a conformational space search algorithm that automatically updates its belief in each cross-link during the search. By down-weighting links it judges to be noise, the algorithm rejects noise in the CLMS data and thereby improves the quality of the resulting structure models.
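The search algorithm itself is not specified here. One possible scheme for "updating beliefs in cross-links" is an expectation-maximization-style alternation, sketched below over a fixed set of candidate models (decoys) rather than a full conformational search: score each model by a belief-weighted log-likelihood of its cross-link distances, pick the best model, then recompute each link's posterior probability of being genuine given that model. The soft 25 Å Cα-Cα cutoff, the softness, the prior, the constant noise likelihood `p_noise`, and all function names are assumptions chosen for illustration.

```python
import math


def link_likelihood(d, cutoff=25.0, softness=2.0):
    """Soft probability that a genuine cross-link spans Calpha distance d
    (angstrom); the cutoff and softness are assumed values."""
    z = (d - cutoff) / softness
    if z >= 0:  # numerically stable logistic function
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def link_distances(coords, links):
    return [math.dist(coords[i], coords[j]) for i, j in links]


def update_beliefs(distances, prior=0.8, p_noise=0.1):
    """Posterior probability that each link is genuine, given one model."""
    beliefs = []
    for d in distances:
        t = prior * link_likelihood(d)
        beliefs.append(t / (t + (1.0 - prior) * p_noise))
    return beliefs


def model_score(distances, beliefs):
    """Belief-weighted log-likelihood; links believed to be noise barely count."""
    return sum(w * math.log(link_likelihood(d) + 1e-12)
               for d, w in zip(distances, beliefs))


def belief_search(models, links, n_iter=5, prior=0.8):
    """Alternate between choosing the best model and re-estimating beliefs."""
    beliefs = [prior] * len(links)
    best = None
    for _ in range(n_iter):
        best = max(models,
                   key=lambda m: model_score(link_distances(m, links), beliefs))
        beliefs = update_beliefs(link_distances(best, links), prior)
    return best, beliefs


# Model A satisfies three links and violates (0, 3); model B does the reverse.
model_a = [(0, 0, 0), (10, 0, 0), (0, 10, 0), (100, 0, 0)]
model_b = [(0, 0, 0), (60, 0, 0), (0, 60, 0), (10, 0, 0)]
links = [(0, 1), (0, 2), (1, 2), (0, 3)]
best, beliefs = belief_search([model_a, model_b], links)
```

In this toy case model A is selected and the belief in the isolated link (0, 3) collapses toward zero while the mutually consistent links keep high beliefs. The same belief-weighted score could in principle also rank templates retrieved from a structure database, although how retrieval would be done is not described here.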
This project will rigorously evaluate the proposed method in a blind test within the 13th community-wide Critical Assessment of protein Structure Prediction (CASP) experiment. We will cross-link proteins of unknown structure and test our computational algorithms on the resulting data. In addition, the CLMS data will be shared through CASP with other structure prediction groups to maximize the impact of the proposed method.