Understanding protein-protein interactions is vital to the study of cellular signaling pathways and cell regulation. Interaction networks mediate essential cellular processes, from replication to programmed cell death, which motivates a close analysis of the key proteins involved in these pathways. Identifying novel protein-protein interaction zones is the first of many steps toward identifying new interaction partners in the complex network of intracellular signaling. The goal of this project is to develop feature selection methods that identify salient geometric and chemical properties of protein interaction zones, and then to use these features to distinguish interaction zones from non-interaction zones with machine learning approaches such as support vector machines (SVMs) and k-nearest neighbors (kNN).
Given (1) a crystallographic/NMR protein structure, (2) a set of known protein-protein interaction zones as partial protein structures (positive control), and (3) a set of partial protein structures known not to be interaction zones (negative control), can we predict novel protein-protein interaction zones?
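Framed this way, the question is a binary classification task over feature vectors computed from partial structures. The following is only a minimal sketch of the kNN variant in plain Python. The two features (hydrophobic fraction and a planarity score) and all numeric values are made up for illustration, and `knn_predict` is a hypothetical helper name, not part of any existing tool.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs. Labels are assumed
    to be 1 for an interaction zone and 0 for a non-interaction zone.
    """
    # Rank training examples by Euclidean distance to the query, keep the k closest.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Made-up feature vectors: (hydrophobic fraction, planarity score).
positives = [((0.62, 0.80), 1), ((0.58, 0.75), 1), ((0.66, 0.85), 1)]
negatives = [((0.30, 0.40), 0), ((0.25, 0.35), 0), ((0.35, 0.45), 0)]
training_set = positives + negatives

print(knn_predict(training_set, (0.60, 0.78)))  # query close to the positive set
print(knn_predict(training_set, (0.28, 0.38)))  # query close to the negative set
```

In practice the features would need to be scaled to comparable ranges before distances are computed, since Euclidean distance is dominated by whichever feature has the largest numeric spread.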
Because proteins are inherently complex molecules, the space of possible protein features is extremely large. Selecting salient features that can serve as markers for predicting the interaction capacity of protein surface regions is crucial to developing sensitive and specific classifiers. Possible features of interest include, but are by no means limited to:
These features only scratch the surface of the information available from the crystallographic structure file alone (PDB file: www.pdb.org). Many other features, such as evolutionary conservation and sequence homology, are interesting and valuable as well, but are outside the scope of this project.
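As a rough illustration of how simple features can be read directly from a structure file, the sketch below parses alpha-carbon `ATOM` records using the fixed-column PDB layout and computes two toy descriptors for a patch: the fraction of hydrophobic residues and the radius of gyration. The residue set in `HYDROPHOBIC`, the helper names, and the sample coordinates are all assumptions chosen for illustration, not the feature set used in this project.

```python
import math

# Assumed hydrophobic residue set; other classifications exist.
HYDROPHOBIC = {"ALA", "VAL", "LEU", "ILE", "MET", "PHE", "TRP", "PRO"}

# Made-up records laid out in PDB fixed columns (not a real structure).
SAMPLE_PDB = """\
ATOM      1  N   LEU A   1       1.000   1.000   1.000
ATOM      2  CA  LEU A   1       0.000   0.000   0.000
ATOM     10  CA  SER A   2       3.000   0.000   0.000
ATOM     18  CA  PHE A   3       0.000   3.000   0.000
ATOM     29  CA  ASP A   4       3.000   3.000   0.000
"""

def parse_ca_atoms(pdb_text):
    """Return (residue_name, (x, y, z)) for every alpha-carbon ATOM record."""
    atoms = []
    for line in pdb_text.splitlines():
        # PDB columns: atom name 13-16, residue name 18-20, x/y/z 31-54.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            res = line[17:20].strip()
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((res, xyz))
    return atoms

def patch_features(atoms):
    """Return (hydrophobic fraction, radius of gyration) for a set of CA atoms."""
    n = len(atoms)
    hydrophobic_fraction = sum(res in HYDROPHOBIC for res, _ in atoms) / n
    centroid = tuple(sum(xyz[i] for _, xyz in atoms) / n for i in range(3))
    # Radius of gyration: root-mean-square distance of the atoms from their centroid.
    rg = math.sqrt(sum(math.dist(xyz, centroid) ** 2 for _, xyz in atoms) / n)
    return hydrophobic_fraction, rg

ca_atoms = parse_ca_atoms(SAMPLE_PDB)
print(patch_features(ca_atoms))
```

A full treatment would also handle alternate locations, multiple chains, and NMR models, which this sketch ignores.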
While many diverse types of proteins have protein-protein interaction domains, protein kinases will be the focus of this work. Kinases are interesting because they are capable of "selective promiscuity": they interact with multiple partners, but with extremely high specificity for only those partners. Understanding the structural basis for this "selective promiscuity" is a goal of this work.
An SH3 domain-mediated interaction network identified from a peptide array target screen (taken from the website of Shawn Li, Schulich School of Medicine & Dentistry, University of Western Ontario).
I hypothesize that identifying protein interfaces will depend not on a handful of crucial features alone, but on a collection of such features together with their spatial distribution. Alternative approaches to protein interface prediction, such as ProMate (Neuvirth et al. 2004; ProMate web interface), will provide a valuable benchmark for sensitivity and specificity.
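For reference, sensitivity is the fraction of true interaction zones that the classifier recovers, TP / (TP + FN), and specificity is the fraction of non-interaction zones it correctly rejects, TN / (TN + FP). A minimal sketch of the computation follows; the label vectors are invented for illustration and do not represent results from this project or from ProMate.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = interaction zone)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # interaction zones found
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # interaction zones missed
    tn = sum(t == 0 and p == 0 for t, p in pairs)  # non-interaction zones rejected
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # non-interaction zones flagged
    # Assumes both classes occur in y_true; otherwise a denominator is zero.
    return tp / (tp + fn), tn / (tn + fp)

# Made-up labels: 4 known interaction zones, 6 known non-interaction zones.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Reporting both numbers matters here because the negative set is likely to be much larger than the positive set, so overall accuracy alone would hide poor recovery of true interfaces.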
| Dates | Milestone |
| --- | --- |
| February 1st-15th | Finalize selection of protein features to investigate |
| February 16th-29th | Select families of proteins to include in the study |
| March 1st-15th | Rate the classification power of SVMs on the dataset, with and without kernels as appropriate. Assess Principal Component Analysis (for dimensionality reduction) in combination with k-nearest neighbors for classification. |
| March 16th-30th | Benchmark sensitivity and specificity of the model against existing alternative techniques |
| April | Write final report |
email: drew.h.bryant/AT/gmail.com