Coevolution between amino acid residues, together with its context dependence, is important for exploring protein structure and function and critical for understanding how protein structure and function evolve. Nevertheless, coevolution has long been neglected because of its complexity and a lack of computing power.
In the research presented here, I developed an efficient coevolution analysis methodology based on likelihood comparisons of statistical models. Likelihood ratios and Bayes factors, calculated using a Markov chain Monte Carlo algorithm, were employed as the test statistics. Two types of models, 2-state and 3-state, were developed to allow for the context dependence of coevolution. Computer programs implementing this methodology were written in C/C++ and run on the Beowulf clusters of our laboratory and the supercomputers of LSU. Using these programs and custom Perl scripts, I analyzed residue coevolution in cytochrome c oxidase and in the photolyase/cryptochrome protein superfamily.
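As a minimal illustration of the likelihood-comparison idea only, the Python sketch below computes a likelihood ratio statistic for two alignment columns by comparing a model in which the sites vary independently against one in which they vary jointly. It is not the 2-state or 3-state model described above: it ignores the phylogenetic tree, uses no Markov chain Monte Carlo sampling, and does not compute Bayes factors. The function name and the toy columns are assumptions made for the example.

```python
import math
from collections import Counter


def likelihood_ratio_statistic(col_a, col_b):
    """Return 2 * (lnL_dependent - lnL_independent) for two alignment columns.

    Simplifying assumption: sequences are treated as independent draws,
    so shared ancestry (the phylogeny) is ignored entirely.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must have equal length")
    n = len(col_a)
    if n == 0:
        raise ValueError("columns must not be empty")

    joint = Counter(zip(col_a, col_b))  # counts of residue pairs
    marg_a = Counter(col_a)             # counts at site A
    marg_b = Counter(col_b)             # counts at site B

    # Dependent model: maximum-likelihood pair frequencies n_ab / n.
    ln_l_dep = sum(c * math.log(c / n) for c in joint.values())

    # Independent model: pair frequency is the product of the marginals.
    ln_l_ind = sum(
        c * (math.log(marg_a[a] / n) + math.log(marg_b[b] / n))
        for (a, b), c in joint.items()
    )

    return 2.0 * (ln_l_dep - ln_l_ind)


# Toy columns (hypothetical data): the first pair covaries perfectly,
# the second pair shows no association.
covarying = likelihood_ratio_statistic("AAVV", "LLII")
unrelated = likelihood_ratio_statistic("AAVV", "LILI")
```

A larger statistic means the dependent model explains the pair of columns better than the independent one. In a phylogenetic setting the same comparison would be made between tree-based substitution models rather than raw column frequencies.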
I found that pairwise coevolution between residues is highly dependent on protein tertiary structure and function. I detected extensive coevolving pairs in all of my analyses, and these pairs were primarily localized in regions of known structural and/or functional importance. I also found that coevolution is related to evolutionary rate and is concentrated in moderately conserved sites. Supporting the importance of functional constraints, I detected a non-negligible coevolutionary signal between subunits of a complex and stronger coevolution in proteins of functional importance. I also found that the interaction between subunits can act as a local coevolutionary constraint on one subunit rather than driving coevolution between the two subunits. Based on coevolutionary patterns, I suggest that a domain with no previously proposed function actually operates as a folding core in the proteins of the photolyase/cryptochrome superfamily. The coevolutionary patterns also provided clues regarding the functional evolution of electron transfer in this superfamily. Finally, I found that coevolving sites with double substitutions along a branch tend to occur only at physically contacting sites, and that salt-bridge stabilization and secondary-structure stabilization are important forces driving residue coevolution.
The methodology and programs developed in this research are powerful tools for coevolutionary analysis, which can provide valuable information for characterization of protein structural/functional domains and exploration of protein structural/functional evolution.