How do proteins bind to nucleotide sequences?

One of the canonical models for gene regulation involves a regulatory protein recognizing and binding preferentially to a particular DNA sequence in the promoter region of a gene, thereby increasing the affinity of RNA polymerase for that region. Camas et al. (2010) use the LacI family of transcriptional regulators, which share a helix-turn-helix DNA-binding domain, to search for correlations between the amino acid sequences of transcription factors and the DNA sites they bind. Two findings stand out:

1) They found a consensus binding site shared across the LacI family of transcriptional regulators:

[Figure: consensus binding site for the LacI family. Source: doi:10.1371/journal.pcbi.1000989]

This is a promising indication that there is some degree of sequence conservation among the DNA sites bound by related transcription factors. Searching for these conserved sequences is computationally expensive and statistically complex, especially when positions must be considered in combination, so in my mind any current findings should be viewed as proofs of concept for more precise and useful findings in the future. (Perhaps I am overly optimistic!)
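To make "consensus binding site" concrete: given a set of binding sites that have already been aligned to the same length, the consensus is simply the most frequent nucleotide at each position, and the per-position frequencies form a position frequency matrix. The sketch below assumes pre-aligned, equal-length sites and uses a made-up toy set of sequences (not data from Camas et al.); the function names are hypothetical, and real motif-finding also has to solve the much harder alignment step.

```python
from collections import Counter


def frequency_matrix(sites: list[str]) -> list[dict[str, float]]:
    """Fraction of each nucleotide (A, C, G, T) at every aligned position."""
    if not sites:
        raise ValueError("need at least one site")
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be pre-aligned to the same length")
    n = len(sites)
    return [
        {base: sum(s[i] == base for s in sites) / n for base in "ACGT"}
        for i in range(length)
    ]


def consensus(sites: list[str]) -> str:
    """Most common nucleotide per position; ties broken alphabetically."""
    if not sites:
        raise ValueError("need at least one site")
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be pre-aligned to the same length")
    letters = []
    for i in range(length):
        counts = Counter(s[i] for s in sites)
        # Highest count wins; on a tie, the alphabetically first base wins.
        best, _ = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        letters.append(best)
    return "".join(letters)


# Hypothetical toy sites, for illustration only.
toy_sites = ["TGTGAGC", "TGTTAGC", "TATGAGC", "TGTGAAC"]
print(consensus(toy_sites))               # TGTGAGC
print(frequency_matrix(toy_sites)[1])     # G at 0.75, A at 0.25
```

Positions where one base dominates (frequency near 1) are the conserved ones; positions with spread-out frequencies are where family members are free to differ.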

2) They found sequence correlations between amino acid positions 15 and 16 of the transcription factors and nucleotide positions 5 and 4 of their DNA binding sites. In particular, transcription factors with the same DNA-contacting amino acids tend to recognize highly similar (“degenerate”) nucleotide sequences:

[Figure: recognition degeneracies between DNA-contacting amino acids and binding-site nucleotides. Source: doi:10.1371/journal.pcbi.1000989. Caption: "Recognition degeneracies are represented as unidirectional arrows (asymmetrical intrinsic), bidirectional divergent arrows (symmetrical intrinsic), and bidirectional convergent arrows (extrinsic). Colors for polar (green), basic (blue), acidic (red) and hydrophobic (black) amino acids."]
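One standard way to look for the kind of protein/DNA covariation described in finding 2 is to pair each transcription factor with its binding site, then score every (protein position, site position) pair by mutual information: a high score means knowing the residue tells you a lot about the nucleotide. The sketch below is a generic illustration, not necessarily the statistic Camas et al. used. It assumes pre-aligned protein fragments and sites, uses 0-based positions, and runs on a hypothetical toy dataset; the function names are made up for this example.

```python
import math
from collections import Counter


def mutual_information(xs: list[str], ys: list[str]) -> float:
    """Mutual information, in bits, between two paired categorical columns."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("columns must be non-empty and the same length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


def rank_position_pairs(proteins: list[str], sites: list[str]):
    """Score every (protein column i, site column j) pair; highest MI first."""
    scores = []
    for i in range(len(proteins[0])):
        aa_col = [p[i] for p in proteins]
        for j in range(len(sites[0])):
            nt_col = [s[j] for s in sites]
            scores.append((mutual_information(aa_col, nt_col), i, j))
    scores.sort(reverse=True)
    return scores


# Hypothetical toy data: residue at protein column 1 "determines" site column 0.
toy_proteins = ["AR", "AK", "AQ"]
toy_sites = ["GC", "TC", "AC"]
best_mi, aa_pos, nt_pos = rank_position_pairs(toy_proteins, toy_sites)[0]
print(aa_pos, nt_pos, round(best_mi, 3))  # 1 0 1.585
```

With only a handful of sequences, mutual information is biased upward, so a real analysis would compare each score against a null model (for example, scores from shuffled protein/site pairings) before calling a pair correlated.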

Even though many of these studies are in bacteria, sequence-specific transcriptional regulation also plays a large role in neural systems, and general regulatory mechanisms are conserved across the phylogenetic tree. It is interesting to see how all of these disciplines are intertwined.

Reference

Camas FM, Alm EJ, Poyatos JF (2010) Local Gene Regulation Details a Recognition Code within the LacI Transcriptional Factor Family. PLoS Comput Biol 6(11): e1000989. doi:10.1371/journal.pcbi.1000989