The DNA molecule carries information in the form of a sequence of four nucleotide bases, adenine (A), cytosine (C), guanine (G) and thymine (T), which can be thought of as the letters of the genomic language. Short sequences of the letters form ‘DNA words’ that determine when and where proteins are made in the body.
Almost all of the cells in the human body contain the letters in precisely the same order. Different genes are, however, active (expressed) in different cell types, allowing the cells to function in their specialised roles, for example as a brain cell or a muscle cell. The key to this gene regulation lies in specialised DNA-binding proteins, called transcription factors, that bind to the DNA words and activate or repress gene activity.
The DNA letter C exists in two forms, cytosine and methylcytosine, which can be thought of as the same letter without and with an accent (C and Ç). Methylation of DNA bases is a type of epigenetic modification, a biochemical change in the genome that does not alter the DNA sequence. The two variants of C have no effect on the kind of proteins that can be made, but they can have a major influence on when and where the proteins are produced. Previous research has shown that genomic regions where C is methylated are commonly inactive, and that many transcription factors are unable to bind to sequences that contain the methylated Ç.
By analysing hundreds of different human transcription factors, researchers at Karolinska Institutet in Sweden have now found that certain transcription factors actually prefer the methylated Ç. These include transcription factors that are important in embryonic development and in the development of prostate and colorectal cancers.
“The results suggest that such ‘master’ regulatory factors could activate regions of the genome that are normally inactive, leading to the formation of organs during development, or the initiation of pathological changes in cells that lead to diseases such as cancer,” says Professor Jussi Taipale at Karolinska Institutet’s Department of Medical Biochemistry and Biophysics, who led the research.
The results pave the way for cracking the genetic code that controls the expression of genes, and will have broad implications for the understanding of development and disease. The amount of available genomic information relevant to disease is growing exponentially.
“This study identifies how the modification of the DNA structure affects the binding of transcription factors, and this increases our understanding of how genes are regulated in cells and further aids us in deciphering the grammar written into DNA,” says Professor Taipale.