The 16S Ribosomal RNA Gene

Before I started reading about microbiome research, I had never heard of the 16S Ribosomal RNA (rRNA) gene. Over the last month it's become my new best friend. So I want to spend this post talking about what it is, why it's important and how it turned the classification of life on its head.

Ribosomes are present in every living cell, from bacteria to plants and animals. They're the bits of cellular machinery that read messenger RNA and turn it into proteins during protein synthesis. Each ribosome is made up of two subunits, which between them contain proteins and three or four RNA molecules. The genes that code for these rRNAs are special for two reasons:

They're present in every living cell.
They act as molecular clocks.

The idea of a molecular clock was first proposed by Emile Zuckerkandl and Linus Pauling in relation to the structure of haemoglobin. They thought that differences in the protein sequence (and by extension the DNA sequence) could be used to estimate the time since two species diverged during evolution (1). Haemoglobin was brilliant as a proof of concept, since the resulting evolutionary tree matched the one that had been drawn up using mammal anatomy. However, haemoglobin isn't present in every living cell, so we can't use it to analyse the relationships between different species of bacteria. Groups of biologists began using other molecules to work out how different species were related. Their results completely changed the way we classify bacteria and other microorganisms.

Bacteria used to be classified by their appearance under the microscope and various biochemical tests. This seemed like a pretty logical system. Obviously bacteria which can photosynthesise are different from the others, just as plants are different to animals. These bacteria are round, so we'll call them 'cocci' (hence Staphylococci, Streptococci, etc), and they're different from these spiral ones which we'll call Spirochaetes. It worked that way for plants and animals, so why not bacteria? In 1977, Woese and Fox published a paper suggesting that, based on their observations of the structure of rRNA, the previous division of life into nucleated and non-nucleated cells was inherently superficial. They proposed that life was actually grouped into three ‘urkingdoms’ (now called domains):

Eubacteria - made up of typical bacteria
Archaebacteria - the methanogenic bacteria
Eukaryotes - organisms with nucleated cells.

They argued that the rRNA sequences from archaebacteria were as distinct from those of eubacteria as eubacteria were from eukaryotes, and so earned their own urkingdom (2). They also found that divisions within Eubacteria weren't what they seemed. For example, while some old divisions like Gram-positive and Gram-negative bacteria held true, analysis of rRNAs showed that photosynthetic bacteria were not one group but belonged to different families of bacteria (3). The idea was not widely accepted initially, but Woese continued to work with rRNA. In 1987 he published “Bacterial Evolution”, in which he described the ideal characteristics of a molecular clock and how ribosomal RNA genes (also referred to as rDNA) fit these characteristics. According to Woese, a molecular clock must:

Exhibit clock-like behaviour – a genetic sequence which accumulates sequence changes at a constant rate (see Figure 1). Most importantly, these changes must not be selected for by evolution.
Have adequate range – change slowly enough to span the full timescale being measured, which for bacteria means billions of years of evolution.
Have a number of “domains” – Domains are regions which are evolutionarily independent of each other. Non-random change in one domain will not affect the others, so if one domain is affected by changes which are selected for by evolution, other domains can be analysed instead (3).

Figure 1: A schematic of a molecular clock. Sequence A is the common ancestor of sequences B, C, D and E. A base pair in the sequence randomly changes every 10 million years, so we can deduce that sequences D and E diverged from each other about 20 million years ago. If we suppose that sequence C has another variant, F (not shown), we would be able to see that F is more closely related to sequence E than sequence D.
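To make the arithmetic behind a molecular clock concrete, here is a minimal sketch in Python. It is not taken from any of the papers cited here: the sequences are made-up toy data, and the function names (`count_differences`, `divergence_time`) are my own. It also assumes that each lineage picks up changes independently at the same constant rate, so the differences between two sequences are shared evenly between the two lineages.

```python
def count_differences(seq_a: str, seq_b: str) -> int:
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def divergence_time(seq_a: str, seq_b: str, years_per_change: float) -> float:
    """Estimate the years since two sequences shared a common ancestor.

    Assumption: each lineage gains one change every `years_per_change`
    years, so the observed differences are split across both lineages.
    """
    differences = count_differences(seq_a, seq_b)
    return differences / 2 * years_per_change


# Hypothetical toy sequences, not real rRNA data.
seq_d = "ACGTACGTAC"
seq_e = "ACGAACCTTG"

print(count_differences(seq_d, seq_e))  # number of differing positions
print(divergence_time(seq_d, seq_e, years_per_change=10_000_000))
```

With these toy sequences there are four differences, so at one change per lineage every 10 million years the estimate comes out at 20 million years. Real analyses are far more involved (they have to deal with alignment, repeated changes at the same position and rates that vary between sites), so treat this purely as an illustration of the idea.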

Olsen et al. identified 6 features of rRNA genes that make them suitable for use as molecular clocks:

Since rRNAs are essential parts of protein synthesis, they are present in all organisms.
The structure of rRNAs is highly conserved and they are therefore easily identified.
The DNA sequence is formed of hypervariable regions flanked by conserved regions. The conserved regions act as targets, and the hypervariable regions are the parts we're actually interested in.
Each cell contains a large number of rRNA molecules, so they are easy to recover.
Sequences are long enough to allow statistically significant comparisons.
rRNA genes do not appear to be transferred laterally between bacteria, so they reflect true evolutionary relationships between organisms (4).

Figure 2: A schematic of the 16S rRNA gene. The gene is formed of 9 hypervariable regions (V1 - V9) flanked by conserved regions. These conserved regions can be used as targets to sequence the hypervariable regions which are then used in microbiome studies.
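The idea of using conserved regions as targets can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the flanking sequences `CONSERVED_LEFT` and `CONSERVED_RIGHT` and both "genes" are invented toy strings, not real 16S sequences or primers, and `extract_between` is a hypothetical helper that does exact string matching rather than anything a real pipeline would use.

```python
from typing import Optional

# Hypothetical conserved flanks, invented for this example.
CONSERVED_LEFT = "GGATCC"
CONSERVED_RIGHT = "TTCGAA"


def extract_between(sequence: str, left_flank: str, right_flank: str) -> Optional[str]:
    """Return the stretch between two conserved flanks, or None if either is missing."""
    start = sequence.find(left_flank)
    if start == -1:
        return None
    start += len(left_flank)  # move past the left flank itself
    end = sequence.find(right_flank, start)
    if end == -1:
        return None
    return sequence[start:end]


# Two toy "genes": same conserved flanks, different variable region in between.
species_a = "AA" + CONSERVED_LEFT + "ACGTTGCA" + CONSERVED_RIGHT + "CC"
species_b = "AA" + CONSERVED_LEFT + "ACCTAGCA" + CONSERVED_RIGHT + "CC"

region_a = extract_between(species_a, CONSERVED_LEFT, CONSERVED_RIGHT)
region_b = extract_between(species_b, CONSERVED_LEFT, CONSERVED_RIGHT)
print(region_a, region_b)
```

Because the flanks are identical in both toy species, the same two targets pull out the variable stretch from each, and it is that stretch which differs between them.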

Woese also noted that different positions on rRNA genes change at different rates, allowing a full range of the evolutionary spectrum to be observed using one molecule (3).

There are three rRNA genes to choose from in bacteria and archaea: 5S (about 120 nucleotides long), 16S (roughly 1500 nucleotides) and 23S (about 3000 nucleotides). Initially, 5S was used because its short length made it easier to sequence. As sequencing technology advanced, partial 16S sequences were used instead. Currently, the 16S rRNA gene is the standard phylogenetic marker used to identify and classify organisms in microbiome studies.

Not only are rRNA genes good for working out relationships between different organisms, but since each species has its own unique rRNA gene sequences, we can use them almost like a fingerprint. So, if I take a sample, say a swab from someone's nose, extract the DNA and somehow sequence all of the rRNA genes, I should be able to tell which bacteria are present. And presumably if 40% of my sequences are from Staphylococcus aureus, then 40% of the bacteria living in that person's nose must be S. aureus too, right? Well... unfortunately it's not quite that easy.
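For what it's worth, the naive calculation described above is easy to write down. The Python sketch below assumes every sequence has already been assigned to a species; the read labels are made-up data ("species B" and "species C" are placeholders), and `relative_abundance` is a hypothetical helper, not part of any real microbiome tool.

```python
from collections import Counter
from typing import Dict, Iterable


def relative_abundance(read_labels: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of reads assigned to each species label."""
    counts = Counter(read_labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {species: count / total for species, count in counts.items()}


# Hypothetical sample: ten reads, each already labelled with a species.
reads = ["Staphylococcus aureus"] * 4 + ["species B"] * 3 + ["species C"] * 3

abundance = relative_abundance(reads)
print(abundance["Staphylococcus aureus"])
```

This only tells us what fraction of the sequences came from each species. Whether that matches the fraction of bacteria in the sample is exactly the assumption that turns out not to be so simple.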

References:

1. Zuckerkandl E, Pauling L. Molecular Disease, Evolution, and Genic Heterogeneity. Horizons in Biochemistry. 1962. p. 189–222.
2. Woese CR, Fox GE. Phylogenetic structure of the prokaryotic domain: the primary kingdoms. Proc Natl Acad Sci U S A. 1977;74(11):5088–90.
3. Woese CR. Bacterial evolution. Microbiol Rev. 1987;51(2):221–71.
4. Olsen GJ, Lane DL, Giovannoni SJ, Pace NR. Microbial ecology and evolution: a ribosomal RNA approach. Annu Rev Microbiol. 1986;40:337–65.

Sunday, 6 November 2016
