Saturday 26 November 2016

PCR Artefacts

PCR artefacts are DNA sequences produced by errors in the PCR process. Some only involve changing one base pair, but some are severe enough to increase the diversity in subsequent analysis of the amplified DNA. It's important to take steps to minimise these errors and identify sequences that might be the result of PCR errors rather than actual bacteria.

Errors in Replication

The amplification of DNA is a laboratory imitation of DNA replication, and so is subject to the same errors. Since DNA polymerase is not 100% accurate, point mutations (where the incorrect base is added) and deletions can occur, which alter the replicated sequence from its original template. This error will then be amplified and may appear in results as a different sequence, especially when using techniques which can identify single nucleotide differences between sequences. The observed error rate for Taq DNA polymerase during PCR depends on the reaction conditions, and ranges from one error per 290 nucleotides to one error per 5411 nucleotides (1). The error rate also differs between DNA polymerases; for example, using Pfu instead of Taq DNA polymerase leads to a 10-fold improvement in the error rate (2). Errors will also accumulate with the number of PCR cycles, so it is worth keeping the cycle number to a minimum.
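To get a feel for what those error rates mean for a 16S rRNA gene, here is a rough back-of-the-envelope sketch. It assumes the quoted figures can be read as a simple "one error every N nucleotides" frequency in the final product, and it uses an amplicon length of 1500 base pairs (roughly the full gene). The function name is made up for illustration.

```python
def expected_errors(amplicon_length: int, nucleotides_per_error: float) -> float:
    """Expected number of errors in one amplicon, assuming errors are spread
    evenly at a rate of one per `nucleotides_per_error` nucleotides."""
    return amplicon_length / nucleotides_per_error


# The two ends of the range quoted above for Taq DNA polymerase (1).
for nucleotides_per_error in (290, 5411):
    errors = expected_errors(1500, nucleotides_per_error)
    print(f"1 error per {nucleotides_per_error} nt: about {errors:.2f} errors per 1500 bp")
```

At the worst end of the range that works out to several errors per full-length gene, and at the best end to well under one, which is why the reaction conditions matter so much.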

PCR is Blind

When someone describes PCR and how it works, it sounds like a very orderly affair. The DNA strands nicely denature, the primers form a queue and bind, replication occurs and then the corresponding DNA strands join back together again, ready for the next cycle. This makes for a nice explanation, but of course it's not like that in reality. In reality it's a messy, messy process. There's a whole load of molecules bumping around in your reaction vessel. DNA spends most of its time in a double stranded form. Double stranded DNA is a very stable structure and molecules like to be stable. A single stranded DNA molecule is the neediest molecule you'll ever find, it just wants stability! Surely this is something we can all relate to. If the perfect complementary strand isn't nearby a similar sequence will do. Equally, your PCR machine won't wait patiently for all of the DNA polymerases to finish replicating their strands of DNA. If they're not done by the time the temperature changes, then it's tough titties and you end up with a partial sequence floating around. This can lead to some really funny DNA sequences cropping up when you sequence everything.

Let's imagine we have two 16S rRNA genes in our PCR, A and B:


The formation of heteroduplexes during PCR presents a problem. Heteroduplexes are double-stranded DNA molecules formed of single strands from different sources. As PCR progresses, you get more and more template DNA, but the number of primers doesn't increase. The primer:template ratio decreases and can reach a point where primer annealing is no longer favoured. As we said earlier, DNA loves to bind to other bits of DNA. If there's not the perfect match nearby, either the complementary strand or a primer, it'll take what it can get. This leads to hybridisation of heterologous (from a different organism) template DNA and the formation of heteroduplexes. Heteroduplexes can increase the number of bands if the sample is analysed using DGGE or TGGE, and introduce biases during the construction of clone libraries (3). Various methods for reducing heteroduplex formation have been proposed. These include the addition of more Taq polymerase after the 27th cycle, limiting the cycle number (4) and 10-fold dilution of the PCR product followed by three cycles of re-amplification (3).
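The falling primer:template ratio can be sketched with a toy model. This is very much an idealisation: it assumes every template strand that can find a primer is copied once per cycle, that each new strand uses up exactly one primer, and the starting numbers are invented purely for illustration.

```python
def primer_template_ratio(template_strands: int, primers: int, cycles: int) -> list[float]:
    """Primer:template ratio after each cycle under idealised doubling."""
    ratios = []
    for _ in range(cycles):
        # Each template strand that finds a primer yields one new strand
        # and uses up one primer; strands without a primer are not copied.
        new_strands = min(template_strands, primers)
        template_strands += new_strands
        primers -= new_strands
        ratios.append(primers / template_strands)
    return ratios


# Hypothetical starting amounts, chosen only to show the trend.
ratios = primer_template_ratio(template_strands=10**6, primers=10**13, cycles=30)
for cycle in (10, 20, 25, 30):
    print(f"cycle {cycle}: {ratios[cycle - 1]:.3g} primers per template strand")
```

In this model the primers vastly outnumber the templates for most of the run, and the ratio only collapses in the last few cycles, which fits with the advice to limit the cycle number.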


Chimeras are more troublesome artefacts. They occur when a partial 16S rDNA fragment from the extension phase binds to a heterologous fragment during the annealing phase to form a heteroduplex. The incomplete fragment then acts as a primer for extension, creating a chimera of two 16S rRNA genes from different species. This is then amplified and can be easily confused as originating from a new, but unfortunately non-existent, bacterial species. Amplification of 16S rDNA is prone to the formation of chimeras because of the conserved regions of the gene. PCR amplification of 16S rDNA can produce between 5.4 and 8.6% chimeras (5)⁠. 

The frequency of chimera formation increases with:
  • The availability of partial rDNA fragments (6)⁠.
  • Damage to DNA by restriction enzymes, UV irradiation, sonication, depurination and rigorous cell lysis (7, 8).
  • The percentage similarity between DNA templates (9).
The incidence of chimera formation can be decreased by:
  • Increasing the elongation time (9, 10, 11).
  • Decreasing the number of cycles and thereby limiting the opportunity for formation and amplification of chimeras (5, 6)⁠.
Chimeras can also be identified after sequencing of amplified rDNA. Chimeric sequences are difficult to distinguish from true biological sequences; however, there are several computer programs which search for and identify chimeric sequences. Chimeras can also be identified by the production of incongruent trees following phylogenetic analyses on opposite ends of the rDNA sequences (6).
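The "opposite ends" idea can be shown with a toy example. This is not how any particular chimera-checking program works; it's a simplified sketch that swaps the phylogenetic trees for a plain best-match comparison. It assumes the sequences are already aligned to the same length, and the sequences and function names are made up.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def best_match(fragment: str, references: dict[str, str], start: int, end: int) -> str:
    """Name of the reference whose [start:end] slice is most similar to the fragment."""
    return max(references, key=lambda name: identity(fragment, references[name][start:end]))


def looks_chimeric(query: str, references: dict[str, str]) -> bool:
    """Flag a query whose two halves are closest to different references."""
    mid = len(query) // 2
    left = best_match(query[:mid], references, 0, mid)
    right = best_match(query[mid:], references, mid, len(query))
    return left != right


# Toy, made-up sequences (not real 16S rRNA data).
references = {
    "A": "ACGTACGTACGTACGTACGT",
    "B": "ACGTTCGAACCTAGGTTCGA",
}
chimera = references["A"][:10] + references["B"][10:]
print(looks_chimeric(chimera, references))          # True
print(looks_chimeric(references["A"], references))  # False
```

A real chimera won't necessarily break in the middle, and the true parents may not be in your reference set, so treat this as an illustration of the principle rather than a usable filter.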

PCR artefacts are not the only problems with samples amplified using PCR. Not all DNA strands in the sample are amplified to the same extent. This is called differential amplification and is covered in another post.

  1. Eckert KA, Kunkel TA. DNA polymerase fidelity and the polymerase chain reaction. Genome Res. 1991;1(1):17–24.
  2. Lundberg KS, Shoemaker DD, Adams MWW, Short JM, Sorge JA, Mathur EJ. High-fidelity amplification using a thermostable DNA polymerase isolated from Pyrococcus furiosus. Gene. 1991;108(1):1–6.
  3. Thompson JR, Marcelino LA, Polz MF. Heteroduplexes in mixed-template amplifications: formation, consequence and elimination by “reconditioning PCR”. Nucleic Acids Res. 2002;30(9):2083–8.
  4. Michu E, Mráčková M, Vyskot B, Žlůvová J. Reduction of heteroduplex formation in PCR amplification. Biol Plant. 2010;54(1):173–6.
  5. Wintzingerode F, Göbel UB, Stackebrandt E. Determination of microbial diversity in environmental samples: pitfalls of PCR-based analysis. FEMS Microbiol Rev. 1997;21:213–29.
  6. Osborn MA, Smith CJ. Molecular Microbial Ecology. Vol. 51. 2009. 370 p.
  7. Pääbo S, Irwin DM, Wilson AC. DNA damage promotes jumping between templates during enzymatic amplification. J Biol Chem. 1990;265:4721.
  8. Possemiers S, Verthé K, Uyttendaele S, Verstraete W. PCR-DGGE-based quantification of stability of the microbial community in a simulator of the human intestinal microbial ecosystem. FEMS Microbiol Ecol. 2004;49(3):495–507.
  9. Wang GC, Wang Y. The frequency of chimeric molecules as a consequence of PCR co-amplification of 16S rRNA genes from different bacterial species. Microbiology. 1996 May;142 (Pt 5):1107-14.
  10. Shen J, Zhang B, Wei G, Pang X, Wei H, Li M, et al. Molecular profiling of the Clostridium leptum subgroup in human fecal microflora by PCR-denaturing gradient gel electrophoresis and clone library analysis. Appl Environ Microbiol. 2006;72(8):5232–8.
  11. Tourlomousis P, Kemsley EK, Ridgway KP, Toscano MJ, Humphrey TJ, Narbad A. PCR-Denaturing gradient gel electrophoresis of complex microbial communities: A two-step approach to address the effect of gel-to-gel variation and allow valid comparisons across a large dataset. Microb Ecol. 2010;59(4):776–86.

Friday 18 November 2016

The Polymerase Chain Reaction

Kary Mullis has an interesting section in his Wikipedia entry under "Extraterrestrial life". Apparently Mullis "reported an encounter with a glowing green raccoon at his cabin in the woods of northern California around midnight one night in 1985. He denies the involvement of LSD in this encounter." However, he does credit his LSD use with helping him to pioneer the Polymerase Chain Reaction (PCR).

There were methods of synthesising small amounts of single-stranded DNA using DNA polymerase in laboratories before 1983. However, Mullis’ use of two oligonucleotide (short DNA molecules) primers, which target opposite strands of DNA to amplify a specific region, allowed for targeted and repetitive synthesis. Further modifications to the original procedure have converted PCR into an essential tool in a wide variety of fields, including diagnosis of infectious and genetic disease, as well as microbial ecology (1). Before the advent of PCR, DNA extraction from environmental samples could not yield enough 16S rRNA genes for a population analysis. Studies relied on the ability to culture bacteria, and then use 16S rRNA genes to identify the bacteria. PCR offered a way, once the correct primers had been identified, to amplify the 16S rRNA genes of all bacterial species in a sample to a level where they could be sequenced. Although PCR is not a perfect method, its ability to amplify DNA signatures from uncultured organisms has added a whole new dimension to the analysis of microbial communities.

I'm in the middle of a chain reaction...

In any PCR, the reaction vessel must contain the following ingredients:
  • Template DNA - The DNA extracted from the sample.
  • A forward and reverse oligonucleotide primer pair - These are short strands of DNA. They will bind to any DNA which has the complementary sequence. PCR always uses two primers, a forward and a reverse (Figure 1). A paper using PCR primers will state the sequence of the primers they used and which gene they are designed to target.
Figure 1 - The primers are designed to flank the target DNA sequence, which allows for targeted replication.
  • A thermostable DNA polymerase - This is the enzyme which binds to the primers and attaches free nucleotides in the sequence determined by the template DNA.
  • Free deoxynucleotide triphosphates (dNTPs) - All 4 nucleotides (dATP, dCTP, dGTP, dTTP) that form DNA will need to be added.
  • Mg2+ ions - Some DNA polymerases need Magnesium ions to function. The concentration needs to be carefully calculated and calibrated (2)⁠.
The reaction vessel completes timed cycles of different temperatures at which different stages of DNA amplification take place. Each cycle has 3 stages:

  1. Denaturing of double-stranded DNA to form single-strands at 94 – 96°C.
  2. Annealing of primers (forward to the sense strand, reverse to the antisense strand) at a variable temperature.
  3. Extension of the DNA from the primers by DNA polymerase at 68 – 78°C (2)⁠.
Figure 2 - The PCR has 3 stages to each cycle: denaturing, annealing and replication.
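The "chain reaction" part comes from the fact that each cycle can, at best, double the amount of target DNA. A minimal sketch of that arithmetic is below. The efficiency parameter (the fraction of strands successfully copied per cycle) is an assumption added for illustration; real reactions don't keep a constant efficiency and plateau in later cycles.

```python
def copies_after(cycles: int, starting_copies: float = 1.0, efficiency: float = 1.0) -> float:
    """Copies of the target after a number of cycles.

    efficiency = 1.0 means perfect doubling every cycle; lower values
    model a fixed fraction of strands being copied each cycle.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return starting_copies * (1.0 + efficiency) ** cycles


print(f"{copies_after(30):.3g} copies from one template at perfect efficiency")
print(f"{copies_after(30, efficiency=0.9):.3g} copies from one template at 90% efficiency")
```

Even a small drop in per-cycle efficiency makes a big difference after 30 cycles, which is one reason templates that amplify slightly less well can end up badly under-represented.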

The oligonucleotide primers are the most controllable variable of any PCR. They can also be the source of bias. Even if Firmicutes are the most abundant phylum in the environmental sample, they will not be found on subsequent analysis if the primers don’t successfully bind to Firmicute DNA. As such, primer design is a major component of any microbial community study.

Primer Design

As mentioned before, the 16S rRNA gene is present in all organisms from the three domains: Archaea, Bacteria and Eukaryota. The nucleotide sequence can be divided into 9 hypervariable regions (termed V1–V9) flanked by conserved regions (3). Primers that target the 16S rRNA gene are designed to bind to the conserved regions, allowing for replication of the hypervariable regions which can then be analysed depending on the goals of the study. A wide variety of primers can be chosen depending on whether researchers wish to amplify the entire gene, or just specific hypervariable regions.

There is no established nomenclature for primers but two systems are prevalent, a short and a long name. The long name uses the system proposed by Alm et al. in which each primer is named via 7 key features separated by hyphens, e.g. S-D-Bact-0008-a-S-20 (4)⁠. The short name consists of a number corresponding to the first base of the target sequence on the 16S rRNA gene, followed by either F (for forward) or R (for reverse). For example, the universal primers that amplify the entire 16S rRNA gene are 8F and 1492R (5-8)⁠. Short names may be followed by a -GC, which shows the presence of a GC-clamp used for denaturing and temperature gradient gel electrophoresis. The numbering system in both systems is based on the corresponding base from the 16S rRNA gene of Escherichia coli unless otherwise specified (4,9)⁠. The hypervariable regions have been identified as spanning (approximately) the following bases: 

V1 : 69 – 99        V4 : 576 – 682      V7 : 1117 – 1173
V2 : 137 – 242      V5 : 822 – 879      V8 : 1243 – 1294
V3 : 433 – 497      V6 : 986 – 1043     V9 : 1435 – 1465

This can be used to identify which primers target which hypervariable region, for example 986F and 1401R will amplify the V6, V7 and V8 regions (10)⁠.
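The lookup from primer names to hypervariable regions is easy to sketch in code. The snippet below parses short primer names (a number, F or R, and an optional -GC suffix) and lists the regions that fall entirely between the two positions. The region boundaries are the approximate ones given above; treating the numbers in the names as the two ends of the amplified stretch is a simplifying assumption, and the function names are made up.

```python
import re

# Approximate E. coli positions of the hypervariable regions, as listed above.
HYPERVARIABLE_REGIONS = {
    "V1": (69, 99), "V2": (137, 242), "V3": (433, 497),
    "V4": (576, 682), "V5": (822, 879), "V6": (986, 1043),
    "V7": (1117, 1173), "V8": (1243, 1294), "V9": (1435, 1465),
}


def primer_position(short_name: str) -> tuple[int, str]:
    """Split a short primer name such as '986F' or '341F-GC' into (position, direction)."""
    match = re.fullmatch(r"(\d+)([FR])(-GC)?", short_name)
    if match is None:
        raise ValueError(f"not a short primer name: {short_name!r}")
    return int(match.group(1)), match.group(2)


def regions_amplified(forward: str, reverse: str) -> list[str]:
    """Hypervariable regions lying entirely between the forward and reverse primers."""
    start, forward_direction = primer_position(forward)
    end, reverse_direction = primer_position(reverse)
    if forward_direction != "F" or reverse_direction != "R":
        raise ValueError("expected a forward primer followed by a reverse primer")
    return [name for name, (low, high) in HYPERVARIABLE_REGIONS.items()
            if start <= low and high <= end]


print(regions_amplified("986F", "1401R"))  # ['V6', 'V7', 'V8']
print(regions_amplified("8F", "1492R"))    # all nine regions, V1 to V9
```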

Let's have a look at a research paper that uses PCR. Here's the section about primers:

"The V3 region of the 16S rRNA genes (position 339–539 in the Escherichia coli gene) of bacteria was amplified using primers HDA1-GC (5'-CGC CCG GGG CGC GCC CCG GGC GGG GCG GGG GCA CGG GGG GAC TCC TAC GGG AGG CAG CAG T-3'; the GC-clamp is in boldface) and HDA2 (5'-GTA TTA CCG CGG CTG CTG GCA C-3')"

We can ignore the GC-clamp for now. So we have two primers and their sequences, but what does this actually correspond to? We can get the actual rRNA gene sequences for bacteria from the SILVA database and have a look to check if this sequence actually exists in real life, so below are the sequences for 2 species of bacteria from different phyla - Lactobacillus casei (a Firmicute) and everyone's favourite Escherichia coli (from Gammaproteobacteria).



So, we've found our Forward Primer, followed by the DNA we're going to replicate. Note the differences between L. casei and E. coli, which will help with identifying species and building phylogenetic trees. Finally there's the Reverse Primer. Now, remember DNA is double stranded, so what you see above is half the picture as it's the sequence of one strand. The Reverse Primer binds to the other strand of DNA, so what's highlighted is the reverse complement of the sequence that's actually listed in the paper.
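If you want to find a reverse primer in a database sequence yourself, you need to search for its reverse complement, and that's quick to work out in code. This is a minimal sketch using the HDA2 sequence quoted from the paper, with the spaces removed. The function name is my own.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Complement each base, then reverse, to read the opposite strand 5' to 3'."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


hda2 = "GTATTACCGCGGCTGCTGGCAC"  # reverse primer as listed in the paper
print(reverse_complement(hda2))  # GTGCCAGCAGCCGCGGTAATAC
```

The printed sequence is what you would look for when scanning the strand shown in a database entry.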

The best primers to use will be defined by the aims and methods used in the experiment. Typically, techniques for fingerprinting microbial communities like DGGE and terminal restriction fragment length polymorphism (T-RFLP) use DNA fragments of no more than 500 base pairs (11)⁠. Modern sequencing techniques are also limited by read length with higher throughput techniques often producing lower read lengths. 454 pyrosequencing produces read lengths of up to 700 base pairs, while Illumina sequencing only provides reads of 300 (although this system has the advantage of being able to sequence DNA at both ends of a strand). Since the 16S rRNA gene is roughly 1500 base pairs long, we must select which variable region to use in our analysis. Not all variable regions are of equal merit when identifying different bacterial genera and species, for example it has been reported that V2 was unable to distinguish between common Staphylococcal and Streptococcal pathogens, but provided the best region when analysing differences between Mycobacterial species (10)⁠. The same study concluded that regions V2 and V6 have the maximum nucleotide heterogeneity and therefore are the best regions for discriminating between a selection of 110 bacterial species (10)⁠.

When designing primers, there are two viable options. It is possible to review the literature and reuse primers which have been previously validated. However, it’s important to consider whether or not these primers are still valid, as new additions to 16S rRNA databases may reveal that previous primers are not as specific as once thought. Another option is using software such as ARD or PRIMROSE to design primers according to sequences for bacteria which appear in online databases of 16S rRNA gene sequences (2)⁠.


  1. Bartlett JMS, Stirling D. A short history of the polymerase chain reaction. Methods Mol Biol. 2003;226:3–6. 

  2. Osborn MA, Smith CJ. Molecular Microbial Ecology. Vol. 51. 2009. 370 p.

  3. Van de Peer Y, Chapelle S, De Wachter R. A quantitative map of nucleotide substitution rates in bacterial rRNA. Nucleic Acids Res. 1996;24(17):3381–91. 

  4. Alm EW, Oerther DB, Larsen N, Stahl DA, Raskin L. The oligonucleotide probe database. Appl Environ Microbiol. 1996;62(10):3557–9.

  5. Gong J, Forster RJ, Yu H, Chambers JR, Wheatcroft R, Sabour PM, et al. Molecular analysis of bacterial populations in the ileum of broiler chickens and comparison with bacteria in the cecum. 2002;1376:1–9. 

  6. Gong J, Forster RJ, Yu H, Chambers JR, Sabour PM, Wheatcroft R, et al. Diversity and phylogenetic analysis of bacteria in the mucosa of chicken ceca and comparison with bacteria in the cecal lumen. FEMS Microbiol Lett. 2002;208(1):1–7. 

  7. Lu J, Idris U, Harmon B, Maurer JJ, Lee MD, Hofacre C. Diversity and Succession of the Intestinal Bacterial Community of the Maturing Broiler Chicken. Appl Environ Microbiol. 2003;69(11):6816–24.

  8. Gong J, Si W, Forster RJ, Huang R, Yu H, Yin Y, et al. 16S rRNA gene-based analysis of mucosa-associated bacterial community and phylogeny in the chicken gastrointestinal tracts: From crops to ceca. FEMS Microbiol Ecol. 2007;59(1):147–57. 

  9. Brosius J, Palmer ML, Kennedy PJ, Noller HF. Complete nucleotide sequence of a 16S ribosomal RNA gene from Escherichia coli. Proc Natl Acad Sci USA. 1978;75(10):4801–5.

  10. Chakravorty S, Helb D, Burday M, Connell N. A detailed analysis of 16S ribosomal RNA gene segments for the diagnosis of pathogenic bacteria. J Microbiol Methods. 2007;69(2):330–9. 

  11. Schmalenberger A, Schwieger F, Tebbe CC. Effect of Primers Hybridizing to Different Evolutionarily Conserved Regions of the Small-Subunit rRNA Gene in PCR-Based Microbial Community Analyses and Genetic Profiling. Appl Environ Microbiol. 2001;67(8):3557–63. 

Friday 11 November 2016

Sampling, Storing and Extracting DNA

The way in which DNA from a sample is handled and extracted from cells can introduce biases into the analysis of a microbial community. How it's done will differ depending on whether you're sampling from soil, ocean water, or faeces. I'm going to concentrate on sampling from gut contents and faeces.

Taking the sample

Working with chickens is far easier than working with people. They don't have the same social rules about a researcher waiting eagerly for them to produce a faecal sample. People tend to like to do their pooing in private, which adds an additional challenge to collecting samples. Person or poultry, the basic question remains the same: "How can I get this microbial community into a pot without contaminating it with the bacteria that are covering everything?"

There will be bacteria present in the air, on our hands and on our clothes, so we need to consider how best to minimise these contaminants. The majority of studies I've read about the chicken intestinal microbiome don't give many details about how they took their samples. Some groups have removed the gastrointestinal tracts, transferred them to individual sterile bags and kept them on ice until the contents could be sampled in a cleaner environment such as a laboratory (1-4), but they don't give much more detail than that.

A microbiome sequencing gut sample kit
Tony Webster from San Francisco, California, UBiome - Microbiome Sequencing Gut Bacteria Sample Kit (17238556660)CC BY 2.0
I don't think it's too much to imagine that you're a surgeon. During surgery, everything has to be as clean as possible, you, your equipment, your work surfaces. This is all to prevent bacterial contamination of your patient, so why shouldn't we extend the same to our samples? Sometimes we need to be even more careful, since even if you kill bacteria, although they can't grow, their DNA can still contribute to the sample. There was a good Reddit post dealing with this in the context of sterilising medical implants, but the general idea is the same.

When taking or handling samples, we can minimise contamination by using aseptic protocols. These include measures such as:
  • Wearing personal protective equipment (gowns, gloves, hair nets, face masks, etc.) to minimise contamination from skin, clothing and hair.
  • Using 70% ethanol to clean work benches and surfaces.
  • Working within a fume hood which can be treated with UV light between samples.
An example of personal protective equipment in a laboratory
Photo: U.S. Food and Drug Administration/Public Domain
The UV light goes back to what I was saying earlier about dead bacteria still being able to contribute DNA to samples. UV light damages and breaks down DNA, so it'll help to minimise this contamination. You can also soak any equipment you're going to reuse in 30% bleach for 30 minutes.

There are lots of ways to prevent contamination once the sample has got to the laboratory. If you want to read about them in more detail, there's a good link here about aseptic techniques.

Fresh Frozen Faeces

If DNA extraction can’t be carried out on fresh samples, then samples will have to be stored to stop damage to the DNA. The recommended method is direct freezing at -80°C (5). This doesn't affect the viability of the DNA, and you'll get similar results between fresh and frozen samples (6). There might actually be an advantage to freezing your samples. The ratio of Firmicutes to Bacteroidetes has been observed to change after sample freezing, with an increase in DNA from Firmicutes. The suspected cause is that freezing samples improves DNA extraction from Gram-positive bacteria (7). However, this hasn't been replicated in all studies, so there's probably a variety of factors at play. There's even some information that suggests you can store faecal samples at room temperature for up to two weeks, and it won't significantly affect the bacteria that you find in the sample (8). Personally, I would reason that the faster you can extract the DNA the better, and if you can't extract the DNA it's probably better to freeze the sample just to be sure.

If you can't beat them...

Now that we've got our sample, we've got to start thinking about how to analyse the DNA. At the moment it's locked up inside the bacteria and we can't get at it. So we need to pop (lyse) the bacteria. This sounds simple, but not all bacteria have the same anatomy. Some of them have tough cell walls (generally Gram-positive bacteria), and others are only protected by delicate cell membranes (generally Gram-negative bacteria).

To lyse bacteria, we can use mechanical, chemical or enzymatic techniques. The success of chemical and enzymatic methods can vary greatly depending on the sample. The pH and other chemical conditions can affect the performance of chemical and enzymatic methods, so they're not brilliant for environmental samples (8). The most commonly used mechanical technique is bead beating, which does exactly what it says. The sample is vigorously shaken in the presence of small beads. The problem with this method is that the beads don't differentiate between cell walls, cell membranes and DNA. So once DNA has been released from the safety of its bacteria, it can be beaten up and broken down by the beads as well. This is called DNA shearing. The amount of DNA shearing depends on factors such as the time and intensity of beating, the size of the beads and the bead to sample ratio. It's important to keep DNA shearing to a minimum as small fragments of DNA can increase PCR artefacts (we'll come to that later!) (8).

0.5mm Silica beads used in bead beating.
Photo by: Lilly_M, Zirconia-silica-bead, CC BY-SA 3.0

Luckily, there's a way round this. A repeated bead-beating (RBB) protocol has been developed which can help limit DNA shearing. First, you perform an initial round of bead beating. This lyses the more delicate Gram-negative bacteria. You can then remove the liquid portion (the lysate), which will contain the DNA from any lysed bacteria, before a second round of bead beating takes place to lyse the more resistant Gram-positive bacteria and archaea. This minimises the amount of damage done to DNA from fragile bacteria which would otherwise be subjected to extra beating (9). When compared to other methods, RBB has been shown to have a superior DNA extraction efficiency and provide the best bacterial diversity, especially with respect to archaea and Gram-positive bacteria (10,11).


After cell lysis, the DNA has to be extracted. A wide variety of commercial kits are available, and each can introduce its own bias in terms of bacterial diversity and quality of DNA extracted (11,12)⁠. This makes the comparison of studies examining the microbiome difficult, as many will have used different DNA extraction techniques. There's also lots of home brew recipes available, but no one has done any comparative research with these. Apajalahti et al. described a method for lysis and extraction of DNA from ileal and caecal samples with reported cell lysis rates of >95% and > 99% respectively (13)⁠.

The final step of DNA extraction is purification, which aims to remove PCR-inhibiting substances such as DNases, polysaccharides and proteases which can interfere with the amplification of DNA (14). PCR inhibitors are present in chicken faecal and caecal samples, with a higher level detected in caecal samples, and they've also been found in human faeces (15). Commercially available DNA extraction kits contain steps which will help remove PCR inhibitors from samples (16); however, there are additional steps which can be taken. The addition of non-acetylated bovine serum albumin (BSA) has been shown to partially overcome this inhibition, while polyethylene glycol (PEG) also facilitates PCR of faecal samples (15). T4 gene 32 protein has also been reported to reduce the inhibitory effects of contaminants (17).

Once DNA extraction is complete, you can estimate the quality of extracted DNA using agarose gel electrophoresis (18) and the quantity using a spectrophotometer such as a NanoDrop. This step is especially important as DNA concentrations of a few to tens of picograms can cause random changes in PCR efficiency (19).

Once you've extracted your DNA you're ready for the next step in the process, amplifying the DNA using PCR.

1. Park SH, Lee SI, Ricke SC. Microbial populations in naked neck chicken ceca raised on pasture flock fed with commercial yeast cell wall prebiotics via an Illumina MiSeq platform. PLoS One. 2016;11(3):1–15. [PDF]
2. Ballou AL, Ali RA, Mendoza MA, Ellis JC, Hassan HM, Croom WJ, et al. Development of the Chick Microbiome: How Early Exposure Influences Future Microbial Diversity. Front Vet Sci 2016; 3:2.  [PDF]

3. Zhu XY, Zhong T, Pandya Y, Joerger RD. 16S rRNA-based analysis of microbiota from the cecum of broiler chickens. Appl Environ Microbiol. 2002;68(1):124–37. [PDF]
4. Amit-Romach E, Sklan D, Uni Z. Microflora Ecology of the Chicken Intestine Using 16S Ribosomal DNA Primers. Poult Sci. 2004;83:1093–8. [PDF]

5. Thomas V, Clark J, Doré J. Fecal microbiota analysis: an overview of sample collection methods and sequencing strategies. Future Microbiol. 2015;10(9):1485–504. [Abstract]

6. Fouhy F, Deane J, Rea MC, O’Sullivan Ó, Ross RP, O’Callaghan G, et al. The effects of freezing on faecal microbiota as determined using Miseq sequencing and culture-based investigations. PLoS One. 2015;10(3):1–13. [PDF]

7. Bahl MI, Bergström A, Licht TR. Freezing fecal samples prior to DNA extraction affects the Firmicutes to Bacteroidetes ratio determined by downstream quantitative PCR analysis. FEMS Microbiol Lett. 2012;329(2):193–7. [PDF]

8. Lauber CL, Zhou N, Gordon JI, Knight R, Fierer N. Effect of storage conditions on the assessment of bacterial community structure in soil and human-associated samples. FEMS Microbiol Lett. 2010;307(1):80–6. [PDF]

9. Stackebrandt E, Pukall R, Ulrichs G, Rheims H. Analysis of 16S rDNA clone libraries: part of the big picture. Proc 8th Int Symp Microb Ecol Microb Biosyst new Front Atl Canada Soc Microb Ecol Halifax, Nov Scotia, Canada. [PDF]

10. Osborn MA, Smith CJ. Molecular Microbial Ecology. Vol. 51. 2009. 370 p.

11. Yu Z, Morrison M. Improved extraction of PCR-quality community DNA from digesta and fecal samples. Biotechniques. 2004;36(5):808–12. [PDF]

12. Salonen A, Nikkilä J, Jalanka-Tuovinen J, Immonen O, Rajilić-Stojanović M, Kekkonen RA, et al. Comparative analysis of fecal DNA extraction methods with phylogenetic microarray: Effective recovery of bacterial and archaeal DNA using mechanical cell lysis. J Microbiol Methods. 2010;81(2):127–34. [PDF]

13. Claassen S, du Toit E, Kaba M, Moodley C, Zar HJ, Nicol MP. A comparison of the efficiency of five different commercial DNA extraction kits for extraction of DNA from faecal samples. J Microbiol Methods. 2013;94(2):103–10. [PDF]

14. Mirsepasi H, Persson S, Struve C, Andersen LOB, Petersen AM, Krogfelt K. Microbial diversity in fecal samples depends on DNA extraction method: easyMag DNA extraction compared to QIAamp DNA stool mini kit extraction. BMC Res Notes. 2014; 7:50. [PDF]

15. Apajalahti JH, Särkilahti LK, Mäki BR, Heikkinen JP, Nurminen PH, Holben WE. Effective recovery of bacterial DNA and percent-guanine-plus-cytosine-based analysis of community structure in the gastrointestinal tract of broiler chickens. Appl Environ Microbiol. 1998;64(10):4084–8. [PDF]

16. Wilson IG. Inhibition and Facilitation of Nucleic Acid Amplification. Appl Environ Microbiol. 1997;63(10):3741–51. [PDF]

17. Rudi K, Høidal HK, Katla T, Johansen BK, Nordal J, Jakobsen KS. Direct Real-Time PCR Quantification of Campylobacter jejuni in Chicken Fecal and Cecal Samples by Integrated Cell Concentration and DNA Purification. Appl Environ Microbiol. 2004;70(2):790–7. [PDF]

18. Grant W, Long P. Environmental microbiology; Methods and Protocols. 2014. 241 p.

19. Chandler DP, Fredrickson JK, Brockman FJ. Effect of PCR template concentration on the composition and distribution of total community 16S rDNA clone libraries. Mol Ecol. 1997;6(5):475–82. [PDF]

Sunday 6 November 2016

The 16S Ribosomal RNA Gene

Before I started reading about microbiome research, I had never heard of the 16S Ribosomal RNA (rRNA) gene. Over the last month it's become my new best friend. So I want to spend this post talking about what it is, why it's important and how it turned the classification of life on its head.

Ribosomes are present in every living cell, from bacteria to plants and animals. They're the bits of cellular machinery that read messenger RNA and turn it into proteins during protein synthesis. Each ribosome is made up of two subunits, which between them are composed of proteins and three or four RNA molecules. The genes that code for the rRNAs are special for two reasons:

  1. They're present in every living cell.
  2. They act as molecular clocks.

The idea of a molecular clock was first proposed by Emile Zuckerkandl and Linus Pauling in relation to the structure of haemoglobin. They thought that differences in the protein sequence (and by extension the DNA sequence), could be used to estimate the time since two species diverged during the process of evolution (1)⁠. Haemoglobin was brilliant as a proof of concept since the evolutionary tree matched the one that had been drawn up using mammal anatomy. However, haemoglobin isn't present in every living cell, so we can't use it to analyse the relationship between different species of bacteria. Groups of biologists began using different molecules to discover the relationships between different species. Their results completely changed the way we classify bacteria and other microorganisms. 

Bacteria used to be classified by their appearance under the microscope and various biochemical tests. This seemed like a pretty logical system. Obviously bacteria which can photosynthesise are different from the others, just as plants are different to animals. These bacteria are round, so we'll call them 'cocci' (hence Staphylococci, Streptococci, etc), and they're different from these spiral ones which we'll call Spirochaetes. It worked that way for plants and animals, so why not bacteria? In 1977, Woese and Fox published a paper suggesting that, based on their observations of the structure of rRNA, the previous division of life into nucleated and non-nucleated cells was inherently superficial. They proposed that life was actually grouped into three ‘urkingdoms’ (now called domains): 

  • Eubacteria - made up of typical bacteria 
  • Archaebacteria - the methanogenic bacteria
  • Eukaryotes - organisms with nucleated cells. 

They argued that the rRNA sequences from archaebacteria were as distinct from those of eubacteria as eubacteria were from eukaryotes, and so earned their own urkingdom (2)⁠. They also found that divisions within Eubacteria weren't what they seemed. For example, while some old divisions like Gram-positive and Gram-negative bacteria held true, analysis of rRNAs showed that photosynthetic bacteria were not one group but belonged to different families of bacteria (3). The idea was not widely accepted initially, but Woese continued to work with rRNA. In 1987 he published “Bacterial Evolution”, in which he described the ideal characteristics of a molecular clock and how ribosomal RNA genes (also referred to as rDNA) fit these characteristics. According to Woese, a molecular clock must:

  1. Exhibit clock-like behaviour – it must be a genetic sequence which accumulates sequence changes at a constant rate (see Figure 1). Most importantly, these changes must not be selected for by evolution.
  2. Have adequate range – it must change slowly enough to span hundreds of thousands of years of evolution.
  3. Have a number of “domains” – domains are regions which are evolutionarily independent of each other. Non-random change in one domain will not affect the others, so if one domain is affected by changes which are selected for by evolution, other domains can be analysed instead (3)⁠.

Figure 1: A schematic of a molecular clock. Sequence A is the common ancestor of sequences B, C, D and E. A base pair in the sequence randomly changes every 10 million years, so we can deduce that sequences D and E diverged from each other about 20 million years ago. If we suppose that sequence C has another variant, F (not shown), we would be able to see that F is more closely related to sequence E than sequence D. 
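The arithmetic behind a molecular clock can be sketched in a few lines of Python. This is only an illustration under some loud assumptions: the two sequences are made up (they are not the ones drawn in Figure 1), they are already aligned, each lineage picks up one change every 10 million years, and no position changes twice. The function names are my own.

```python
def count_differences(seq_a: str, seq_b: str) -> int:
    """Count the positions at which two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def years_since_divergence(seq_a: str, seq_b: str, years_per_change: float) -> float:
    """Estimate the time since two lineages split, assuming a strict clock.

    Assumption: each lineage gains one change every `years_per_change` years,
    so the differences between the two sequences are shared between both
    lineages (hence the division by 2).
    """
    return count_differences(seq_a, seq_b) * years_per_change / 2


# Made-up example sequences standing in for two descendants of one ancestor.
seq_d = "ACGTACGTACGT"
seq_e = "ACGAACCTAGGA"

print(count_differences(seq_d, seq_e))
print(years_since_divergence(seq_d, seq_e, 10_000_000))
```

Real phylogenetic methods are far more careful than this (they correct for repeated changes at the same position, for example), but the principle is the same: more differences means more time since the common ancestor.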

Olsen et al. identified 6 features of rRNA genes that make them suitable for use as molecular clocks:

  1. Since rRNAs are essential parts of protein synthesis, they are present in all organisms.
  2. The structure of rRNAs is highly conserved and they are therefore easily identified.
  3. The DNA sequence is formed of hypervariable regions flanked by conserved regions. The conserved regions act as targets, and the hypervariable regions are the parts we're actually interested in.
  4. Each cell contains a large number of rRNA molecules, so they are easy to recover.
  5. Sequences are long enough to allow statistically significant comparisons.
  6. rRNA genes are not transferred laterally between bacteria, so they represent true evolutionary relationships between organisms (4)⁠. 

Figure 2: A schematic of the 16S rRNA gene. The gene is formed of 9 hypervariable regions (V1 - V9) flanked by conserved regions. These conserved regions can be used as targets to sequence the hypervariable regions which are then used in microbiome studies.
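To make the "conserved regions as targets" idea concrete, here is a minimal Python sketch. Everything in it is hypothetical: the flanking sequences are short placeholders rather than real 16S primer sites, the two "species" are invented, and the function name is my own. It simply finds the two conserved flanks and returns whatever lies between them.

```python
from typing import Optional


def extract_variable_region(sequence: str, left_flank: str, right_flank: str) -> Optional[str]:
    """Return the stretch between two conserved flanks, or None if either is missing."""
    start = sequence.find(left_flank)
    if start == -1:
        return None
    start += len(left_flank)  # move past the left conserved region
    end = sequence.find(right_flank, start)
    if end == -1:
        return None
    return sequence[start:end]


# Placeholder conserved regions, identical in both invented species.
LEFT = "GGATCC"
RIGHT = "AAGCTT"

# The flanks match, but the region between them differs.
species_1 = "TT" + LEFT + "ACGTTGCA" + RIGHT + "CC"
species_2 = "TT" + LEFT + "ACCTTGGA" + RIGHT + "CC"

print(extract_variable_region(species_1, LEFT, RIGHT))
print(extract_variable_region(species_2, LEFT, RIGHT))
```

Because the flanks are the same in both sequences, one pair of targets is enough to pull out the variable region from either of them, and it is that variable region which tells them apart.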

Woese also noted that different positions on rRNA genes change at different rates, allowing a full range of the evolutionary spectrum to be observed using one molecule (3)⁠.

There are three rRNA genes to choose from in bacteria: 5S (about 120 nucleotides long), 16S (roughly 1500 nucleotides), and 23S (about 3000 nucleotides). Initially, 5S was used because its short length made it easier to sequence. As sequencing technology advanced, partial 16S sequences were used. Currently, the 16S rRNA gene is the standard phylogenetic marker used to identify and classify organisms in microbiome studies.

Not only are rRNA genes good for working out relationships between different organisms, but since each species has its own unique rRNA gene sequences, we can use them almost like a fingerprint. So if I take a sample, say a swab from someone's nose, extract the DNA and somehow sequence all of the rRNA genes, I should be able to tell which bacteria are present. And presumably if 40% of my sequences are from Staphylococcus aureus, then 40% of the bacteria living in that person's nose must be S. aureus too, right? Well... unfortunately it's not quite that easy.
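For reference, the naive calculation looks something like the Python sketch below. The list of species assignments is invented purely for illustration, and the function name is my own. It treats the proportion of sequences as the proportion of bacteria, which is exactly the assumption that turns out to be too simple.

```python
from collections import Counter


def relative_abundance(assignments):
    """Return each species' share of the sequences as a fraction of the total."""
    counts = Counter(assignments)
    total = sum(counts.values())
    return {species: count / total for species, count in counts.items()}


# Invented data: one species label per sequenced rRNA gene.
reads = (
    ["Staphylococcus aureus"] * 4
    + ["Corynebacterium accolens"] * 3
    + ["Cutibacterium acnes"] * 3
)

for species, fraction in relative_abundance(reads).items():
    print(f"{species}: {fraction:.0%}")
```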


  1. Zuckerkandl E, Pauling L. Molecular Disease, Evolution, and Genic Heterogeneity. Horizons in Biochemistry. 1962. p. 189–222.
  2. Woese CR, Fox GE. Phylogenetic structure of the prokaryotic domain: the primary kingdoms. Proc Natl Acad Sci U S A. 1977;74(11):5088–90.
  3. Woese CR. Bacterial evolution. Microbiol Rev. 1987;51(2):221–71.
  4. Olsen GJ, Lane DL, Giovannoni SJ, Pace NR. Microbial ecology and evolution: a ribosomal RNA approach. Annu Rev Microbiol. 1986;40:337–65.

About the blog

Microbiome is a new word. It was first used by Joshua Lederberg in 2001 to describe “the ecological community of commensal, symbiotic, and pathogenic microorganisms that literally share our body space...”. Since then, ‘microbiome’ has been used as an all-encompassing term for the study of bacterial proteins (proteomics) and genes (metagenomics) in the environment, as well as working out which species of bacteria make up the communities living in, on and around us.

Microbiome research is a completely new world for me as well. I trained as a veterinary surgeon in the UK. I spent a bit of time in a laboratory during my final year, mainly because it involved no on-call shifts, weekend work or stressful situations. My project was loosely based around chicken intestinal mucus, Campylobacter and spotty liver disease. Apart from that, I always planned on working in clinics.

After graduating, I spent two years working as a vet in Colombia. I was working with the things that I liked best, infectious diseases and traumatic injuries... but I began to want something more. I started to look for PhD projects at universities. One caught my eye: studying the chicken intestinal microbiome to improve and inform the use of probiotics. I had used probiotics in practice, usually after blitzing a parvovirus puppy with antibiotics, but I had never seen any real evidence for their use in animals. I prescribed them more on a hunch than any informed clinical decision. When I thought about it, this bothered me. On what basis was I using probiotics as medication? Were they doing any good? This doctoral project sounded right up my street. I applied and 6 months later I was at the university, sat at my desk, wondering where to start.

I launched myself straight into the research looking at the chicken intestinal microbiome. The more I read, the more I realised that I couldn't just read the Results and Discussion and expect to understand or interpret what was being said. I had never had any training in DNA sequencing or PCR. I was baffled by clone libraries, bemused by 454 pyrosequencing and befuddled by chimeras. It was time to take a step back and start right at the beginning. Everything you do, how you take samples, how they're stored, any and every procedure done to the sample before analysis will affect your results.

The aim of this blog is to look at these aspects of microbiome research. Perhaps not as thrilling as looking at the actual results, but equally important. A lot of the available information is very dry, and difficult to read. My aim is to read and summarise this information, presenting it in an accessible way with pictures and animations. I hope that this blog will act as a resource for people who want to truly understand the results coming out of microbiome research. Do you remember in maths classes, when your teacher would tell you that the answer isn't the most important part, but your working is essential? The same applies to scientific research.

I don't claim to yet be an expert in microbial research, but I hope this blog will track my progress and act as a forum for discussion throughout my research.