The Polymerase Chain Reaction

Kary Mullis has an interesting section in his Wikipedia entry under "Extraterrestrial life". Apparently Mullis "reported an encounter with a glowing green raccoon at his cabin in the woods of northern California around midnight one night in 1985. He denies the involvement of LSD in this encounter." However, he does credit his LSD use with helping him pioneer the Polymerase Chain Reaction (PCR).

There were methods of synthesising small amounts of single-stranded DNA using DNA polymerase in laboratories before 1983. However, Mullis’ use of two oligonucleotide primers (short DNA molecules), which target opposite strands of DNA to amplify a specific region, allowed for targeted and repetitive synthesis. Further modifications to the original procedure have made PCR an essential tool in a wide variety of fields, including the diagnosis of infectious and genetic disease, as well as microbial ecology (1). Before the advent of PCR, DNA extraction from environmental samples could not yield enough 16S rRNA genes for a population analysis. Studies instead relied on culturing bacteria and then using their 16S rRNA genes to identify them. PCR offered a way, once the correct primers had been identified, to amplify the 16S rRNA genes of all bacterial species in a sample to a level where they could be sequenced. Although PCR is not a perfect method, its ability to amplify DNA signatures from uncultured organisms has added a whole new dimension to the analysis of microbial communities.

I'm in the middle of a chain reaction...

In any PCR, the reaction vessel must contain the following ingredients:

Template DNA - The DNA extracted from the sample.

A forward and reverse oligonucleotide primer pair - These are short strands of DNA. They will bind to any DNA which has the complementary sequence. PCR always uses two primers, a forward and a reverse (Figure 1). A paper using PCR will state the sequences of the primers used and which gene they are designed to target.

Figure 1 - The primers are designed to flank the target DNA sequence, which allows for targeted replication.

A thermostable DNA polymerase - This is the enzyme which binds at the primers and extends them, attaching free nucleotides in the order determined by the template DNA. It needs to be thermostable to survive the repeated denaturing steps.

Free deoxynucleotide triphosphates (dNTPs) - All 4 nucleotides (dATP, dCTP, dGTP, dTTP) that form DNA will need to be added.

Mg²⁺ ions - DNA polymerases need magnesium ions as a cofactor to function. The concentration needs to be carefully calculated and calibrated (2).

The reaction vessel is taken through timed cycles of different temperatures, at which the different stages of DNA amplification take place. Each cycle has 3 stages:

Denaturing of double-stranded DNA to form single-strands at 94 – 96°C.
Annealing of primers (forward to the antisense strand, reverse to the sense strand) at a temperature that depends on the primers.
Extension of the DNA from the primers by DNA polymerase at 68 – 78°C (2)⁠.

Figure 2 - The PCR has 3 stages to each cycle: denaturing, annealing and extension.
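To get a feel for why repeating these three stages is so powerful, here is a rough sketch of the arithmetic. It assumes the textbook idealisation that each cycle copies a fixed fraction of the templates present (an "efficiency" of 1.0 means perfect doubling); the function name and the 90% figure are purely illustrative, and real reactions plateau as reagents run out.

```python
def copies_after(cycles: int, start_copies: float = 1.0, efficiency: float = 1.0) -> float:
    """Expected number of target copies after a given number of PCR cycles.

    efficiency is the fraction of templates copied in each cycle:
    1.0 is perfect doubling, 0.0 is no amplification at all.
    This ignores the plateau seen in real reactions.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return start_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # Compare ideal doubling with an illustrative 90% efficiency.
    for n in (1, 10, 20, 30):
        ideal = copies_after(n)
        lossy = copies_after(n, efficiency=0.9)
        print(f"{n:>2} cycles: {ideal:,.0f} copies (ideal), {lossy:,.0f} at 90% efficiency")
```

Even a small drop in per-cycle efficiency compounds over 30 cycles, which is one reason why reaction conditions need such careful calibration.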

The oligonucleotide primers are the most controllable variable of any PCR. They can also be a source of bias. Even if Firmicutes are the most abundant phylum in the environmental sample, they will not be found in subsequent analysis if the primers don’t successfully bind to Firmicute DNA. As such, primer design is a major component of any microbial community study.

Primer Design

As mentioned before, the 16S rRNA gene is present in all Bacteria and Archaea (Eukaryota carry the related 18S rRNA gene instead). The nucleotide sequence can be divided into 9 hypervariable regions (termed V1 – V9) flanked by conserved regions (3). Primers that target the 16S rRNA gene are designed to bind to the conserved regions, allowing for replication of the hypervariable regions, which can then be analysed depending on the goals of the study. A wide variety of primers can be chosen depending on whether researchers wish to amplify the entire gene, or just specific hypervariable regions.

There is no established nomenclature for primers, but two systems are prevalent: a long name and a short name. The long name uses the system proposed by Alm et al., in which each primer is named via 7 key features separated by hyphens, e.g. S-D-Bact-0008-a-S-20 (4). The short name consists of a number corresponding to the first base of the target sequence on the 16S rRNA gene, followed by either F (for forward) or R (for reverse). For example, the universal primers that amplify the entire 16S rRNA gene are 8F and 1492R (5–8). Short names may be followed by -GC, which shows the presence of a GC-clamp used for denaturing and temperature gradient gel electrophoresis. The numbering in both systems is based on the corresponding base from the 16S rRNA gene of Escherichia coli unless otherwise specified (4,9). The hypervariable regions have been identified as spanning (approximately) the following bases:
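As a small illustration of the short naming scheme, the sketch below pulls a short name apart into its position, direction and optional GC-clamp flag. The function and class names are made up for this post, and the pattern only covers the number + F/R + optional -GC form described above; it does not attempt the long Alm et al. names or lab-specific names.

```python
import re
from typing import NamedTuple


class ShortPrimerName(NamedTuple):
    position: int    # base position on the E. coli 16S rRNA gene
    direction: str   # "forward" or "reverse"
    gc_clamp: bool   # True when the name carries a -GC suffix


# Number, then F or R, then an optional -GC suffix.
_SHORT_NAME = re.compile(r"^(\d+)([FR])(-GC)?$", re.IGNORECASE)


def parse_short_name(name: str) -> ShortPrimerName:
    """Parse a short-format primer name such as '8F' or '1492R'."""
    match = _SHORT_NAME.match(name.strip())
    if match is None:
        raise ValueError(f"not a short-format primer name: {name!r}")
    position, letter, clamp = match.groups()
    direction = "forward" if letter.upper() == "F" else "reverse"
    return ShortPrimerName(int(position), direction, clamp is not None)


if __name__ == "__main__":
    # "986F-GC" is only an example of the suffix, not a primer from a paper.
    for name in ("8F", "1492R", "986F-GC"):
        print(name, "->", parse_short_name(name))
```

Names that follow neither system, which do turn up in papers, raise a ValueError here rather than being guessed at.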

V1 : 69 – 99
V2 : 137 – 242
V3 : 433 – 497
V4 : 576 – 682
V5 : 822 – 879
V6 : 986 – 1043
V7 : 1117 – 1173
V8 : 1243 – 1294
V9 : 1435 – 1465

This can be used to identify which primers target which hypervariable regions. For example, 986F and 1401R will amplify the V6, V7 and V8 regions (10).
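The lookup can be sketched in a few lines. This is only an illustration: the function name is made up, the region boundaries are the approximate ones listed above, and the primer numbers are simply treated as the two ends of the amplified stretch in E. coli numbering.

```python
# Approximate hypervariable region boundaries (E. coli numbering), as listed above.
HYPERVARIABLE_REGIONS = {
    "V1": (69, 99),
    "V2": (137, 242),
    "V3": (433, 497),
    "V4": (576, 682),
    "V5": (822, 879),
    "V6": (986, 1043),
    "V7": (1117, 1173),
    "V8": (1243, 1294),
    "V9": (1435, 1465),
}


def regions_amplified(forward_pos: int, reverse_pos: int) -> list[str]:
    """Return the regions lying entirely between the two primer positions."""
    if forward_pos >= reverse_pos:
        raise ValueError("forward position must come before reverse position")
    return [
        name
        for name, (start, end) in HYPERVARIABLE_REGIONS.items()
        if start >= forward_pos and end <= reverse_pos
    ]


if __name__ == "__main__":
    print("986F + 1401R:", regions_amplified(986, 1401))
    print("8F + 1492R:  ", regions_amplified(8, 1492))
```

Regions that are only partly covered by the amplicon are left out, which is a deliberate simplification.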

Let's have a look at a research paper that uses PCR. Here's the section about primers:

"The V3 region of the 16S rRNA genes (position 339–539 in the Escherichia coli gene) of bacteria was amplified using primers HDA1-GC (5'-CGC CCG GGG CGC GCC CCG GGC GGG GCG GGG GCA CGG GGG GAC TCC TAC GGG AGG CAG CAG T-3'; the GC-clamp is in boldface) and HDA2 (5'-GTA TTA CCG CGG CTG CTG GCA C-3')"

We can ignore the GC-clamp for now. So we have two primers and their sequences, but what do these actually correspond to? We can get real rRNA gene sequences for bacteria from the SILVA database and check whether the primer sequences actually exist in real life. Below are the sequences for 2 species of bacteria from different phyla: first Lactobacillus casei (a Firmicute), then everyone's favourite, Escherichia coli (from the Gammaproteobacteria).

AGTTTGATCATGGCTCAGGATGAACGCTGGCGGCGTGCCTAATACATGCAAGTCGAACGAGTTCTCGTTGATGATCGGTGCTTGCACCGAGATTCAACATGGAACGAGTGGCGGACGGGTGAGTAACACGTGGGTAACCTGCCCTTAAGTGGGGGATAACATTTGGAAACAGATGCTAATACCGCATAGATCCAAGAACCGCATGGTTCTTGGCTGAAAGATGGCGTAAGCTATCGCTTTTGGATGGACCCGCGGCGTATTAGCTAGTTGGTGAGGTAATGGCTCACCAAGGCGATGATACGTAGCCGAACTGAGAGGTTGATCGGCCACATTGGGACTGAGACACGGCCCAAACTCCTACGGGAGGCAGCAGTAGGGAATCTTCCACAATGGACGCAAGTCTGATGGAGCAACGCCGCGTGAGTGAAGAAGGCTTTCGGGTCGTAAAACTCTGTTGTTGGAGAAGAATGGTCGGCAGAGTAACTGTTGTCGGCGTGACGGTATCCAACCAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTATCCGGATTTATTGGGCGTAAAGCGAGCGCAGGCGGTTTTTTAAGTCTGATGTGAAAGCCCTCGGCTTAACCGAGGAAGCGCATCGGAAACTGGGAAACTTGAGTGCAGAAGAGGACAGTGGAACTCCATGTGTAGCGGTGAAATGCGTAGATATATGGAAGAACACCAGTGGCGAAGGCGGCTGTCTGGTCTGTAACTGACGCTGAGGCTCGAAAGCATGGGTAGCGAACAGGATTAGATACCCTGGTAGTCCATGCCGTAAACGATGAATGCTAGGTGTTGGAGGGTTTCCGCCCTTCAGTGCCGCAGCTAACGCATTAAGCATTCCGCCTGGGGAGTACGACCGCAAGGTTGAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGAAGCAACGCGAAGAACCTTACCAGGTCTTGACATCTTTTGATCACCTGAGAGATCAGGTTTCCCCTTCGGGGGCAAAATGACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTATGACTAGTTGCCAGCATTTAGTTGGGCACTCTAGTAAGACTGCCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGACCTGGGCTACACACGTGCTACAATGGATGGTACAACGAGTTGCGAGACCGCGAGGTCAAGCTAATCTCTTAAAGCCATTCTCAGTTCGGACTGTAGGCTGCAACTCGCCTACACGAAGTCGGAATCGCTAGTAATCGCGGATCAGCACGCCGCGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACACCATGAGAGTTTGTAACACCCGAAGCCGGTGGCGTAACCCTTTTAGGGAGCGAGCCGTCTAAGGTGGGACAAATGATTAGGGTGAAGTCGTAACAAGGTAGCCGTAGGAGAACCTGCGGCTGGATCACCTCCTTA

AAATTGAAGAGTTTGATCATGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAACGGTAACAGGAAGAAGCTTGCTCTTTGCTGACGAGTGGCGGACGGGTGAGTAATGTCTGGGAAACTGCCTGATGGAGGGGGATAACTACTGGAAACGGTAGCTAATACCGCATAACGTCGCAAGACCAAAGAGGGGGACCTTCGGGCCTCTTGCCATCGGATGTGCCCAGATGGGATTAGCTAGTAGGTGGGGTAACGGCTCACCTAGGCGACGATCCCTAGCTGGTCTGAGAGGATGACCAGCCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGCCATGCCGCGTGTATGAAGAAGGCCTTCGGGTTGTAAAGTACTTTCAGCGGGGAGGAAGGGAGTAAAGTTAATACCTTTGCTCATTGACGTTACCCGCAGAAGAAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCGGTTTGTTAAGTCAGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATCTGATACTGGCAAGCTTGAGTCTCGTAGAGGGGGGTAGAATTCCAGGTGTAGCGGTGAAATGCGTAGAGATCTGGAGGAATACCGGTGGCGAAGGCGGCCCCCTGGACGAAGACTGACGCTCAGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGTCGACTTGGAGGTTGTGCCCTTGAGGCGTGGCTTCCGGAGCTAACGCGTTAAGTCGACCGCCTGGGGAGTACGGCCGCAAGGTTAAAACTCAAATGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGATGCAACGCGAAGAACCTTACCTGGTCTTGACATCCACGGAAGTTTTCAGAGATGAGAATGTGCCTTCGGGAACCGTGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTTGTGAAATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTATCCTTTGTTGCCAGCGGTCCGGCCGGGAACTCAAAGGAGACTGCCAGTGATAAACTGGAGGAAGGTGGGGATGACGTCAAGTCATCATGGCCCTTACGACCAGGGCTACACACGTGCTACAATGGCGCATACAAAGAGAAGCGACCTCGCGAGAGCAAGCGGACCTCATAAAGTGCGTCGTAGTCCGGATTGGAGTCTGCAACTCGACTCCATGAAGTCGGAATCGCTAGTAATCGTGGATCAGAATGCCACGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACACCATGGGAGTGGGTTGCAAAAGAAGTAGGTAGCTTAACCTTCGGGAGGGCGCTTACCACTTTGTGATTCATGACTGGGGTGAAGTCGTAACAAGGTAACCGTAGGGGAACCTGCGGTTGGATCACCTCCTTA

So, in both sequences we can find our forward primer (ACTCCTACGGGAGGCAGCAGT), followed by the DNA we're going to replicate. Note the differences between L. casei and E. coli in this stretch, which will help with identifying species and building phylogenetic trees. Finally there's the reverse primer site (GTGCCAGCAGCCGCGGTAATAC). Now, remember DNA is double stranded, so what you see above is half the picture, as it's the sequence of one strand. The reverse primer is written as it reads on the other strand, so what appears in the sequences above is the reverse complement of what's actually listed in the paper.

The best primers to use will be defined by the aims and methods of the experiment. Typically, techniques for fingerprinting microbial communities, like DGGE and terminal restriction fragment length polymorphism (T-RFLP), use DNA fragments of no more than 500 base pairs (11). Modern sequencing techniques are also limited by read length, with higher-throughput techniques often producing shorter reads. 454 pyrosequencing produces read lengths of up to 700 base pairs, while Illumina sequencing only provides reads of 300 base pairs (although this system has the advantage of being able to sequence DNA from both ends of a strand). Since the 16S rRNA gene is roughly 1500 base pairs long, we must select which variable regions to use in our analysis. Not all variable regions are of equal merit when identifying different bacterial genera and species. For example, it has been reported that V2 was unable to distinguish between common staphylococcal and streptococcal pathogens, but was the best region for analysing differences between mycobacterial species (10). The same study concluded that regions V2 and V6 have the maximum nucleotide heterogeneity and are therefore the best regions for discriminating between a selection of 110 bacterial species (10).
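As a quick sanity check on a primer choice, the fragment limits quoted above can be compared against the expected amplicon length. This is a back-of-the-envelope sketch only: the names are made up, the limits are the rough figures from this paragraph, paired-end sequencing is ignored, and the primer positions are treated as the two ends of the amplicon in E. coli numbering.

```python
# Rough upper limits on fragment or read length, in base pairs, as quoted above.
MAX_FRAGMENT_BP = {
    "DGGE / T-RFLP": 500,
    "454 pyrosequencing": 700,
    "Illumina (single read)": 300,
}


def amplicon_length(forward_pos: int, reverse_pos: int) -> int:
    """Approximate amplicon length from the two primer positions."""
    if forward_pos >= reverse_pos:
        raise ValueError("forward position must come before reverse position")
    return reverse_pos - forward_pos + 1


def methods_that_fit(amplicon_bp: int) -> list[str]:
    """List the methods whose length limit can hold the whole amplicon."""
    return [name for name, limit in MAX_FRAGMENT_BP.items() if amplicon_bp <= limit]


if __name__ == "__main__":
    for label, fwd, rev in (
        ("8F + 1492R (whole gene)", 8, 1492),
        ("986F + 1401R (V6 to V8)", 986, 1401),
        ("positions 339 to 539 (V3)", 339, 539),
    ):
        length = amplicon_length(fwd, rev)
        print(f"{label}: ~{length} bp, fits: {methods_that_fit(length) or 'none'}")
```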

When designing primers, there are two viable options. The first is to review the literature and reuse primers which have been previously validated. However, it’s important to consider whether or not these primers are still valid, as new additions to 16S rRNA databases may reveal that previous primers are not as specific as once thought. The second option is to use software such as ARB or PRIMROSE to design primers against the bacterial sequences held in online databases of 16S rRNA gene sequences (2).

References

1. Bartlett JMS, Stirling D. A short history of the polymerase chain reaction. Methods Mol Biol. 2003;226:3–6.

2. Osborn MA, Smith CJ. Molecular Microbial Ecology. Vol. 51. 2009. 370 p.

3. Van de Peer Y, Chapelle S, De Wachter R. A quantitative map of nucleotide substitution rates in bacterial rRNA. Nucleic Acids Res. 1996;24(17):3381–91.

4. Alm EW, Oerther DB, Larsen N, Stahl DA, Raskin L. The oligonucleotide probe database. Appl Environ Microbiol. 1996;62(10):3557–9.

5. Gong J, Forster RJ, Yu H, Chambers JR, Wheatcroft R, Sabour PM, et al. Molecular analysis of bacterial populations in the ileum of broiler chickens and comparison with bacteria in the cecum. 2002;1376:1–9.

6. Gong J, Forster RJ, Yu H, Chambers JR, Sabour PM, Wheatcroft R, et al. Diversity and phylogenetic analysis of bacteria in the mucosa of chicken ceca and comparison with bacteria in the cecal lumen. FEMS Microbiol Lett. 2002;208(1):1–7.

7. Lu J, Idris U, Harmon B, Maurer JJ, Lee MD, Hofacre C. Diversity and succession of the intestinal bacterial community of the maturing broiler chicken. Appl Environ Microbiol. 2003;69(11):6816–24.

8. Gong J, Si W, Forster RJ, Huang R, Yu H, Yin Y, et al. 16S rRNA gene-based analysis of mucosa-associated bacterial community and phylogeny in the chicken gastrointestinal tracts: from crops to ceca. FEMS Microbiol Ecol. 2007;59(1):147–57.

9. Brosius J, Palmer ML, Kennedy PJ, Noller HF. Complete nucleotide sequence of a 16S ribosomal RNA gene from Escherichia coli. Proc Natl Acad Sci USA. 1978;75(10):4801–5.

10. Chakravorty S, Helb D, Burday M, Connell N. A detailed analysis of 16S ribosomal RNA gene segments for the diagnosis of pathogenic bacteria. J Microbiol Methods. 2007;69(2):330–9.

11. Schmalenberger A, Schwieger F, Tebbe CC. Effect of primers hybridizing to different evolutionarily conserved regions of the small-subunit rRNA gene in PCR-based microbial community analyses and genetic profiling. Appl Environ Microbiol. 2001;67(8):3557–63.

Friday, 18 November 2016
