Wednesday 1 March 2017

Pyrosequencing

Sanger sequencing was used in the first generation of DNA sequencers. Despite miniaturisation and a certain amount of automation, it was still relatively low throughput and laborious. Pyrosequencing was the first second-generation sequencing method, providing researchers with a much higher throughput. With pyrosequencing there wasn't any need for messing around with clone libraries: you could just stick your sample in, amplify it up and sequence it. In papers you might see it referred to as 454 pyrosequencing or Roche pyrosequencing. These are essentially the same method: the platform was developed by 454 Life Sciences, which was later bought by Roche.

Putting the Pyro in Pyrosequencing

Like Sanger sequencing, pyrosequencing uses sequencing by synthesis. Rather than inferring the sequence from electrophoresis, pyrosequencing uses biochemistry and the emission of light.

The pyro in pyrosequencing comes from pyrophosphate. Essentially, pyrosequencing measures the amount of pyrophosphate in a reaction well or vessel. Why pyrophosphate? Well, there are three chemical reactions behind pyrosequencing:

1. The addition of a nucleotide to a DNA strand catalysed by the enzyme DNA polymerase which produces a pyrophosphate as a byproduct. Pyrophosphate is also called diphosphate (two phosphates stuck together) and often abbreviated to PPi.

This diagram shows the nucleotides with their phosphate tails. One phosphate is used to bind to the previous nucleotide, releasing the other two as pyrophosphate (PPi).
  DNA_polymerase.svg: Madprime derivative work: Chandres, DNA polymerase-FR, Cropped and text removed, CC BY-SA 3.0

2. The conversion of this PPi to ATP. The enzyme ATP sulfurylase does this by combining the PPi with adenosine 5'-phosphosulfate (APS).

3. The conversion of luciferin to oxyluciferin by the enzyme luciferase. This process consumes one molecule of ATP and gives out one photon of light.

The luciferase used to produce the light signal is the same enzyme found in fireflies.
  art farmer from Evansville Indiana, USA, Photinus pyralis Firefly 4, CC BY-SA 2.0

So we can go from the PPi produced by adding a nucleotide to a strand of DNA, to the emission of a light signal proportional to the amount of PPi initially present (1 PPi --> 1 ATP --> 1 photon released). If a nucleotide has been added then a light signal is produced. So far so good... but how do they actually make this into something practical?

Emulsion PCR

Ordinary PCR is not enough for pyrosequencing. It is fairly inelegant: you mix everything together in a small tube and put it in a PCR machine, which produces a big mix of everything together. This is fine for a homogeneous sample (sequencing one gene from one person, for example). However, in microbiomics the sample is mind-bogglingly heterogeneous, with hundreds of different 16S rRNA gene sequences.

In the past scientists came up with ways to separate and then sequence the DNA (see clone libraries and DGGE). This isn't good enough for pyrosequencing. Pyrosequencing is a different beast. It allows you to sequence a million DNA strands with different sequences simultaneously and in real time. It's massively parallel sequencing. But first, you need to separate out the strands of DNA, which is where emulsion PCR comes in.

There are loads of good resources (like this one or this video) that describe the process of emulsion PCR (ePCR) if you want the detail, but the basics are:

1. Each strand of DNA in a sample is attached to a tiny microbead in a water-based mix which also contains everything you need for a regular PCR (primers, buffers, dNTPs and DNA polymerase).

2. This mix is then emulsified in oil to form tiny water droplets such that each droplet contains one bead along with all of the stuff needed to amplify the DNA on the bead. Each droplet can be considered as a mini reaction vessel.

3. Run the reaction through cycles of denaturation, annealing and elongation as for a regular PCR.

4. What you end up with is billions of microbeads, each coated with copies of whichever bit of DNA was originally attached to it.

Leveque Theau, EmPCRwiki, Text changed, CC BY-SA 3.0

I can't tell you what wizardry is employed to make sure that one piece of DNA binds to one bead, which then ends up inside its own droplet in the emulsion. But it seems to work.

Sequencing the DNA Strands

Now that we've amplified, we can sequence the DNA attached to the tiny, tiny beads. Pyrosequencing takes place on a plate with lots and lots of tiny, tiny wells big enough to accommodate just one of our tiny, tiny beads. The beads are washed over the plate so each one sits inside a well. Everything needed for DNA synthesis is added to the well (except the dNTPs), as well as the enzymes and substrates to detect the PPi and emit a light signal.

In pyrosequencing there's a lot of repetitive washing. Let's have a look at what happens:

1. A dNTP (dGTP, say) is washed over the plate into all of the wells.

2. DNA synthesis takes place, so if the next bases on the DNA strand in the well are Gs, they will be added to the DNA strand. Note that if the sequence is ATCAGGGGGGGAT, all of those 7 Gs will be added at once.

3. Conditions in the reaction well are changed so that the enzymes for the detection of PPi can do their work. If no dGTP was added to the strand, there will be no PPi and therefore no light signal. Remember that the light signal is proportional to the amount of PPi, so if two Gs were added then the light signal will be twice as strong as the signal when one G is added. We can use a computer to interpret the light signals and decide how many Gs have been added.

4. The spare dGTP is then either washed out or inactivated.

5. The next nucleotide is washed over the plate and the process repeated sequentially for each nucleotide until all of the DNA has been sequenced.
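
To make the wash cycle concrete, here is a minimal Python sketch of steps 1 to 5 for a single well. It is a toy model, not how the instrument's software works. The function name `flowgram` and the G, A, T, C wash order are assumptions for illustration (the steps above only say that G goes first), and the sequence is treated as the list of bases being added, as in step 2.

```python
def flowgram(read, flow_order="GATC"):
    """Simulate the wash cycle for one well.

    Returns a list of (nucleotide, bases_added) pairs, one per wash, where
    bases_added stands in for the relative strength of the light signal.
    flow_order is an assumed wash order, not taken from any instrument manual.
    """
    if set(read) - set(flow_order):
        raise ValueError("read contains a base that is never washed over")

    flows = []
    position = 0  # index of the next base waiting to be added
    wash = 0
    while position < len(read):
        nucleotide = flow_order[wash % len(flow_order)]
        added = 0
        # Every consecutive matching base is added during the same wash,
        # so a run of identical bases gives one proportionally brighter signal.
        while position < len(read) and read[position] == nucleotide:
            added += 1
            position += 1
        flows.append((nucleotide, added))
        wash += 1
    return flows


for nucleotide, added in flowgram("ATCAGGGGGGGAT"):
    print(nucleotide, added)
```

For the ATCAGGGGGGGAT example, the ninth wash (a G) adds all seven Gs in one go, which is why the brightness of the flash matters and not just whether there was one.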

Just to illustrate this, let's imagine that we have a plate with 4 wells, each with the following 10 base pair DNA sequences in them:

1. TGCCCCTTTC
2. GACCCAAAAT
3. GGGCTTTTAA
4. TTACCCCCTA

We can watch them being sequenced in the animation below:
Except that this doesn't just happen with 4 sequences over 10 base pairs. It's happening with up to 1,000,000 sequences over 400-500 base pairs. A computer analyses the light signals emitted from each well and decides how many nucleotides were added depending on the intensity of the light (with reference to prior calibration).
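
The four example wells can also be written out as a short Python sketch. This is a toy under stated assumptions, not the real instrument software: the names `run_plate` and `call_bases` are made up for illustration, the G, A, T, C wash order is assumed, the signal is taken to be exactly proportional to the number of bases added, and base calling is reduced to dividing by a calibrated single-base signal and rounding.

```python
FLOW_ORDER = "GATC"  # assumed wash order; the post only says G comes first

# The four example wells from the post.
WELLS = {
    1: "TGCCCCTTTC",
    2: "GACCCAAAAT",
    3: "GGGCTTTTAA",
    4: "TTACCCCCTA",
}


def run_plate(wells, flow_order=FLOW_ORDER):
    """Wash nucleotides over every well in turn until all strands are complete.

    Returns a list of (nucleotide, {well: bases_added}) entries, one per wash.
    """
    for seq in wells.values():
        if set(seq) - set(flow_order):
            raise ValueError(f"{seq} contains a base that is never washed over")
    position = {well: 0 for well in wells}
    history = []
    flow = 0
    while any(position[well] < len(seq) for well, seq in wells.items()):
        nucleotide = flow_order[flow % len(flow_order)]
        added = {}
        for well, seq in wells.items():
            start = position[well]
            # A run of identical bases is added within a single wash.
            while position[well] < len(seq) and seq[position[well]] == nucleotide:
                position[well] += 1
            added[well] = position[well] - start
        history.append((nucleotide, added))
        flow += 1
    return history


def call_bases(signals, unit_signal=1.0):
    """Turn (nucleotide, light intensity) pairs back into a sequence.

    unit_signal is the calibrated intensity for one added base; each
    intensity is divided by it and rounded to the nearest whole number.
    """
    return "".join(nuc * round(light / unit_signal) for nuc, light in signals)


history = run_plate(WELLS)
for number, (nucleotide, added) in enumerate(history, start=1):
    row = "  ".join(f"well {well}: {count}" for well, count in added.items())
    print(f"wash {number:2d} ({nucleotide})  {row}")

# Reading the ideal (noise-free) signals back recovers every sequence.
for well, seq in WELLS.items():
    ideal = [(nuc, float(added[well])) for nuc, added in history]
    assert call_bases(ideal) == seq
```

Under these assumptions the plate needs 14 washes in total, and the very first G wash already adds three bases in well 3, one in well 2 and none in wells 1 and 4. `call_bases` also copes with slightly noisy readings: made-up intensities such as 3.88 and 3.12 would be rounded to 4 and 3 bases.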

Pyrosequencing is not without its flaws, but we can look at those in another post.