Clone libraries are usually used to determine which bacteria are present in different environments. However, it can be tempting to also infer relative abundances from this data. To illustrate why this is a potential pitfall in the interpretation of clone libraries, I wrote a Python program that creates a model bacterial community and then samples it randomly, mimicking the process of sampling from a clone library. All figures can be found at full size in this imgur album.
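I created 5 different model communities, each defined by its number of OTUs and a factor, GF, that controls how uneven the OTU abundances are (a higher GF gives a less even community):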
[Figure: the five model communities. Full-sized images can be found in the imgur album.]
Model 1: 20 OTUs, GF = 1
Model 2: 35 OTUs, GF = 1
Model 3: 50 OTUs, GF = 0.5
Model 4: 50 OTUs, GF = 1
Model 5: 50 OTUs, GF = 2
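The post doesn't spell out how GF shapes a community, so the sketch below is only an illustration and not the original program. It assumes (hypothetically) that the OTU ranked i has a relative abundance proportional to 1/i^GF, so GF = 0 would give a perfectly even community and a larger GF gives a more uneven one. `model_community` and `MODELS` are illustrative names.

```python
def model_community(n_otus, gf):
    """Return relative abundances for a hypothetical model community.

    ASSUMPTION: the original post does not say how GF is used. Here the
    abundance of the OTU ranked i (1-based) is taken proportional to
    1 / i**gf, so gf = 0 gives a perfectly even community and larger gf
    gives a more uneven one.
    """
    weights = [1 / rank ** gf for rank in range(1, n_otus + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# The five models from the list above: model number -> (OTU count, GF)
MODELS = {1: (20, 1), 2: (35, 1), 3: (50, 0.5), 4: (50, 1), 5: (50, 2)}

for number, (n_otus, gf) in MODELS.items():
    abundances = model_community(n_otus, gf)
    print(f"Model {number}: most abundant OTU = {abundances[0]:.3f}, "
          f"least abundant OTU = {abundances[-1]:.4f}")
```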
I then created a virtual clone library from each model community by randomly selecting OTUs. Each clone library consisted of 1,000 clones. From these 1,000 clones I "sequenced" 100 to produce a sampled abundance, which I compared with the modeled abundance in the following bar graphs.
[Figure: bar graphs of modeled vs. sampled abundance for each model. Full-sized images can be found in the imgur album.]
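A minimal sketch of the sampling step, again assuming the hypothetical 1/i^GF community model (not the original program): each clone in the 1,000-clone library is an OTU drawn at random in proportion to its modeled abundance, and 100 clones are then picked without replacement to stand in for the sequenced clones. `clone_library`, `sequence` and `sampled_abundance` are illustrative names.

```python
import random
from collections import Counter


def model_community(n_otus, gf):
    # Hypothetical rank-abundance model (an assumption, not the original
    # program): OTU i has relative abundance proportional to 1 / i**gf.
    weights = [1 / rank ** gf for rank in range(1, n_otus + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def clone_library(abundances, n_clones=1000, rng=random):
    """Build a virtual clone library: each clone is an OTU index drawn
    at random with probability equal to that OTU's relative abundance."""
    otus = range(len(abundances))
    return rng.choices(otus, weights=abundances, k=n_clones)


def sequence(library, n_sequenced=100, rng=random):
    """'Sequence' clones: pick n_sequenced clones without replacement."""
    return rng.sample(library, n_sequenced)


def sampled_abundance(clones, n_otus):
    """Relative abundance of each OTU among the sequenced clones."""
    counts = Counter(clones)
    return [counts[otu] / len(clones) for otu in range(n_otus)]


rng = random.Random(1)  # fixed seed so the run is repeatable
abundances = model_community(50, 1)
library = clone_library(abundances, 1000, rng)
sequenced = sequence(library, 100, rng)
sampled = sampled_abundance(sequenced, 50)
for otu in range(5):
    print(f"OTU {otu + 1}: model {abundances[otu]:.3f}, sample {sampled[otu]:.2f}")
```

Drawing the library with replacement and then sequencing without replacement mirrors the two stages described above: building the library from the community, then picking clones out of that finite library.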
| Model | % OTUs unsampled |
|---|---|
| 1 | 5% |
| 2 | 8.6% |
| 3 | 12% |
| 4 | 32% |
| 5 | 38% |
It appears that both a larger number of OTUs and a more uneven community leave a greater percentage of OTUs unsampled. Our sample size isn't brilliant (only one clone library per model), but we'll come back to that at the end.
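Working out the percentage of OTUs left unsampled only needs the set of distinct OTUs among the sequenced clones. A sketch, assuming OTUs are labeled 0 to n_otus - 1 (`percent_unsampled` is a hypothetical name):

```python
def percent_unsampled(n_otus, sequenced_clones):
    """Percentage of the community's OTUs that never appear among the
    sequenced clones (OTUs are identified by index 0 .. n_otus - 1)."""
    observed = set(sequenced_clones)
    return 100 * (n_otus - len(observed)) / n_otus


# Toy example: a 20-OTU community where OTU 19 was never sequenced
clones = [i % 19 for i in range(100)]
print(percent_unsampled(20, clones))  # 5.0
```

Because each model has a whole number of OTUs, the percentages in the table correspond to 1 of 20, 3 of 35, 6 of 50, 16 of 50 and 19 of 50 OTUs missed.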
But first, this is a good opportunity to see the analysis discussed in the previous blog post in action. We can draw accumulation curves of the sampling process:
[Figure: accumulation curves for each model. Full-sized images can be found in the imgur album.]
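An accumulation (collector's) curve tracks how many distinct OTUs have been seen after each additional clone is sequenced; a curve that levels off suggests most of the community has been found. A sketch, using a hypothetical even 50-OTU community as input (`accumulation_curve` is an illustrative name):

```python
import random


def accumulation_curve(sequenced_clones):
    """Number of distinct OTUs observed after each clone is sequenced,
    in the order the clones were sequenced."""
    seen = set()
    curve = []
    for clone in sequenced_clones:
        seen.add(clone)
        curve.append(len(seen))
    return curve


# Hypothetical example: 100 clones drawn from an even 50-OTU community
rng = random.Random(2)
clones = [rng.randrange(50) for _ in range(100)]
curve = accumulation_curve(clones)
print(curve[9], curve[49], curve[99])  # OTUs seen after 10, 50, 100 clones
```

This follows the clones in the single order they were sequenced; a rarefaction curve would instead average the count over many random orderings.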
We can put a number on how completely each community was sampled by looking at the coverage:
| Model | Coverage |
|---|---|
| 1 | 0.842 |
| 2 | 0.780 |
| 3 | 0.680 |
| 4 | 0.530 |
| 5 | 0.613 |
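The post doesn't say which coverage estimator was used. A common one is Good's coverage, C = 1 - F1/N, where F1 is the number of OTUs seen exactly once and N is the number of clones sequenced; the sketch below computes that. With N = 100 it can only take values in steps of 0.01, so the three-decimal values in the table may come from a different estimator. `goods_coverage` is an illustrative name.

```python
from collections import Counter


def goods_coverage(sequenced_clones):
    """Good's coverage, C = 1 - F1 / N, where F1 is the number of OTUs
    seen exactly once (singletons) and N is the number of clones sequenced."""
    counts = Counter(sequenced_clones)
    singletons = sum(1 for n in counts.values() if n == 1)
    return 1 - singletons / len(sequenced_clones)


# Toy example: 10 clones, OTUs "c" and "d" are singletons -> 1 - 2/10
print(goods_coverage(["a", "a", "a", "b", "b", "b", "a", "c", "d", "b"]))  # 0.8
```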
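The Chao1 estimator gives an estimate of the total number of OTUs in each community, which we can compare with the true number in each model:

| Model | True OTUs | Chao1 estimate |
|---|---|---|
| 1 | 20 | 20.5 |
| 2 | 35 | 34.5 |
| 3 | 50 | 51.5 |
| 4 | 50 | 76.7 |
| 5 | 50 | 39.0 |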
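Chao1 adds a correction based on singletons (F1) and doubletons (F2) to the number of observed OTUs. The post doesn't say which form was used, so this sketch is an assumption: it uses the classic S_obs + F1²/(2F2) and falls back to the bias-corrected S_obs + F1(F1 - 1)/(2(F2 + 1)) when there are no doubletons, which avoids dividing by zero.

```python
from collections import Counter


def chao1(sequenced_clones, bias_corrected=False):
    """Chao1 richness estimate from a list of sequenced clone labels."""
    counts = Counter(sequenced_clones)
    s_obs = len(counts)
    f1 = sum(1 for n in counts.values() if n == 1)  # singletons
    f2 = sum(1 for n in counts.values() if n == 2)  # doubletons
    if bias_corrected or f2 == 0:
        # Bias-corrected form; also used when F2 == 0 to avoid dividing by zero
        return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
    # Classic form
    return s_obs + f1 ** 2 / (2 * f2)


# Toy example: 5 OTUs observed, 2 singletons (c, d), 2 doubletons (b, e)
print(chao1(list("aaabbcdee")))  # 6.0
```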
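We can also compare two diversity indices calculated for each model community and for its sample:

| Model | Simpson's Index of Diversity (model) | Simpson's Index of Diversity (sample) | Difference | Shannon Index (model) | Shannon Index (sample) | Difference |
|---|---|---|---|---|---|---|
| 1 | 0.921 | 0.920 | 0.001 | 2.67 | 2.64 | 0.03 |
| 2 | 0.967 | 0.964 | 0.003 | 3.35 | 3.26 | 0.09 |
| 3 | 0.984 | 0.981 | 0.003 | 3.77 | 3.65 | 0.12 |
| 4 | 0.974 | 0.938 | 0.036 | 3.67 | 3.10 | 0.57 |
| 5 | 0.925 | 0.902 | 0.023 | 3.32 | 2.86 | 0.46 |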
The models with the lowest coverage (Models 4 and 5) have the largest differences between the model and sample values of both Simpson's Index of Diversity and the Shannon Index.
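For reference, one common way to compute the two indices from relative abundances is sketched below: Simpson's Index of Diversity as 1 - Σp² and the Shannon index as H' = -Σ p ln p. The natural log and the Simpson variant are assumptions, since the post doesn't state them; some texts use the finite-sample form 1 - Σn(n - 1)/(N(N - 1)) instead. The function names are illustrative.

```python
import math
from collections import Counter


def proportions(sequenced_clones):
    """Relative abundance of each OTU observed among the sequenced clones."""
    counts = Counter(sequenced_clones)
    total = len(sequenced_clones)
    return [n / total for n in counts.values()]


def simpsons_index_of_diversity(p):
    """Simpson's Index of Diversity, 1 - sum(p_i ** 2)."""
    return 1 - sum(x * x for x in p)


def shannon_index(p):
    """Shannon index H' = -sum(p_i * ln p_i); OTUs with p_i = 0 are skipped."""
    return -sum(x * math.log(x) for x in p if x > 0)


# Toy comparison: an even 4-OTU "model" and a sample that missed one OTU
model = [0.25, 0.25, 0.25, 0.25]
sample = proportions(["a"] * 4 + ["b"] * 3 + ["c"] * 3)
print(round(simpsons_index_of_diversity(model), 3))  # 0.75
print(round(shannon_index(model), 3))                # 1.386
print(simpsons_index_of_diversity(model) - simpsons_index_of_diversity(sample))
print(shannon_index(model) - shannon_index(sample))
```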
As I said earlier, our sample is very small: we've only got one replicate of sampling from each model. Because I'm using a computer program, I could replicate it 10,000 times. That's a bit excessive, so let's just sample each model 100 times and see what we get. However, we'll have to leave that for the next post, as I should probably get on with doing some actual work!
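Replicating the sampling is cheap in code. A sketch of what 100 replicates per model could look like, reusing the hypothetical 1/i^GF community model (an assumption, not the original program) and reporting the mean and standard deviation of the percentage of OTUs unsampled. `percent_unsampled_once` and `replicate` are illustrative names.

```python
import random
import statistics


def model_community(n_otus, gf):
    # Hypothetical rank-abundance model (assumption): p_i proportional to 1 / i**gf
    weights = [1 / rank ** gf for rank in range(1, n_otus + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def percent_unsampled_once(abundances, rng, n_clones=1000, n_sequenced=100):
    """Build one virtual clone library, sequence part of it, and return
    the percentage of OTUs that were never sequenced."""
    library = rng.choices(range(len(abundances)), weights=abundances, k=n_clones)
    sequenced = rng.sample(library, n_sequenced)
    return 100 * (len(abundances) - len(set(sequenced))) / len(abundances)


def replicate(n_otus, gf, n_reps=100, seed=0):
    """Mean and standard deviation of % OTUs unsampled over n_reps runs."""
    rng = random.Random(seed)  # seeded so results are repeatable
    abundances = model_community(n_otus, gf)
    results = [percent_unsampled_once(abundances, rng) for _ in range(n_reps)]
    return statistics.mean(results), statistics.stdev(results)


MODELS = {1: (20, 1), 2: (35, 1), 3: (50, 0.5), 4: (50, 1), 5: (50, 2)}
for number, (n_otus, gf) in MODELS.items():
    mean, sd = replicate(n_otus, gf)
    print(f"Model {number}: {mean:.1f}% ± {sd:.1f}% of OTUs unsampled")
```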