Tuesday, May 12, 2015

Microarray vs RNAseq data in rich medium

To determine if the sBHI RNAseq data we have is really an issue, I pulled out some old microarray data from this paper. The paper claims the following:

Cultures become modestly competent in colonies on sBHI agar plates and when liquid sBHI cultures approach stationary phase. With the exception of ssb, all genes in the CRE regulon were also modestly induced (4–20 times) as stationary phase approached during growth in rich medium (data not shown). This suggests that the low level of competence seen at this stage is not due to failure of a particular competence function, but to a general low induction of all components. This is supported by the modest increases in expression of sxy and of the CRP-regulon genes induced in MIV.

The paper reports that all genes in the competence regulon were induced 4 to 20 fold in rich media. But is this true? Josh wrote an R script that made a rough plot of what the microarray says about the competence regulon. Red is MIV and blue is sBHI. He took the gene expression value from each timepoint and divided it by the expression value from the first timepoint (as a rough normalization). Therefore, the y-axis corresponds to gene expression level relative to the value from t=-70 (which is why all the t=-70 values are 1). It looks like the y-axis is on a log-scale according to the R script.
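The rough normalization described above (each timepoint divided by the first timepoint) can be sketched as follows. This is a minimal illustration in Python, not Josh's R script; the function name and the example intensities are hypothetical.

```python
def relative_to_first(values):
    """Express each value as a ratio to the first timepoint.

    `values` holds one gene's expression measurements in time order,
    so the first entry (t=-70 in the post) always maps to 1.0.
    """
    first = values[0]
    if first == 0:
        raise ValueError("first timepoint is zero; the ratio is undefined")
    return [v / first for v in values]


# Hypothetical intensities for one gene, for illustration only.
example_intensities = [50.0, 100.0, 200.0]
print(relative_to_first(example_intensities))
```

Dividing by a single timepoint makes every curve sensitive to noise in that first measurement, which is one reason to treat the resulting fold changes as rough.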

This plot is a little rough but that is the consequence of using the R base package for plotting. As a testament to the superiority of ggplot2 to the R base package, I have some new plots that are hopefully more informative. Note that these are not on a log-scale.

Here is the data from the microarray (sBHI only). It looks like a large subset of the competence genes fail to be induced even 2-fold and the highest is only up 11-fold.

I've done a similar plot for the RNAseq data, except using count values instead of the intensity measurements taken by the microarray:

This only shows data from one replicate (A), but the other two are very similar. Here we see that many genes appear to be slightly downregulated, but a handful are upregulated up to 2-fold.

As currently plotted, these two graphs cannot be compared directly since their x-axes differ (time for the microarray, OD600 for the RNAseq data). To correct for this, I estimated what the OD600 of the microarray data would be, assuming that the OD600 at t=0 was 0.2 (when cells were transferred into MIV) and a doubling time of ~32 minutes. Here's what I get when combining both plots:

Interestingly, this plot shows that both datasets appear to behave similarly for many genes. But there appear to be ~10 genes that are induced in the microarray that show no signs of induction in the RNAseq data.
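The OD600 estimate used to put the microarray timepoints on the same x-axis can be sketched like this, assuming simple exponential growth with OD600 = 0.2 at t=0 and a 32-minute doubling time, as stated above. The function name is hypothetical, and the exponential assumption will overestimate OD as cultures approach stationary phase.

```python
def estimated_od600(t_min, od_at_zero=0.2, doubling_min=32.0):
    """Estimate OD600 at time t_min (minutes relative to t=0).

    Assumes pure exponential growth: the OD doubles every
    `doubling_min` minutes and equals `od_at_zero` at t=0.
    """
    return od_at_zero * 2 ** (t_min / doubling_min)


# t=-70 is the first microarray timepoint mentioned in the post;
# the other times are chosen only to illustrate the doubling.
for t in (-70, -32, 0, 32):
    print(t, round(estimated_od600(t), 3))
```

One doubling time before t=0 gives half the reference OD and one doubling time after gives twice it, which is a quick sanity check on the estimated axis.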

Overall, I would like to know why the paper claimed that these genes are induced 4-20 fold when, in the microarray data plotted here, they appear to be induced roughly 0-10 fold. Keep in mind, though, that this normalization was very rough and the OD values are estimated. Still, this result makes me a little more comfortable with the RNAseq data. I would still like to see in vivo transformation frequencies for the culture used in the RNAseq data.

3 comments:

Rosie Redfield said...

I'm going to test the frozen OD=1.0 samples for the three KW20-in-sBHI replicates. Maybe tomorrow.

Lauri said...

Would it be possible to post all the competence curves for the strains we are looking at? I know rpoD is hyper-competent in early-log, but I am not as familiar with the others. If someone could send me the data for those I could make the graphs, good R practice for me.

Lauri said...

I want to make these graphs so I can compare competence in sBHI at the various OD/timepoints for all the strains.