I was asked to check if the strains with the spec cassette have anything unusual in terms of gene expression.
First I needed to determine which samples had the spec cassette. To do this, I ran grep on the FASTQ files of many samples, looking for the string "GGGGATCCGTCGACCTGCAGTTCGAAG", which is a sequence in the spec cassette.
gunzip -cd filename.fq.gz | grep GGGGATCCGTCGACCTGCAGTTCGAAG
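To run the same check over many files at once, something like the following sketch could be used. It is not the exact command I ran: the function names `revcomp` and `count_hits` are made up for illustration, and the search for the reverse complement is an added assumption (reads could come from either strand, depending on the library prep). It prints one line per file with the number of matching lines.

```shell
#!/usr/bin/env bash
# Sketch: count reads containing the spec cassette sequence in each
# gzipped FASTQ file given on the command line.
SEQ="GGGGATCCGTCGACCTGCAGTTCGAAG"

revcomp() {
    # Reverse the string, then complement each base.
    printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

count_hits() {
    # $1 = gzipped FASTQ file. Prints the number of lines matching the
    # sequence or its reverse complement (assumption: either strand counts).
    local rc
    rc=$(revcomp "$SEQ")
    gunzip -c "$1" | grep -c -e "$SEQ" -e "$rc"
}

# Tab-separated output: file name, number of matching lines.
for f in "$@"; do
    printf '%s\t%s\n' "$f" "$(count_hits "$f")"
done
```

Saved as, say, count_spec.sh (a hypothetical name), it would be called as `bash count_spec.sh *.fq.gz`. A count of 0 means no read in that file matched.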
The following samples had reads that match this string:
The following samples did not have reads that match this string:
For the strains in red, I checked every sample, and the results were consistent within each strain. For all other strains I checked only a single condition, and found no matches.
For these marked strains, I pulled out all the genes that were significantly differentially expressed compared to KW20 under the same condition (i.e. cmnx_M0 vs kw20_M0, antx_M1 vs kw20_M1, etc.). Among these 12 samples, there were a total of 222 differentially expressed genes compared to KW20 (190 unique genes).
Only one gene was differentially expressed in more than 3 samples: HI_0658 (ATP-binding cassette, subfamily F, member 3) was differentially expressed in 4 (33%) of the samples. However, all four samples were antx, so this is not a spec-specific effect.
A few genes (~10 in total) show up in 3 samples. HI_0062 (dnaK suppressor protein, ~3x downregulated) and HI_0235 (alternative ribosome-rescue factor, ~5x downregulated) are differentially expressed compared to KW20 in cmnx_M3, antx_M3 and toxx_M3.
However, looking at another comparison, sxyx_M3 vs KW20_M3 (chosen at random), I see the same behaviour for these two genes. Perhaps this is due to some skewing of the data by the problematic KW20_M3_C replicate that Rosie mentioned in the previous post.
An operon containing ribosomal protein genes, HI_0776-779 (~2.5x upregulated), is differentially expressed compared to KW20 in cmnx_M1, antx_M2 and toxx_M2.
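The tallies above (how many comparisons each gene turns up in) can be sketched as follows. This assumes a hypothetical layout with one text file per comparison, each listing the significant gene IDs one per line. The function name `count_recurrence` is illustrative, not part of any existing pipeline.

```shell
# Sketch: count, for each gene ID, the number of comparison files it appears in.
# Assumes one file per comparison with one gene ID per line (hypothetical layout).
count_recurrence() {
    # Deduplicate within each file so a gene counts once per comparison,
    # then count occurrences across files and sort by count (descending).
    for f in "$@"; do
        sort -u "$f"
    done | sort | uniq -c | sort -k1,1nr -k2,2
}

# Example call (file names are placeholders):
#   count_recurrence cmnx_M0.sig.txt antx_M1.sig.txt
```

Each output line holds a count followed by a gene ID, so genes shared by the most comparisons come first. The total number of output lines is the number of unique genes.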
In summary, if there is any effect caused by having a spec cassette, it is not strong enough to really alter the set of differentially expressed genes.