The file that stores the overrepresented sequences is on Google Drive: Scott>Quality Analysis>fastqc_overrepresented.xlsx
The first thing to note is that every overrepresented sequence is either a sequencing artifact or matches some region of the H. influenzae genome (i.e. there are no overrepresented contaminants).
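For readers unfamiliar with how a sequence ends up on this kind of list, here is a minimal sketch of the idea: count identical reads and report any that exceed a chosen fraction of the total. The 0.1% default mirrors the warning threshold described in the FastQC documentation, but the function name, its arguments and the toy reads are hypothetical, and this is not the code FastQC itself runs (FastQC only tracks sequences it sees early in the file, for example).

```python
from collections import Counter


def overrepresented(seqs, min_fraction=0.001):
    """Return {sequence: fraction_of_total} for sequences above min_fraction.

    min_fraction=0.001 (0.1%) is assumed here to match FastQC's documented
    warning level; every read is counted, unlike FastQC's sampling approach.
    """
    seqs = list(seqs)
    total = len(seqs)
    if total == 0:
        return {}
    counts = Counter(seqs)
    return {
        seq: n / total
        for seq, n in counts.most_common()
        if n / total > min_fraction
    }


# Toy reads, for illustration only (not taken from the dataset).
toy_reads = ["ACGT"] * 5 + ["TTTT", "GGGG"]
hits = overrepresented(toy_reads, min_fraction=0.5)
print(hits)
```

With the toy reads above, only the repeated read passes the 50% cutoff used for the demonstration.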
Here is what the abundant sequences matched to:
HI_0139 - outer membrane protein
Tended to be overrepresented in hfq, toxin, antitoxin and toxin/antitoxin knockouts in MIV at t=10 and t=30
Note: this is the only overrepresented gene that encodes a protein product
Between HI_1677-HI_1678 - RNase P
Overrepresented in every sample
Between HI_1281-HI_1282 - tmRNA
Overrepresented in most samples, no apparent pattern.
Between HI_0957-HI_0958 - C4 antisense RNA
Overrepresented in many samples (though not in a single hypercompetent sample)
Note: the gene sits right next to CRP
Between HI_0857-HI_0858 - 6S RNA
Overrepresented in the old KW20 samples at t=100 in MIV (and one sxyx sample at the same timepoint)
Between HI_0631-HI_0632 - tRNA Thr
Not sure why only this tRNA showed up. Overrepresented in the old KW20 samples at t=100 in MIV.
Apart from HI_0139, the overrepresented sequences have a few things in common:
-RNA-based function
-transcribed from a small gene
-would expect to be highly expressed
Intuitively, I suppose this makes sense: a highly transcribed gene has an abundance of transcripts floating around, and a shorter transcript gives rise to a smaller range of possible reads, so each distinct read turns up more often.
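To put rough numbers on that intuition, here is a small sketch. It assumes single-end reads of a fixed length that start uniformly along the transcript, which ignores fragmentation and library-prep biases. The read counts, transcript lengths and read length are hypothetical and not taken from this dataset.

```python
def distinct_read_starts(transcript_len, read_len):
    """Number of distinct full-length reads a transcript can yield
    (one per possible start position)."""
    return max(transcript_len - read_len + 1, 0)


def reads_per_distinct_sequence(n_reads, transcript_len, read_len):
    """Average number of copies of each distinct read, assuming n_reads
    start uniformly along the transcript (a simplifying assumption)."""
    starts = distinct_read_starts(transcript_len, read_len)
    if starts == 0:
        raise ValueError("transcript is shorter than the read length")
    return n_reads / starts


# Hypothetical numbers: the same number of reads from a short RNA
# and from a ten-times-longer mRNA.
short_rna = reads_per_distinct_sequence(n_reads=10_000, transcript_len=150, read_len=50)
long_mrna = reads_per_distinct_sequence(n_reads=10_000, transcript_len=1_500, read_len=50)
print(f"short RNA: {short_rna:.1f} copies per distinct read")
print(f"long mRNA: {long_mrna:.1f} copies per distinct read")
```

Under these assumptions the short transcript has far fewer possible read start positions, so the same number of reads piles up on many fewer distinct sequences.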
Anyway, the take-home message of this is that nothing strange is seen here. There were no hits for rRNA or any organism outside of H. influenzae. This is good.
So, this wraps up my in-depth look at the data quality. Ignoring a particularly poor antitoxin knockout sample, I have yet to see anything that would suggest that the data quality is poor, which is great. My next goal is to verify that the mutant strains actually carry a mutation and that there is no mixup between timepoints (although I have some evidence that there is a mixup for at least one pair of samples).
1 comment:
I don't understand this analysis. Overrepresented according to what criterion? Overrepresented relative to what expectation?