Tuesday, June 16, 2015

Recent Lab Meeting

I've uploaded the PowerPoint I used at my last lab meeting. I'll get around to making a proper post about it eventually, but until then the plots are available here:

https://www.slideshare.net/secret/3Eg3j4EsbhTQqS

Wednesday, June 10, 2015

murE Crystal structures

I've been investigating murE hypercompetence today and I just wanted to make a post for future reference if I ever need it.

I've been looking at crystal structures of murE from here:
http://www.ncbi.nlm.nih.gov/structure/?term=%22mure%22


There are 2 structures from M. tuberculosis. The first shows murE bound to UAG and magnesium. You can see that the ADP binding pocket between the brown and blue subdomains is open.

The second M. tuberculosis murE structure includes ADP:



In E. coli, it's clear that the structure is very similar:


There's also one for Staphylococcus aureus that is, again, very similar.

But where do we find the murE point mutations? Looking at the M. tuberculosis structure, we find them here (based on a BLAST alignment):


murE751 is an L -> S substitution found on an alpha helix (the backside where the arrow is pointing).


It looks pretty close to the active site, but it points away from all the action. The other murE point mutations (murE749, G -> R; murE750, G -> W) appear on the same brown domain but far away on the surface of the protein:




The first mutation (murE751) may hinder activity since it is relatively close to the ADP binding pocket, but for the most part I'm skeptical that any of these point mutations has a big impact on activity. I looked at the location of the point mutations in all three species (M. tuberculosis, E. coli and S. aureus) and it seems very consistent.
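For future reference, here is a minimal Python sketch of how a mutation position can be carried from one species to another through a pairwise alignment such as the BLAST one mentioned above. The function name, the start-coordinate parameters, and the toy sequences are all made up for illustration; they are not real murE sequences.

```python
def map_position(aligned_query, aligned_subject, query_pos,
                 query_start=1, subject_start=1):
    """Map a 1-based residue position in the query onto the subject.

    Both strings are the two rows of one pairwise alignment ('-' is a gap).
    query_start / subject_start are the 1-based coordinates of the first
    aligned residue (local alignments rarely start at residue 1).
    Returns None if the query residue is aligned to a gap.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned rows must have equal length")
    q = query_start - 1    # last query residue consumed so far
    s = subject_start - 1  # last subject residue consumed so far
    for q_char, s_char in zip(aligned_query, aligned_subject):
        if q_char != "-":
            q += 1
        if s_char != "-":
            s += 1
        if q_char != "-" and q == query_pos:
            return s if s_char != "-" else None
    raise ValueError("query_pos is outside the aligned region")


# Toy alignment (hypothetical sequences)
query_row = "MKL-SAGT"
subject_row = "MRLQSA-T"
print(map_position(query_row, subject_row, 4))  # 5
print(map_position(query_row, subject_row, 6))  # None (aligned to a gap)
```

A residue that maps to None has no counterpart in the other structure, so it can't be placed on that crystal structure at all.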

Friday, June 5, 2015

Does transcription impact recombination?

The short answer seems to be no.

So I got Josh's recombination data where he gave Haemophilus influenzae 86-028NP donor DNA to KW20 and sequenced 10,000 pooled colonies that had undergone recombination.

I wanted to see if transcription affected recombination (e.g. areas of high transcription tend to recombine less), so I made some plots that combine both sets of data:


Here I graphed the % donor alleles as the points. These points correspond to how much recombination is happening. I added some rectangles (genes) at the bottom to orient the graph. The black line corresponds to transcription: it is the log of the coverage from KW20 at timepoint M0, straight from raw read counts. Note there is no axis for these values, but the line is plotted on the same scale in every graph. I just wanted to see relative levels of transcription.

I made plots like these for the whole genome:




I went through all of these plots and there does not seem to be any relationship between transcription and recombination at all. If there is an effect of transcription, it is very small and probably not worth looking into deeper at this point.
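Eyeballing plots is subjective, so one way to put a number on "no relationship" would be a rank correlation between per-window transcription and % donor alleles. This is not what I did for this post. It is a minimal Python sketch with made-up window values and hypothetical function names.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up values for five genome windows
coverage = [120, 15, 900, 40, 300]          # raw read coverage (transcription)
donor_pct = [2.1, 1.8, 2.0, 2.4, 1.9]       # % donor alleles (recombination)
log_coverage = [math.log10(c + 1) for c in coverage]
print(round(spearman(log_coverage, donor_pct), 2))
```

A value near 0 would agree with the visual impression. Because ranks are used, taking the log of the coverage does not change the result.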

If interested, I uploaded all 182 plots and the R script I made for this in the Google Drive:
RNAseq docs/Scott/Recombination

Wednesday, June 3, 2015

Phase variable genes are a pain

Since I've started looking at the RNA-seq data, there are a few genes that periodically show up as very differentially expressed. Trying to figure out what is going on has been rough, but I think I finally understand. These genes appear to be phase variable.

HI1537 (licA), HI1538 (licB), HI1539 (licC), HI1540 (licD)
have CAAT repeats:


Appears to be strongly downregulated in crpx only
in crpx:



in kw20 (and what seems to be everything other than crpx):


Note that the black segments correspond to a 4 bp deletion. There appears to be an additional repeat in the crpx strain that decreases the transcript abundance.


HI1287 (hsdM), HI1286 (hsdS), HI1285 (hsdR)
have AGCAG repeats:


Appears to be strongly upregulated in taxx only
in taxx (note the 5 bp deletion):



in kw20 (and what seems to be everything other than taxx):


It looks like the extra repeat in all the strains other than taxx causes a huge difference.
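Counting repeat units by eye in an alignment viewer gets tedious, so here is a minimal Python sketch for counting back-to-back copies of a unit such as CAAT or AGCAG in a sequence. The function name and the two example sequences are made up for illustration.

```python
import re


def longest_tandem_run(sequence, unit):
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    unit = unit.upper()
    # Each match is one uninterrupted run of the unit, e.g. CAATCAATCAAT
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence.upper())
    return max((len(run) // len(unit) for run in runs), default=0)


# Hypothetical sequences differing by one CAAT unit (a 4 bp indel)
strain_a = "GGT" + "CAAT" * 5 + "ACG"
strain_b = "GGT" + "CAAT" * 6 + "ACG"
print(longest_tandem_run(strain_a, "CAAT"), longest_tandem_run(strain_b, "CAAT"))  # 5 6
```

This only works when the whole tract sits inside the sequence being examined, for example an assembled contig or a read that spans the tract.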


HI0354 (lic3A)
has AACT repeats:


Note: these repeats go on for more than 100 base pairs. Since we use reads of length 101, I don't think it's possible to detect a deletion here. This is the downside of using short reads.
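The read-length limit comes down to simple arithmetic: to count repeat units from a single read, the read has to cover the whole tract plus at least a little unique sequence on each side. A minimal Python sketch follows. The function name, the flank requirement, and the tract lengths are illustrative assumptions.

```python
def can_span_repeat(read_length, tract_length, min_flank=1):
    """True if one read can cover the tract plus `min_flank` unique bases on each side."""
    return read_length >= tract_length + 2 * min_flank


# 101 bp reads against a hypothetical 104 bp tract (26 AACT units)
print(can_span_repeat(101, 104))                 # False
# The same reads against a hypothetical short 24 bp tract
print(can_span_repeat(101, 24, min_flank=10))    # True
```

Any tract longer than 99 bp fails this test for 101 bp reads, even with a single anchoring base on each side.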

But this gene is highly upregulated in KW20 in sBHI

Compare kw20:

versus murE at the same timepoint:


I have good reason to believe that phase variation is also responsible for this difference.

And finally: 
HI1457 (opa), HI1456 (??)
have no repeats.

This gene pair has been a thorn in my side for a while, mainly because it looks like it is turned on in sBHI only in cells that are hypercompetent:

For strains in BHI, there is roughly a 1.6% chance of seeing this pattern by chance, assuming each strain has a 50% chance of having the gene on. But in MIV, it looks like crpx (and maybe hfqx) are the only ones that don't really express this gene. The data don't give a clear view of what's going on.
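For reference, 1.6% is roughly 0.5^6: the chance that six independent 50/50 on/off calls all match one particular pattern. The six here, and the specific pattern, are assumptions used only to reproduce the figure. The function name in this minimal Python sketch is hypothetical.

```python
def pattern_probability(pattern, p_on=0.5):
    """Probability of one specific on/off pattern across independent strains."""
    prob = 1.0
    for gene_on in pattern:
        prob *= p_on if gene_on else (1.0 - p_on)
    return prob


# Hypothetical pattern: gene on in three strains, off in three others
observed = [True, True, True, False, False, False]
print(f"{pattern_probability(observed):.1%}")  # 1.6%
```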

I dug around some papers and found this one: 

The phasevarion: a genetic system controlling coordinated, random switching of expression of multiple genes.



This paper is oddly pertinent to this blog post. The paper shows that the opacity protein opa is regulated by a type III restriction-modification system. HI1058/HI1059 encodes the mod gene that has tetranucleotide (AGTC) repeats. The number of repeats determines the reading frame. Two reading frames produce protein (72 or 86 kDa) and one doesn't. The paper shows that opa expression is reduced under one of these reading frames. There is likely some methylation happening that blocks opa expression.
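The link between repeat number and reading frame is just modular arithmetic: each 4 bp unit adds 4 bases, so the frame downstream of the tract shifts by (number of repeats x 4) mod 3. A minimal Python sketch follows. The function name and the repeat counts are hypothetical, and it does not say which frame is the one that makes protein.

```python
def frame_offset(n_repeats, unit_length=4):
    """Reading-frame offset (0, 1 or 2) contributed by a repeat tract."""
    return (n_repeats * unit_length) % 3


# Hypothetical repeat counts for a tetranucleotide tract such as AGTC
for n in range(20, 26):
    print(n, frame_offset(n))
# 20 2
# 21 0
# 22 1
# 23 2
# 24 0
# 25 1
```

Gaining or losing a single unit always moves to a different frame, and three units in a row bring it back to where it started. The same holds for the 5 bp AGCAG unit, since 5 is not a multiple of 3.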

Unfortunately, I am not able to see which samples have active mod because again, the repeat region is greater than 100 bp. The mod gene does seem to be expressed (significantly) more in the crp knockout though. For now, I think I'll treat opa as indirectly phase variable.

Monday, June 1, 2015

Things worth mentioning about competence genes in competence mutants

I've turned my focus to figuring out how/why the various competence mutants we have alter regular competence. To do this, I'm comparing expression of competence genes between strains.

A quick message about normalization: 
DESeq2 (the R package I used to normalize the count data) normalizes data to allow, for instance, a comparison between gene A in condition X and gene A in condition Y. It does not normalize in a way that makes comparisons between gene A and gene B possible because it does not take into account differences in gene length (longer genes are expected to have more reads).

To make comparisons between genes possible, instead of asking "how much expression of gene A is there?" I ask "how much expression of gene A is there compared to gene A expression in KW20 M0?" I did this by dividing normalized count values by the normalized count in KW20 M0. In essence, KW20 M0 is treated as a baseline and measurements are deviations from that baseline.
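The baseline division described above can be sketched in a few lines. This is a minimal Python illustration, not the R code I actually used. The sample labels, gene names and count values are made up.

```python
def relative_to_baseline(normalized_counts, baseline_sample="KW20_M0"):
    """Divide each gene's normalized counts by its own value in the baseline sample."""
    relative = {}
    for gene, by_sample in normalized_counts.items():
        baseline = by_sample[baseline_sample]
        if baseline == 0:
            continue  # ratio is undefined when the baseline count is zero
        relative[gene] = {
            sample: value / baseline for sample, value in by_sample.items()
        }
    return relative


# Hypothetical normalized counts: a short gene and a long gene
counts = {
    "geneA": {"KW20_M0": 50.0, "KW20_M2": 5000.0},
    "geneB": {"KW20_M0": 800.0, "KW20_M2": 1600.0},
}
relative = relative_to_baseline(counts)
print(relative["geneA"]["KW20_M2"], relative["geneB"]["KW20_M2"])  # 100.0 2.0
```

Because each gene is divided by its own baseline, gene length cancels out: geneB has more reads than geneA at M0, but it is geneA that is induced 100x.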

Here is some data (error bars are standard error):

The first set of plots shows KW20 competence gene expression. As you can see, most competence genes are ~10-100x induced in M2 relative to M0. Compare this to the sxy or crp knockout where very little induction is seen. Interestingly, HI0365 seems to be particularly down in these knockouts.


The toxin and double toxin/antitoxin knockout strains behave similarly to KW20 (except for the toxin/antitoxin genes, of course) and peak at pretty much the same levels in M2 as KW20. Competence expression in the double knockout (taxx) seems to be a bit lower than KW20 in M3 though. Overall, nothing unexpected.

Knocking out hfq appears to cause slower induction of competence (compare M1 to data above) and definitely does not lead to full induction of the competence regulon when comparing to KW20 M2. Presumably, this failure to completely turn on the competence regulon leads to the 10x competence defect seen in this strain.

This one is confounding. There's definitely high expression of the toxin/antitoxin pair when the antitoxin is knocked out (at all timepoints!), suggesting that the toxin is self-promoting. This behaviour is seen in other toxin/antitoxin pairs as well. This strain was shown to be unable to take up DNA, but competence gene expression is comparable to the hfq knockout strain (which is only mildly less competent). The only competence gene that is consistently down in this strain is HI1631, about which pretty much nothing is known (except that it may have some restriction-enzyme-like function; at least it has a motif that suggests that). Interestingly, the 3-enzyme restriction-modification system hsdR, hsdM and hsdS is hugely upregulated in the toxin/antitoxin double knockout. This is odd because you'd think removing both the toxin and antitoxin should have no effect whatsoever on the transcriptome (or that you would see the same thing if you just removed the toxin...?)


Note, these two have BHI samples only. Strikingly, rpoD and sxy-1 behave very similarly. Sxy-1 is hypercompetent because of the weakened 5' stem mRNA structure which leads to more Sxy protein and expression of competence genes. Perhaps the rpoD point mutation works in a similar way to increase sxy translatability.

Finally, we have the infamous murE point mutation. Competence gene expression increases a bit from B1 to B2. Apparently competence genes are expressed more in murE in sBHI than KW20 in MIV. At the very least, it's possible to say that murE is hypercompetent because something is causing aggressive expression of competence genes (and not some change of membrane permeability, for example).

List of CRP-regulated genes

This data was produced by comparing the CRP knockout strain to KW20 in MIV only. This is based on one crpx timecourse replicate (I believe another replicate has been sent for sequencing).

First I compiled a list of predicted CRP binding sites and associated them with genes predicted to be first in an operon (using data from DOOR and ProOpDB and confirmation by looking for similar expression of neighbouring genes in the RNA-seq data).

I pulled out genes that were very evidently CRP-regulated (differentially expressed and with a CRP site) and used this list to decide on a cut-off: any gene that was significant (padj < 0.1) in at least 2 out of 4 timepoints was considered truly significant. This removed a lot of likely false positives (for example, many genes were significant only at the last timepoint, which does not follow the behaviour expected of CRP-regulated genes).
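The cut-off itself is easy to express in code. This is a minimal Python sketch rather than the R I used. The gene names and padj values are made up, and missing padj values (DESeq2 reports NA for some genes) are represented as None.

```python
def passes_cutoff(padj_by_timepoint, alpha=0.1, min_timepoints=2):
    """True if padj < alpha in at least `min_timepoints` of the timepoints."""
    hits = sum(1 for p in padj_by_timepoint if p is not None and p < alpha)
    return hits >= min_timepoints


# Hypothetical adjusted p-values at the four timepoints
padj = {
    "geneA": [0.001, 0.03, 0.2, 0.5],    # significant early on: kept
    "geneB": [0.8, 0.6, 0.4, 0.02],      # significant only at the last timepoint: dropped
    "geneC": [None, 0.05, 0.09, None],   # NA at two timepoints, significant at two: kept
}
kept = [gene for gene, values in padj.items() if passes_cutoff(values)]
print(kept)  # ['geneA', 'geneC']
```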

I uploaded the results to the Google Drive:
Scott/DE Results/CRP genes 1.xlsx

This file shows the raw differential expression analysis results on the first sheet. The second sheet shows the raw data of the genes that passed the cut-off I described above. The third sheet shows a prettier version with fold changes instead of log-fold changes (these are compared to KW20) and asterisks to represent significance. Genes are grouped by operons. The fourth sheet shows the predicted CRP binding sites and the last sheet is a simple list of genes that are directly (with a CRP site) and indirectly (could not find a strong CRP site) regulated by CRP. I'll also post some of the results here:
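Going from log-fold changes to fold changes is a one-liner, since DESeq2 reports log2 fold changes. The Python sketch below shows the conversion along with a single-asterisk significance mark. The asterisk rule (one asterisk at padj < 0.1) and the function name are assumptions for illustration and may not match the spreadsheet exactly.

```python
def format_fold_change(log2_fc, padj, alpha=0.1):
    """Convert a log2 fold change to a fold change string, starred if significant."""
    fold = 2 ** log2_fc
    star = "*" if padj is not None and padj < alpha else ""
    return f"{fold:.1f}{star}"


print(format_fold_change(3.0, 0.001))  # 8.0*
print(format_fold_change(-1.0, 0.5))   # 0.5
```

With this convention, values below 1 mean lower expression than in KW20.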

GENES WITH CRP-N SITES

ID        Gene  CRP site 1              CRP site 2              CRP site 3

HI0035          AAATGTGACGAACGTATCATTT

HI0036          AAATGTGACGAACGTATCATTT

HI0053          TTTTGTGATATGGCTCACAAAA
HI0052
HI0051
HI0050m
HI0049    kdgK
HI0048
HI0047    eda

HI0075    nrdD  TAATTTGATATTTTTCTAATAA  TATTGATCACAAAATCAAAAAT  CATTGTGATATTGATCACAAAA

HI0082          TTTCTTGATCCACGTCACATTA
HI0083

HI0131    afuA  TATTATGAAATTCAACAAAATT  AACTGTGAACTTCATCACGGTA
HI0129    afuB
HI0126    fbpC

HI0145          AAATGAGAAGTTGATCACATTT
HI0144
HI0143
HI0142    nanA

HI0146          AAATGTGATCAACTTCTCATTT
HI0147

HI0289    sdaC  AAATTTTAACTTGATCACAATT
HI0288    sdaA

HI0398          TTTTGTGACTCACTTCAAACTC
HI0399    icc

HI0501    rbsD  TTTTGTGATCAATATCCCAAAT
HI0502    rbsA
HI0503    rbsC
HI0504    rbsB

HI0521          AACTGTGATCTTCCTCACGTTT
HI0520

HI0534    aspA  AAATGTGATCTTCATCAAGTTT

HI0591    speF  TATTATGCCAAATTTAAAAATT
HI0590    potE

HI0601    tfoX  ATTTACGATCTGGCTCACAAAT

HI0604    cyaA  ATTTACGATCTGGCTCACAAAT
HI0605    gpsA
HI0606    cysE
HI0607    aroE

HI0608          TTTGTTGCTCTCGATCACATTT

HI0685    glpA  TATTGTGATCAATATCACAAAA  AAATGTGAAGTGTTTCACAAAT
HI0684    glpB
HI0683    glpC

HI0740    yhxB  AAATGTTAAGTAGATCAAAAAA

HI0745    ansB  TTATGTGATCGAGATCATAAAT

HI0804          TTTTGTTAAACACTTCACATTT  AATATTTATCTAGTTCAAAATT

HI0809    pckA  AAATGAGATCTACTTAACATTT  ATTTTTGCTCTATATCACAATA

HI0815    uspA  AATTGTGATCTAGTACACAGTT

HI0822    mglB  ATTTGTGACATGGATCACAAAT
HI0823    mglA
HI0824    mglC

HI0835    frdA  TTTTTTGAGGTAGATCACAAAA
HI0834    frdB
HI0833    frdC
HI0832    frdD

HI0884    arcA  AACTATGATTTAGATCACAAAA

HI1010          TTCTGTGATCTAGATCTCAGAT
HI1011
HI1012
HI1013
HI1014
HI1015    gntP
HI1016

HI1111    xylF  AAATAGGATCTAGATCACAAAA
HI1110    xylG
HI1109    xylH

HI1031          AAATAGGATCTAGATCACAAAA
HI1030
HI1029
HI1028
HI1027    lyx
HI1026
HI1025    sgbE
HI1024    ulaD

HI1089    ccmA  AAATAGGATCTAGATCACAAAA
HI1090    ccmB
HI1091    ccmC
HI1092    ccmD
HI1093    ccmE
HI1094    ccmF
HI1095    dsbE
HI1096m   ccmH
HI1097m

HI1126.1        AAATGTGATACAAGTCACAAAT

HI1210    mdh   AAATGTGAACTAGATCATAGAA

HI1218    lctP  TTATGAGATATTGATCACATTT

HI1245          AAGTTTGCAGTTCGTCACAATT

HI1350    cdd   ATAAGTGATCAAGATCACAGTT

HI1356    malQ  ATTATTGACGAAGATCACACTT
HI1357    glgB
HI1358    glgX
HI1359    glgC
HI1360    glgA

HI1398    fumC  TTTTATGATCTATGTCACAAAA

HI1427          TTTTGTGATCTCGATCACAAAT

HI1434.1  cspD  AAAATTGATTTAGATCATTAAA

HI1645    fbp   AAAATTGATTTAGATCATTAAA

HI1662    sucA  AAAATTGATTTAGATCATTAAA
HI1661    sucB



OTHER GENES

Competence genes

HI0061    rec2

HI0299
HI0298
HI0297
HI0296    hopD

HI0365
HI0366

HI0985    dprA

HI1008

HI0439    comA
HI0438    comB
HI0437    comC
HI0436    comD
HI0435    comE
HI0434    comF

HI0660
HI0659
HI0658

HI0938
HI0939
HI0940
HI0941

HI0952    radC

HI1117    comM

HI1183

trp genes

HI0287    mtr

HI0830    trpR

HI1387    trpE
HI1388    trpG
HI1388.1
HI1389    trpD
HI1389.1  trpC
HI1390    hybG

HI1430
HI1431    trpB
HI1432    trpA

fuc genes

HI0614    fucI
HI0613    fucK
HI0612    fucU

Other

HI0141    nagB
HI0140    nagA

HI0148

HI0300    ampD

HI0410    tyrR

HI0584

HI0623    fmt

HI0738    ilvD

HI0764    ribB

HI0956

HI1056

HI1434    ybaK

HI1456
HI1457

HI1492

HI1537    licA
HI1538    licB
HI1539    licC
HI1540    licD

HI1655

HI1664

HI1682    sohB

UPREGULATED:

HI0035, HI0047 (eda), HI0048, HI0049 (kdgK), HI0050m, HI0051, HI0052, HI0053, HI0061 (rec2),
HI0075 (nrdD), HI0082, HI0083, HI0126 (fbpC), HI0129 (afuB), HI0131 (afuA), HI0140 (nagA),
HI0141 (nagB), HI0142 (nanA), HI0143, HI0144, HI0145, HI0146, HI0147, HI0148, HI0288 (sdaA), HI0289 (sdaC), HI0296 (hopD), HI0297, HI0298, HI0299, HI0365, HI0366, HI0398, HI0399 (icc), HI0410 (tyrR), HI0434 (comF), HI0435 (comE), HI0436 (comD), HI0437 (comC), HI0438 (comB), HI0439 (comA), HI0501 (rbsD), HI0502 (rbsA), HI0503 (rbsC), HI0504 (rbsB), HI0520, HI0521, HI0534 (aspA), HI0590 (potE), HI0591 (speF), HI0601 (tfoX), HI0608, HI0612 (fucU), HI0613 (fucK), HI0614 (fucI), HI0623 (fmt), HI0658, HI0659, HI0660, HI0683 (glpC), HI0684 (glpB), HI0685 (glpA), HI0740 (yhxB), HI0745 (ansB), HI0804, HI0809 (pckA), HI0815 (uspA), HI0822 (mglB), HI0823 (mglA), HI0824 (mglC), HI0832 (frdD), HI0833 (frdC), HI0834 (frdB), HI0835 (frdA), HI0884 (arcA), HI0938, HI0939, HI0940, HI0941, HI0952 (radC), HI0985 (dprA), HI1008, HI1010, HI1011, HI1012, HI1013, HI1014, HI1015 (gntP), HI1016, HI1024 (ulaD), HI1025 (sgbE), HI1026, HI1027 (lyx), HI1028, HI1029, HI1030, HI1031, HI1110 (xylG), HI1111 (xylF), HI1117 (comM), HI1126.1, HI1183, HI1210 (mdh), HI1218 (lctP), HI1245, HI1350 (cdd), HI1356 (malQ), HI1357 (glgB), HI1358 (glgX), HI1359 (glgC), HI1360 (glgA), HI1398 (fumC), HI1427, HI1434.1 (cspD), HI1456, HI1457, HI1537 (licA), HI1538 (licB), HI1539 (licC), HI1540 (licD), HI1645 (fbp), HI1661 (sucB), HI1662 (sucA)

DOWNREGULATED:

HI0036, HI0300 (ampD), HI0584, HI0604 (cyaA), HI0605 (gpsA), HI0606 (cysE), HI0607 (aroE), HI0738 (ilvD), HI0956, HI1056, HI1089 (ccmA), HI1090 (ccmB), HI1091 (ccmC), HI1092 (ccmD), HI1093 (ccmE), HI1094 (ccmF), HI1095 (dsbE), HI1096m (ccmH), HI1097m, HI1434 (ybaK), HI1492, HI1655, HI1664, HI1682 (sohB)