Tuesday, June 16, 2015

Recent Lab Meeting

I've uploaded the PowerPoint slides from my last lab meeting. I'll get around to writing a proper post about it eventually, but until then the plots are available here:

https://www.slideshare.net/secret/3Eg3j4EsbhTQqS

Wednesday, June 10, 2015

murE Crystal structures

I've been investigating murE hypercompetence today and wanted to make a post for future reference.

I've been looking at crystal structures of murE from here:
http://www.ncbi.nlm.nih.gov/structure/?term=%22mure%22

There are 2 structures from M. tuberculosis. The first shows murE bound to UAG and magnesium. You can see that the ADP binding pocket between the brown and blue subdomains is open.

The second M. tuberculosis murE structure includes ADP:

The E. coli structure is clearly very similar:

There's also one for Staphylococcus aureus that is, again, very similar.

But where do we find the murE point mutations? Looking at the M. tuberculosis structure, we find them here (based on a BLAST alignment):

murE751 is an L -> S substitution on an alpha helix (on the back side, where the arrow is pointing).

It looks pretty close to the active site, but it points away from all the action. The other murE point mutations (murE749, G -> R; murE750, G -> W) appear on the same brown domain but far away on the surface of the protein:

The first mutation may hinder activity, since it is relatively close to the ADP-binding area, but for the most part I'm skeptical that any of these point mutations has a big impact on activity. I looked at the locations of the point mutations in all three species (M. tuberculosis, E. coli and S. aureus), and they are very consistent.

Friday, June 5, 2015

Does transcription impact recombination?

The short answer seems to be no.

I got Josh's recombination data: he transformed KW20 with Haemophilus influenzae 86-028NP donor DNA and sequenced 10,000 pooled colonies that had undergone recombination.

I wanted to see whether transcription affects recombination (e.g. whether areas of high transcription tend to recombine less), so I made some plots that combine both sets of data:

Here the points are the % donor alleles, which reflect how much recombination is happening at each position. The rectangles (genes) at the bottom are there to orient the graph. The black line is transcription: the log of the coverage from KW20 at timepoint M0, taken straight from the raw read counts. There is no axis for these values, but the line is plotted on the same scale in every graph; I just wanted to see relative levels of transcription.
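For reference, here is a minimal sketch of how the two tracks could be derived. It assumes per-SNP read counts for the donor allele and total reads from the pooled sequencing, and it assumes a log10 scale with a pseudocount of 1; the post doesn't state the log base or whether a pseudocount was used, and the function names are hypothetical (the actual plots were made with an R script).

```python
import math

def percent_donor(donor_reads, total_reads):
    """Percent of reads at a SNP position that carry the donor (86-028NP) allele."""
    if total_reads == 0:
        return float("nan")  # no coverage at this SNP, so no estimate
    return 100.0 * donor_reads / total_reads

def log_coverage(raw_counts, pseudocount=1):
    """Log10 of raw read coverage; the pseudocount (an assumption) avoids log(0)
    in regions with no reads."""
    return [math.log10(c + pseudocount) for c in raw_counts]
```

For example, `percent_donor(25, 1000)` gives 2.5, and `log_coverage([0, 9, 99])` gives `[0.0, 1.0, 2.0]`. Because only relative transcription matters here, the base of the log changes the shape of the line only by a constant factor.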

I made plots like these for the whole genome:

I went through all of these plots and there does not seem to be any relationship between transcription and recombination. If transcription has an effect, it is very small and probably not worth looking into more deeply at this point.
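Judging the plots by eye seems reasonable, but a single genome-wide number could back it up. One option (a swapped-in technique, not what was done here) is Spearman's rank correlation between the % donor allele at each SNP and the log M0 coverage at the same position; a rho near 0 would be consistent with "no relationship". This stdlib-only sketch assumes the two lists are already paired by SNP position and gives no p-value (`scipy.stats.spearmanr` would provide one).

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5
```

Ranks are used rather than raw values because coverage spans orders of magnitude and the relationship, if any, need not be linear. A constant input (e.g. every SNP at 0% donor) has zero variance and would raise a division error.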

If you're interested, I uploaded all 182 plots and the R script I made for this to the Google Drive:
RNAseq docs/Scott/Recombination

Wednesday, June 3, 2015

Phase variable genes are a pain

Since I've started looking at the RNA-seq data, there are a few genes that periodically show up as very differentially expressed. Trying to figure out what is going on has been rough, but I think I finally understand. These genes appear to be phase variable.

HI1537 (licA), HI1538 (licB), HI1539 (licC), HI1540 (licD)
has CAAT repeats:

Appears to be strongly downregulated in crpx only

in crpx:

in kw20 (and what seems to be everything other than crpx):

Note: the black segment corresponds to a 4 bp deletion. The crpx strain appears to have one additional repeat, which decreases transcript abundance.
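Comparing strains comes down to counting repeat units. A minimal sketch of doing that in code, assuming you already have the sequence of the repeat region for each strain (e.g. from an assembly or a read that spans it); the function name and example sequences are hypothetical, and this isn't necessarily how the comparison above was made.

```python
def longest_tandem_run(seq, unit):
    """Return the largest number of consecutive copies of `unit` in `seq`."""
    seq, unit = seq.upper(), unit.upper()
    best = 0
    # A tandem run can start at any offset, so scan each phase of the unit.
    for start in range(len(unit)):
        run = 0
        for i in range(start, len(seq) - len(unit) + 1, len(unit)):
            if seq[i:i + len(unit)] == unit:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best
```

For example, `longest_tandem_run("GGCAATCAATCAATTT", "CAAT")` returns 3. Since CAAT is 4 bp, a 4 bp deletion in the tract corresponds to exactly one fewer unit.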

HI1287 (hsdM), HI1286 (hsdS), HI1285 (hsdR)

has AGCAG repeats:

Appears to be strongly upregulated in taxx only

in taxx (note the 5 bp deletion):

in kw20 (and what seems to be everything other than taxx):

It looks like the extra repeat in all the strains other than taxx makes a huge difference.

HI0354 (lic3A)

has AACT repeats:

Note: these repeats go on for more than 100 bp. Since our reads are 101 bp long, no read can span the whole repeat tract, so I don't think it's possible to detect a deletion here. This is the downside of using short reads.

But this gene is highly upregulated in KW20 in sBHI

Compare kw20:

versus murE at the same timepoint:

I have good reason to think that phase variation is also responsible for this difference.
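The read-length problem for lic3A can be made concrete. To count repeat units from a single read, the read has to cover the entire tract plus some unique flanking sequence on both sides; a read lying entirely inside the repeat looks identical no matter how many units there are. In this sketch the 10 bp minimum anchor is a hypothetical choice, not a value from our pipeline.

```python
def read_can_span(repeat_tract_bp, read_length=101, min_anchor=10):
    """True if one read could cover the whole repeat tract plus at least
    `min_anchor` bp of unique flanking sequence on each side."""
    return read_length >= repeat_tract_bp + 2 * min_anchor
```

With 101 bp reads and 10 bp anchors, the longest tract a single read can resolve is 101 - 20 = 81 bp, so a tract of more than 100 bp is out of reach however the anchor is chosen.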

And finally:

HI1457 (opa), HI1456 (??)

has no repeats.

This gene pair has been a thorn in my side for a while, mainly because it looks like it is turned on in sBHI only in hypercompetent cells:

For strains in BHI, there is only about a 1.6% chance of seeing this pattern if each strain independently has a 50% chance of having the gene on. But in MIV, it looks like crpx (and maybe hfqx) are the only strains that don't really express this gene. This data doesn't give a clear picture of what's going on.
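The 1.6% figure follows from treating each strain as an independent coin flip. The post doesn't say how many strains went into it; 0.5^6 = 0.015625 (about 1.6%), so six strains is an inference, and the on/off pattern in the usage example is hypothetical.

```python
def chance_of_pattern(pattern, p_on=0.5):
    """Probability of one specific on/off pattern across independent strains,
    where each strain has the gene 'on' with probability p_on."""
    prob = 1.0
    for is_on in pattern:
        prob *= p_on if is_on else (1 - p_on)
    return prob
```

For example, `chance_of_pattern([True, True, False, False, False, False])` gives 0.015625. With p_on = 0.5 every pattern is equally likely, so the result depends only on the number of strains, not on which ones are on.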

I dug around some papers and found this one:

The phasevarion: a genetic system controlling coordinated, random switching of expression of multiple genes.

This paper is oddly pertinent to this blog post. It shows that the opacity protein opa is regulated by a type III restriction-modification system. HI1058/HI1059 encodes the mod gene, which has tetranucleotide (AGTC) repeats. The number of repeats determines the reading frame: two reading frames produce protein (72 or 86 kDa) and one doesn't. The paper shows that opa expression is reduced in one of these reading frames. There is likely some methylation happening that blocks opa expression.
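Why the repeat count picks one of exactly three frames: each added or lost 4 bp AGTC unit shifts everything downstream by 4 bp, and 4 mod 3 = 1, so the frame cycles every three units. A minimal sketch; the reference repeat count is a hypothetical input, and which frame gives the 72 kDa protein, the 86 kDa protein or nothing depends on the actual mod sequence, which this doesn't encode.

```python
def frame_shift(repeat_count, reference_count, unit_length=4):
    """Reading-frame offset (0, 1 or 2) of the sequence downstream of a tandem
    repeat, relative to an allele with `reference_count` units."""
    # Python's % always returns a non-negative result for a positive modulus,
    # so losing units also maps onto 0, 1 or 2.
    return ((repeat_count - reference_count) * unit_length) % 3
```

For example, one extra AGTC unit gives `frame_shift(11, 10) == 1`, one fewer gives 2, and three extra units return to frame 0. The same arithmetic covers the CAAT (4 bp) repeats in lic and the AGCAG (5 bp) repeats in hsd, where `unit_length=5` makes each unit shift the frame by 2.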

Unfortunately, I am not able to see which samples have active mod because again, the repeat region is greater than 100 bp. The mod gene does seem to be expressed (significantly) more in the crp knockout though. For now, I think I'll treat opa as indirectly phase variable.

Monday, June 1, 2015

Things worth mentioning about competence genes in competence mutants

I've turned my focus to figuring out how and why the various competence mutants we have alter normal competence. To do this, I'm comparing the expression of competence genes between strains.

A quick message about normalization:
DESeq2 (the R package I used to normalize the count data) normalizes data to allow, for instance, a comparison between gene A in condition X and gene A in condition Y. It does not normalize in a way that makes comparisons between gene A and gene B possible because it does not take into account differences in gene length (longer genes are expected to have more reads).
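DESeq2's default normalization is the median-of-ratios method (what `estimateSizeFactors` computes): each sample gets one size factor that corrects for sequencing depth, and gene length never enters into it. Here is a Python sketch of that idea, not DESeq2's code; the dict layout is hypothetical.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors, as in DESeq/DESeq2's default method.

    counts: dict mapping sample name -> list of raw counts (same gene order).
    Genes with a zero count in any sample are skipped, since their geometric
    mean is zero.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    log_geo_means = []  # (gene index, log of geometric mean across samples)
    for g in range(n_genes):
        vals = [counts[s][g] for s in samples]
        if min(vals) > 0:
            log_geo_means.append((g, sum(math.log(v) for v in vals) / len(vals)))
    # Each sample's factor is the median ratio of its counts to the gene-wise
    # geometric means, computed on the log scale.
    return {
        s: math.exp(median(math.log(counts[s][g]) - lg for g, lg in log_geo_means))
        for s in samples
    }
```

Normalized counts are the raw counts divided by the sample's size factor. Because every gene in a sample is divided by the same number, a longer gene keeps proportionally more reads, which is why the normalized values still can't be compared across genes.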

To make comparisons between genes possible, instead of asking "how much expression of gene A is there?" I ask "how much expression of gene A is there compared to gene A expression in KW20 M0?" I did this by dividing normalized count values by the normalized count in KW20 M0. In essence, KW20 M0 is treated as a baseline and measurements are deviations from that baseline.
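A minimal sketch of the baseline step for a single gene. It assumes the error bars are the standard error of the mean across replicates, scaled by the same KW20 M0 baseline (this ignores the baseline's own uncertainty); the dict layout and function name are hypothetical.

```python
from statistics import mean, stdev

def relative_to_baseline(normalized, baseline_key=("KW20", "M0")):
    """Express one gene's mean normalized count in each condition relative to
    the baseline mean, with the standard error scaled the same way.

    normalized: dict mapping (strain, timepoint) -> list of normalized counts
    across replicates for a single gene.
    """
    base = mean(normalized[baseline_key])
    out = {}
    for key, reps in normalized.items():
        # standard error of the mean; undefined for a single replicate
        se = stdev(reps) / len(reps) ** 0.5 if len(reps) > 1 else 0.0
        out[key] = (mean(reps) / base, se / base)
    return out
```

A gene with no reads in KW20 M0 has a zero baseline and would raise a division error, so such genes need to be dropped or given a pseudocount first.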

Here is some data (error bars are standard error):

The first set of plots shows KW20 competence gene expression. As you can see, most competence genes are ~10-100x induced in M2 relative to M0. Compare this to the sxy or crp knockout where very little induction is seen. Interestingly, HI0365 seems to be particularly down in these knockouts.

The toxin and double toxin/antitoxin knockout strains behave similarly to KW20 (except for the toxin/antitoxin genes, of course) and peak at pretty much the same levels in M2 as KW20. Competence expression in the double knockout (taxx) seems to be a bit lower than KW20 in M3 though. Overall, nothing unexpected.

Knocking out hfq appears to slow the induction of competence (compare M1 to the data above) and definitely does not lead to full induction of the competence regulon compared with KW20 M2. Presumably, this failure to completely turn on the competence regulon leads to the 10x competence defect seen in this strain.

This one is puzzling. There's definitely high expression of the toxin/antitoxin pair when the antitoxin is knocked out (at all timepoints!), suggesting that the toxin promotes its own expression. This behaviour is seen in other toxin/antitoxin pairs as well. This strain was shown to be unable to take up DNA, yet its competence gene expression is comparable to the hfq knockout strain (which is only mildly less competent). The only competence gene that is consistently down in this strain is HI1631, about which almost nothing is known (except that it may have some restriction-enzyme-like function; at least, it has a motif that suggests so). Interestingly, the 3-enzyme restriction-modification system hsdR, hsdM and hsdS is hugely upregulated in the toxin/antitoxin double knockout. This is odd, because you'd think removing both the toxin and the antitoxin should have no effect whatsoever on the transcriptome (or that you would see the same thing if you just removed the toxin...?)

Note: these two strains have BHI samples only. Strikingly, rpoD and sxy-1 behave very similarly. sxy-1 is hypercompetent because a weakened stem structure at the 5' end of the sxy mRNA leads to more Sxy protein and therefore more expression of competence genes. Perhaps the rpoD point mutation works in a similar way, by increasing the translatability of sxy.

Finally, we have the infamous murE point mutation. Competence gene expression increases a bit from B1 to B2. Apparently, competence genes are expressed more in murE in sBHI than in KW20 in MIV. At the very least, we can say that murE is hypercompetent because something is driving aggressive expression of competence genes (and not because of, say, a change in membrane permeability).

List of CRP-regulated genes

This data was produced by comparing the CRP knockout strain to KW20 in MIV only. This is based on one crpx timecourse replicate (I believe another replicate has been sent for sequencing).

First I compiled a list of predicted CRP binding sites and associated them with genes predicted to be first in an operon (using data from DOOR and ProOpDB and confirmation by looking for similar expression of neighbouring genes in the RNA-seq data).
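The post doesn't describe how the CRP sites were predicted. As a rough illustration only, here is a hypothetical consensus scorer built on the TGTGA-N6-TCACA core that many of the 22 bp sites listed below share; a real analysis would more likely use a position weight matrix, and the cut-off of 8 matches is arbitrary. Because TCACA is the reverse complement of TGTGA and the two half-sites sit at mirrored positions, the score is the same on either strand.

```python
# 0-based positions of the half-sites in a 22 bp site such as
# AAATGTGATCTAGATCACATTT (assumed E. coli-style CRP consensus core)
LEFT_HALF, LEFT_POS = "TGTGA", 3
RIGHT_HALF, RIGHT_POS = "TCACA", 14

def half_site_score(site):
    """Count matches (0-10) to the TGTGA ... TCACA core of a 22 bp site."""
    site = site.upper()
    if len(site) != 22:
        raise ValueError("expected a 22 bp site")
    left = site[LEFT_POS:LEFT_POS + 5]
    right = site[RIGHT_POS:RIGHT_POS + 5]
    return (sum(a == b for a, b in zip(left, LEFT_HALF))
            + sum(a == b for a, b in zip(right, RIGHT_HALF)))

def scan(sequence, min_score=8):
    """Yield (start, site, score) for every 22 bp window scoring >= min_score."""
    sequence = sequence.upper()
    for i in range(len(sequence) - 21):
        window = sequence[i:i + 22]
        score = half_site_score(window)
        if score >= min_score:
            yield i, window, score
```

For example, the glpA site TATTGTGATCAATATCACAAAA scores 10, while the speF site TATTATGCCAAATTTAAAAATT scores only 6, a reminder that a simple core match misses sites that other evidence supports.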

I first pulled out genes that were clearly CRP-regulated (differentially expressed and with a CRP site) and used this list to decide on a cut-off: genes that were significant (padj < 0.1) in at least 2 of the 4 timepoints were considered truly significant. This removed a lot of likely false positives (for example, many genes were significant at the last timepoint, which does not fit the expected behaviour of CRP-regulated genes).
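The cut-off is simple to express in code. A minimal sketch, assuming one padj value per timepoint for each gene; DESeq2 reports padj as NA for genes removed by independent filtering, represented here as None and counted as not significant.

```python
def passes_cutoff(padj_by_timepoint, alpha=0.1, min_hits=2):
    """True if the gene has padj < alpha in at least `min_hits` timepoints.
    Missing values (None) count as not significant."""
    hits = sum(1 for p in padj_by_timepoint if p is not None and p < alpha)
    return hits >= min_hits
```

For example, `passes_cutoff([0.01, 0.05, 0.5, None])` is True, while a gene significant only at the last timepoint, `passes_cutoff([0.5, 0.3, 0.2, 0.001])`, is filtered out.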

I uploaded the results to the Google Drive:
Scott/DE Results/CRP genes 1.xlsx

This file shows the raw differential expression analysis results on the first sheet. The second sheet shows the raw data of the genes that passed the cut-off I described above. The third sheet shows a prettier version with fold changes instead of log-fold changes (these are compared to KW20) and asterisks to represent significance. Genes are grouped by operons. The fourth sheet shows the predicted CRP binding sites and the last sheet is a simple list of genes that are directly (with a CRP site) and indirectly (could not find a strong CRP site) regulated by CRP. I'll also post some of the results here:

GENES WITH CRP-N SITES

ID	Gene	CRP site 1	CRP site 2	CRP site 3
HI0035		AAATGTGACGAACGTATCATTT

HI0036		AAATGTGACGAACGTATCATTT

HI0053		TTTTGTGATATGGCTCACAAAA
HI0052
HI0051
HI0050m
HI0049	kdgK
HI0048
HI0047	eda

HI0075	nrdD	TAATTTGATATTTTTCTAATAA	TATTGATCACAAAATCAAAAAT	CATTGTGATATTGATCACAAAA

HI0082		TTTCTTGATCCACGTCACATTA
HI0083

HI0131	afuA	TATTATGAAATTCAACAAAATT	AACTGTGAACTTCATCACGGTA
HI0129	afuB
HI0126	fbpC

HI0145		AAATGAGAAGTTGATCACATTT
HI0144
HI0143
HI0142	nanA

HI0146		AAATGTGATCAACTTCTCATTT
HI0147

HI0289	sdaC	AAATTTTAACTTGATCACAATT
HI0288	sdaA

HI0398		TTTTGTGACTCACTTCAAACTC
HI0399	icc

HI0501	rbsD	TTTTGTGATCAATATCCCAAAT
HI0502	rbsA
HI0503	rbsC
HI0504	rbsB

HI0521		AACTGTGATCTTCCTCACGTTT
HI0520

HI0534	aspA	AAATGTGATCTTCATCAAGTTT

HI0591	speF	TATTATGCCAAATTTAAAAATT
HI0590	potE

HI0601	tfoX	ATTTACGATCTGGCTCACAAAT

HI0604	cyaA	ATTTACGATCTGGCTCACAAAT
HI0605	gpsA
HI0606	cysE
HI0607	aroE

HI0608		TTTGTTGCTCTCGATCACATTT

HI0685	glpA	TATTGTGATCAATATCACAAAA	AAATGTGAAGTGTTTCACAAAT
HI0684	glpB
HI0683	glpC

HI0740	yhxB	AAATGTTAAGTAGATCAAAAAA

HI0745	ansB	TTATGTGATCGAGATCATAAAT

HI0804		TTTTGTTAAACACTTCACATTT	AATATTTATCTAGTTCAAAATT

HI0809	pckA	AAATGAGATCTACTTAACATTT	ATTTTTGCTCTATATCACAATA

HI0815	uspA	AATTGTGATCTAGTACACAGTT

HI0822	mglB	ATTTGTGACATGGATCACAAAT
HI0823	mglA
HI0824	mglC

HI0835	frdA	TTTTTTGAGGTAGATCACAAAA
HI0834	frdB
HI0833	frdC
HI0832	frdD

HI0884	arcA	AACTATGATTTAGATCACAAAA

HI1010		TTCTGTGATCTAGATCTCAGAT
HI1011
HI1012
HI1013
HI1014
HI1015	gntP
HI1016

HI1111	xylF	AAATAGGATCTAGATCACAAAA
HI1110	xylG
HI1109	xylH

HI1031		AAATAGGATCTAGATCACAAAA
HI1030
HI1029
HI1028
HI1027	lyx
HI1026
HI1025	sgbE
HI1024	ulaD

HI1089	ccmA	AAATAGGATCTAGATCACAAAA
HI1090	ccmB
HI1091	ccmC
HI1092	ccmD
HI1093	ccmE
HI1094	ccmF
HI1095	dsbE
HI1096m	ccmH
HI1097m

HI1126.1		AAATGTGATACAAGTCACAAAT

HI1210	mdh	AAATGTGAACTAGATCATAGAA

HI1218	lctP	TTATGAGATATTGATCACATTT

HI1245		AAGTTTGCAGTTCGTCACAATT

HI1350	cdd	ATAAGTGATCAAGATCACAGTT

HI1356	malQ	ATTATTGACGAAGATCACACTT
HI1357	glgB
HI1358	glgX
HI1359	glgC
HI1360	glgA

HI1398	fumC	TTTTATGATCTATGTCACAAAA

HI1427		TTTTGTGATCTCGATCACAAAT

HI1434.1	cspD	AAAATTGATTTAGATCATTAAA

HI1645	fbp	AAAATTGATTTAGATCATTAAA

HI1662	sucA	AAAATTGATTTAGATCATTAAA
HI1661	sucB

OTHER GENES

Competence genes

HI0061	rec2

HI0299
HI0298
HI0297
HI0296	hopD

HI0365
HI0366

HI0985	dprA

HI1008

HI0439	comA
HI0438	comB
HI0437	comC
HI0436	comD
HI0435	comE
HI0434	comF

HI0660
HI0659
HI0658

HI0938
HI0939
HI0940
HI0941

HI0952	radC

HI1117	comM

HI1183

trp genes

HI0287	mtr

HI0830	trpR

HI1387	trpE
HI1388	trpG
HI1388.1
HI1389	trpD
HI1389.1	trpC
HI1390	hybG

HI1430
HI1431	trpB
HI1432	trpA

fuc genes

HI0614	fucI
HI0613	fucK
HI0612	fucU

Other

HI0141	nagB
HI0140	nagA

HI0148

HI0300	ampD

HI0410	tyrR

HI0584

HI0623	fmt

HI0738	ilvD

HI0764	ribB

HI0956

HI1056

HI1434	ybaK

HI1456
HI1457

HI1492

HI1537	licA
HI1538	licB
HI1539	licC
HI1540	licD

HI1655

HI1664

HI1682	sohB

UPREGULATED:

HI0035, HI0047 (eda), HI0048, HI0049 (kdgK), HI0050m, HI0051, HI0052, HI0053, HI0061 (rec2),
HI0075 (nrdD), HI0082, HI0083, HI0126 (fbpC), HI0129 (afuB), HI0131 (afuA), HI0140 (nagA),
HI0141 (nagB), HI0142 (nanA), HI0143, HI0144, HI0145, HI0146, HI0147, HI0148, HI0288 (sdaA), HI0289 (sdaC), HI0296 (hopD), HI0297, HI0298, HI0299, HI0365, HI0366, HI0398, HI0399 (icc), HI0410 (tyrR), HI0434 (comF), HI0435 (comE), HI0436 (comD), HI0437 (comC), HI0438 (comB), HI0439 (comA), HI0501 (rbsD), HI0502 (rbsA), HI0503 (rbsC), HI0504 (rbsB), HI0520, HI0521, HI0534 (aspA), HI0590 (potE), HI0591 (speF), HI0601 (tfoX), HI0608, HI0612 (fucU), HI0613 (fucK), HI0614 (fucI), HI0623 (fmt), HI0658, HI0659, HI0660, HI0683 (glpC), HI0684 (glpB), HI0685 (glpA), HI0740 (yhxB), HI0745 (ansB), HI0804, HI0809 (pckA), HI0815 (uspA), HI0822 (mglB), HI0823 (mglA), HI0824 (mglC), HI0832 (frdD), HI0833 (frdC), HI0834 (frdB), HI0835 (frdA), HI0884 (arcA), HI0938, HI0939, HI0940, HI0941, HI0952 (radC), HI0985 (dprA), HI1008, HI1010, HI1011, HI1012, HI1013, HI1014, HI1015 (gntP), HI1016, HI1024 (ulaD), HI1025 (sgbE), HI1026, HI1027 (lyx), HI1028, HI1029, HI1030, HI1031, HI1110 (xylG), HI1111 (xylF), HI1117 (comM), HI1126.1, HI1183, HI1210 (mdh), HI1218 (lctP), HI1245, HI1350 (cdd), HI1356 (malQ), HI1357 (glgB), HI1358 (glgX), HI1359 (glgC), HI1360 (glgA), HI1398 (fumC), HI1427, HI1434.1 (cspD), HI1456, HI1457, HI1537 (licA), HI1538 (licB), HI1539 (licC), HI1540 (licD), HI1645 (fbp), HI1661 (sucB), HI1662 (sucA)

DOWNREGULATED:

HI0036, HI0300 (ampD), HI0584, HI0604 (cyaA), HI0605 (gpsA), HI0606 (cysE), HI0607 (aroE), HI0738 (ilvD), HI0956, HI1056, HI1089 (ccmA), HI1090 (ccmB), HI1091 (ccmC), HI1092 (ccmD), HI1093 (ccmE), HI1094 (ccmF), HI1095 (dsbE), HI1096m (ccmH), HI1097m, HI1434 (ybaK), HI1492, HI1655, HI1664, HI1682 (sohB)