Wednesday, June 3, 2015

Phase variable genes are a pain

Since I started looking at the RNA-seq data, a few genes have periodically shown up as very differentially expressed. Trying to figure out what is going on has been rough, but I think I finally understand. These genes appear to be phase variable.

HI1537 (licA), HI1538 (licB), HI1539 (licC), HI1540 (licD)
has CAAT repeats:

Appears to be strongly downregulated in crpx only
in crpx:

in kw20 (and what seems to be everything other than crpx):

Note that the black segments correspond to a 4 bp deletion. There appears to be an additional repeat in the crpx strain that decreases the transcript abundance.
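A 4 bp difference is exactly one CAAT unit, so one quick sanity check is to count how many back-to-back copies of the unit a sequence carries. Here is a minimal sketch; the function name and the two toy sequences are made up for illustration and are not the actual licA sequence.

```python
import re

def longest_tandem_run(seq, unit):
    """Return the largest number of back-to-back copies of `unit` in `seq`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(seq.upper())]
    return max(runs, default=0)

# Hypothetical toy sequences that differ by a single CAAT unit (4 bp).
with_extra_repeat = "GATT" + "CAAT" * 6 + "GGCC"
one_unit_shorter = "GATT" + "CAAT" * 5 + "GGCC"

n_long = longest_tandem_run(with_extra_repeat, "CAAT")
n_short = longest_tandem_run(one_unit_shorter, "CAAT")
print(n_long, n_short, len(with_extra_repeat) - len(one_unit_shorter))
```

The same function works for the other repeat units in this post by swapping the `unit` argument.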

 HI1287 (hsdM), HI1286 (hsdS), HI1285 (hsdR)
has AGCAG repeats:

Appears to be strongly upregulated in taxx only
in taxx (note the 5 bp deletion):

in kw20 (and what seems to be everything other than taxx):

It looks like the extra repeat in all the strains other than taxx causes a huge difference in expression.

 HI0354 (lic3A)
has AACT repeats:

Note: these repeats go on for more than 100 base pairs. Since our reads are 101 bp long, no single read can span the whole repeat tract along with its flanking sequence, so I don't think it's possible to detect a deletion here. This is the downside of using short reads.
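To put a rough number on the read-length limit: a read can only tell you the repeat count if it covers the whole tract plus some unique sequence on each side. The sketch below assumes 10 bp of flank per side, which is an arbitrary choice for illustration and not something an aligner enforces.

```python
def max_countable_repeats(read_len, unit_len, flank=10):
    """Largest repeat count one read can fully span with `flank` bp on each side."""
    usable = read_len - 2 * flank
    return max(usable // unit_len, 0)

def read_can_span(read_len, tract_len, flank=10):
    """True if a single read can cover the tract plus both flanks."""
    return read_len >= tract_len + 2 * flank

# 101 bp reads, 4 bp repeat unit, assumed 10 bp flanks.
print(max_countable_repeats(101, 4))
print(read_can_span(101, 100))
```

Under these assumptions a tract of more than 100 bp is out of reach for 101 bp reads no matter how small the flank is made.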

But this gene is highly upregulated in KW20 in sBHI

Compare kw20:

versus murE at the same timepoint:

I have good reason to think that phase variation is also responsible for this difference.

And finally: 
 HI1457 (opa), HI1456 (??)
has no repeats.

This gene pair has been a thorn in my side for a while, mainly because it looks like it is turned on in sBHI only in cells that are hypercompetent:

For strains in BHI, there is roughly a 1.6% chance of seeing this pattern at random, assuming each strain independently has a 50% chance of having the gene on. But in MIV, it looks like crpx (and maybe hfqx) are the only ones that don't really express this gene. There is no clear picture of what's going on in this data.
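For reference, here is one way a figure like 1.6% comes out. If each strain is independently on or off, the chance of one specific on/off pattern is a simple product. The split of two strains on and four off is a made-up example chosen only because six coin flips give about 1.6%; the actual number of strains is not stated above.

```python
def pattern_probability(n_on, n_off, p_on=0.5):
    """Probability of one specific on/off pattern across independent strains."""
    return p_on ** n_on * (1 - p_on) ** n_off

# Hypothetical split: 2 strains with the gene on, 4 with it off.
p = pattern_probability(2, 4)
print(f"{100 * p:.1f}%")
```

With `p_on=0.5` only the total number of strains matters, so any six-strain pattern gives the same value.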

I dug around some papers and found this one: 

The phasevarion: a genetic system controlling coordinated, random switching of expression of multiple genes.

This paper is oddly pertinent to this blog post. It shows that the opacity protein opa is regulated by a type III restriction-modification system. HI1058/HI1059 encode the mod gene, which has tetranucleotide (AGTC) repeats. The number of repeats determines the reading frame. Two reading frames produce protein (72 or 86 kDa) and one doesn't. The paper shows that opa expression is reduced under one of these reading frames. There is likely some methylation happening that blocks opa expression.
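The reason the repeat count sets the reading frame is plain arithmetic: each 4 bp unit shifts the downstream sequence by 4 mod 3 = 1 position, so the frame cycles through three states as repeats are gained or lost. A minimal sketch follows; the repeat counts are arbitrary, and which offset corresponds to which protein product is not something this code knows.

```python
def frame_offset(n_repeats, unit_len=4):
    """Shift of the downstream reading frame (0, 1, or 2) caused by the repeat tract."""
    return (n_repeats * unit_len) % 3

# Arbitrary example counts; the frame repeats every three units.
for n in range(30, 36):
    print(n, frame_offset(n))
```

Gaining or losing a single repeat always moves to a different frame, and three changes in the same direction return to the starting one.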

Unfortunately, I can't tell which samples have an active mod because, again, the repeat region is longer than 100 bp. The mod gene does seem to be expressed (significantly) more in the crp knockout though. For now, I think I'll treat opa as indirectly phase variable.
