I just wanted to record a few things that may be interesting. The results can be found here. If those results expire, I also put them on pastebin here.
Of interest is this result:
start stop strand score p-value q-value matched sequence
897196 897217 + 15.1 8.7e-08 0.013 TACTGCGATTTAGATCGCAAAC
This hit falls in the region between HI0850 and HI0851 (mobB). The microarray paper we did mentions that this is a potential CRP-S site but claims it is too far away from mobB to regulate its transcription. But why should I believe that the closest gene is mobB? In my thesis I reported this region as a candidate small RNA based on expression in our RNA-seq data (which means something is being transcribed here; it could encode a protein).
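As a quick sanity check on the hit above, the coordinates and the matched sequence should describe the same number of bases. The sketch below is only illustrative: the helper names (parse_hit, site_length) are made up for this note, and it assumes the start/stop columns are 1-based and inclusive, which the output format itself does not state here.

```python
# Column names follow the header row of the results table above.
FIELDS = ["start", "stop", "strand", "score", "p_value", "q_value", "matched_sequence"]


def parse_hit(line):
    """Parse one whitespace-separated result row into a dict (hypothetical helper)."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(parts)}")
    hit = dict(zip(FIELDS, parts))
    # Convert the numeric columns; strand and sequence stay as strings.
    for key in ("start", "stop"):
        hit[key] = int(hit[key])
    for key in ("score", "p_value", "q_value"):
        hit[key] = float(hit[key])
    return hit


def site_length(hit):
    """Length of the site, ASSUMING 1-based inclusive coordinates."""
    return hit["stop"] - hit["start"] + 1


hit = parse_hit("897196 897217 + 15.1 8.7e-08 0.013 TACTGCGATTTAGATCGCAAAC")
print(site_length(hit), len(hit["matched_sequence"]))  # 22 22
```

Both numbers come out as 22, which is consistent with the inclusive-coordinate assumption: a half-open interval would give 21 bases for a 22-base matched sequence.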
A BLAST-N search (genome id: gi|16271976, from: 897186 to: 897517) returns hits mostly from Haemophilus influenzae and Mannheimia haemolytica, some from Aggregatibacter actinomycetemcomitans, and one from a Haemophilus bacteriophage: Aaphi23.
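For the record, here is a minimal sketch of how the query window used in that search relates to the CRP-S hit. It assumes the from/to range is 1-based and inclusive (the same assumption as for the motif coordinates); extract_region and offset_in_region are hypothetical helpers written for this note, and the toy string merely stands in for the real genome sequence.

```python
def extract_region(genome, start, end):
    """Return genome[start..end], ASSUMING 1-based inclusive coordinates."""
    if not (1 <= start <= end <= len(genome)):
        raise ValueError("region is outside the sequence")
    # Python slices are 0-based and half-open, hence start - 1.
    return genome[start - 1:end]


def offset_in_region(region_start, pos):
    """Convert a genome position to a 1-based position within the region."""
    return pos - region_start + 1


# Toy sequence standing in for the genome, just to show the coordinate handling.
toy = "ACGTACGTAC"
print(extract_region(toy, 3, 6))  # GTAC

# Length of the BLAST query window and where the CRP-S hit sits inside it.
query_start, query_end = 897186, 897517
print(query_end - query_start + 1)  # 332
print(offset_in_region(query_start, 897196),
      offset_in_region(query_start, 897217))  # 11 32
```

Under these assumptions the query is 332 bp long and the CRP-S site occupies positions 11 to 32 of it, so the site sits at the very start of the window and most of the query lies downstream of it.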
A region of ~100 bp aligns across these BLAST hits (presumably some functional part), and in the H. influenzae strains KR494 and R2866 the CRP-S site also appears to be conserved:
Also, in the KR494 genome this region appears to be annotated as a 73 amino acid protein sequence. Grabbing that sequence and running it through Pfam gets me a single Pfam-B result that isn't very helpful.
In summary, it looks like there is something beside mobB with a good CRP-S site. Working out what it is will probably require some groundwork biological data. I'll be sure to check in the RNA-seq data whether its expression is Sxy-dependent.