Thursday, January 8, 2015

Result of the BLAST searches on that problem file

Here's the command I used (taken from Scott's example):

 gunzip -cd 9C30_CGTACG_L004_R1_001.fastq.gz | head -32
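FASTQ files store each read as four lines (header, sequence, a '+' separator, and quality scores), which is why head -32 gives exactly eight reads. For anyone who would rather pull out just the sequence lines, here is a minimal Python sketch of the same idea. The function name first_reads is my own invention for illustration, and it assumes a well-formed FASTQ file with unwrapped (single-line) sequences.

```python
import gzip
from itertools import islice


def first_reads(path, n=8):
    """Return the sequence lines of the first n records in a gzipped FASTQ file.

    Hypothetical helper; assumes each record is exactly 4 lines.
    """
    with gzip.open(path, "rt") as handle:
        # Read only the first 4 * n lines, like `gunzip -cd ... | head -32` for n=8.
        lines = list(islice(handle, 4 * n))
    # The sequence is the second line of every 4-line record.
    return [line.strip() for line in lines[1::4]]
```

Calling first_reads("9C30_CGTACG_L004_R1_001.fastq.gz") should return the same eight sequences that the shell command prints, minus the header and quality lines.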

Here are the first 8 reads:
  1. GCCCGATGCGGAAACCGGTGAAGGCCTGAAGAATCTCGATTACGCCTATATCGCCGCCGAGCTCGGCAAGAACCCGCTCGCCCCCCAGACGATGAACTGCT
  2. AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
  3. TCGGCCTTCAAAGTTCTCATTTGAATATTTGCTACTACCACCAAGATCTGCACCGACGGCCGCTCCGCCCGGGCTCGCGCCCCGGGGTTTGCAGCGGCCGC
  4. CCCACCTATCCTACACTCAGTTGCACAAAAGGATGTGAACAAGACCTGGTAATTACAGCATTATAAACACACACACACATGCTCGCAAACCTCAAAAATGA
  5. CTACTCTTGACATCCTAAGAAGAGCTCAGAGATGAGCTTGTGCCTTCGGGAACTTAGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTTGTGAAATGT
  6. CCTTTCAACAATTTCACGTACTTTTTCACTCTCTTTTCAAAGTTCTTTTCATCTTTCCATCACTGTACTTGTTCGCTATCGGTCTCTCGCCAATATTTAGC
  7. AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
  8. AATAATATGGTTTGGTTGTTTGTCCCCTTCAAATCTCATGTTGAAATATCATCCCGTGTTGGAGGTGGGGCCTGGTGGAAGGTGTTTGGATCATGGGGATA
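Reads 2 and 7 in the list above consist of a single repeated base, so there is nothing to BLAST. A quick way to flag reads like that automatically might look like the sketch below. The name is_homopolymer and the min_fraction parameter are hypothetical, chosen only for this example.

```python
def is_homopolymer(seq, min_fraction=1.0):
    """Return True if a single base makes up at least min_fraction of seq.

    Hypothetical helper; with the default of 1.0, only reads made of
    one repeated character (e.g. all A) are flagged.
    """
    seq = seq.upper()
    if not seq:
        return False
    # Count of the most frequent character in the read.
    top = max(seq.count(base) for base in set(seq))
    return top / len(seq) >= min_fraction
```

Lowering min_fraction (say to 0.9) would also catch reads that are nearly all one base with a few miscalls.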

OK, two are just strings of As, consistent with what Scott saw. Let's see what the others are and how closely each matches its best BLAST hit:
1. Rhodopseudomonas palustris HaA2, complete genome 100%
3. Lots of strong matches, probably due to the repeats
4. Homo sapiens chromosome 11, clone RP11-324K6, complete sequence 100%
5. Haemophilus influenzae strain Hi375, complete genome 100%
6. Saccharomyces cerevisiae strain NDL373 26S ribosomal RNA gene, partial sequence 100%
8. Human DNA sequence from clone RP13-346H10 on chromosome X, complete sequence 100%

Here are the next 8 reads:
  9. AAAATAGAAAGACCTGTCCCTGTAGAAAACCCCCTGAGCAAAGTAGTCTCTGGAAACCGTCCCTCACCCTCATGTGGAAGTGTGGGTGGGGGGATATTGGT
  10. CNCGTTGCCACCATGGTAGGCCACTATCCTACCATCGACAGTTGATAGGGCAGAAATTTGAATGAACCATCGCCGGCGTAAAGCCATGCGATTCGACAAGT
  11. CGCCCAGCTAATTTTTGTATTTTTAGTAGAGTCAGGGTTTCACCATGTTAGCCAGGCTGGTCTCGAACGCCTGACCTCAGGTGATCCACCTGCCTCAGGCT
  12. ACCCTTCCCCCCTTACCTCAATGGCTGCCTCCCTCACACCCTTCAGCACTCCTCATATGTGTCATCGGCTTCTCCATGAGGCCTCCCTGGGCCCCCCCCCC
  13. CGGGTCACTATGACCTACTTTCGTACCTGCTCGACTTGTCTGTCTCGCAGTTAAGCTTGCTTATACCATTGCACTAACCTCACGATGTCCGACCGTGATTA
  14. TTTCCAGCAGCATGTACGGGGTGCCGGTGCTCAGCTGGGTCTTCAGCACAGCCCAGTGGTCCTGCAGGGTGATGCCGGCCACCACTTCGCAGCCCACGGCC
  15. CCAAGCCATACCTGATCCTTGATCTAAGGAAAATGGGAGCTGATAAGTGTGTGCTGTTTTCAGCTGCCAAGTGTGGGACCCTTTGTGATGCTGGAGTCAGT
  16. TGGGACGTAATCAACGCAAGCTGATGACTTGCGCTTACTAGGGATTCCTCGTTGAAGAGCAATAATTGCAATGCTCTATCCCCAGCACGACAGAGTTTAAC

And their BLAST results:
9. Human chromosome 14 DNA sequence 100%
12. Homo sapiens 12p11-37.2-54.4 BAC RP11-1110J8 100%
15. Homo sapiens 12 BAC RP13-714J12  100%

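Conclusion: The problem isn't that the original RNA sample was contaminated with another species of bacteria, or that the rRNA removal step didn't work. The best hits are scattered across unrelated sources (human, yeast, and two different bacteria) rather than pointing to one contaminant. It might instead be that there was so little mRNA that ordinary background contamination made a much larger contribution than normal.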

I guess this is good news. There's no reason to discard this sample; we'll just have fewer reads to work with than in the other samples.
