BWA alignment faliure

It occurred when I tried to align miRNAseq data to miRBase. The raw sequence was pre-processed by removing the RNA PCR Primer sequence using fastx-toolkit. The error message is like the following:

[bns_restore_core] fail to open file '/ddn/gs1/home/li11/refDB/miRBase/hsa_combined.fa.nt.ann'. Abort!

What does it mean?

I think the error occurred owing to “no” alignment being reported. As the miRBase is too short and no aligner will be able to handle the following case:



    The number of 'N' characters to append at the 5'-end and at the 3'-end of each entry. This may be useful for mapping as most mappers can not match reads which start before the reference sequence entry. For example

     Mapping to cel-let-7 MIMAT0000001 Caenorhabditis elegans let-7
     TTGAGGTAGTAGGTTGTATAGTTA ..read
      TGAGGTAGTAGGTTGTATAGTT  ..mirbase entry

    This sequence can not be mapped to the corresponding mirbase entry with any Mapper. When using GEM instead it is possible to map this read with two mismatches. It is however necessary to append several 'N's at the 5'-end and 3'-end first. For example:

     Mapping to cel-let-7 MIMAT0000001 Caenorhabditis elegans let-7
      TTGAGGTAGTAGGTTGTATAGTTA   ..read
     NNTGAGGTAGTAGGTTGTATAGTTNN  ..mirbase entry

    Now the read could be mapped to the reference sequence (let-7). default=0

Solutions

Step 1 Count the shortest miRbase sequence
Step 2 Append Ns to both 5' and 3' ends

I had alignment from BWA, I noticed very strange phenomena. With sam format -flag 4, it still showed “match” in cIGAR string:

64NDWAAXX:5:079:19712:16612 4 hsa-miR-19b-1-5p_MIMAT0004491_Homo_sapiens_miR-19b-1-5p 45 25 14M * 0 0 CCATGACCGCTCTG 0A+>:<*A>?>A X0:
i:1 X1:i:0 MD:Z:9G4 XG:i:0 NM:i:1 XM:i:1 XN:i:1 XO:i:0 XT:A:U

Here is what is true:

[li11@ehscmplp09/bioinfo1 perlLibs]$ perl /ddn/gs1/home/li11/perlLibs/seqManip.pl /ddn/gs1/home/li11/refDB/miRBase/hsa_mature_ready_to_map.fa  hsa-miR-19b-1-5p_MIMAT0004491_Homo_sapiens_miR-19b-1-5p 

Output:
Seen sequence hsa-miR-19b-1-5p_MIMAT0004491_Homo_sapiens_miR-19b-1-5p, with seq as NNNNNNNNNNNAGTTTTGCAGGTTTGCATCCAGCNNNNNNNNNNN

And,

[li11@ehscmplp09/bioinfo1 perlLibs]$ perl searchFastqSeq.pl  ~/project2013/miRNAseq/processedSeq/TQE_14_24/ES583_trimmed_1mismatch.minQS20.fastq.14_24.fastq  @64NDWAAXX:5:079:19712:16612

Output:
Seen sequence @64NDWAAXX:5:079:19712:16612, with seq as CCATGACCGCTCTG

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.