It occurred when I tried to align miRNAseq data to miRBase. The raw sequence was pre-processed by removing the RNA PCR Primer sequence using fastx-toolkit. The error message is like the following:
[bns_restore_core] fail to open file '/ddn/gs1/home/li11/refDB/miRBase/hsa_combined.fa.nt.ann'. Abort!
What does it mean?
I think the error occurred owing to “no” alignment being reported. As the miRBase is too short and no aligner will be able to handle the following case:
The number of 'N' characters to append at the 5'-end and at the 3'-end of each entry. This may be useful for mapping as most mappers can not match reads which start before the reference sequence entry. For example Mapping to cel-let-7 MIMAT0000001 Caenorhabditis elegans let-7 TTGAGGTAGTAGGTTGTATAGTTA ..read TGAGGTAGTAGGTTGTATAGTT ..mirbase entry This sequence can not be mapped to the corresponding mirbase entry with any Mapper. When using GEM instead it is possible to map this read with two mismatches. It is however necessary to append several 'N's at the 5'-end and 3'-end first. For example: Mapping to cel-let-7 MIMAT0000001 Caenorhabditis elegans let-7 TTGAGGTAGTAGGTTGTATAGTTA ..read NNTGAGGTAGTAGGTTGTATAGTTNN ..mirbase entry Now the read could be mapped to the reference sequence (let-7). default=0
Solutions
Step 1 Count the shortest miRbase sequence Step 2 Append Ns to both 5' and 3' ends
I had alignment from BWA, I noticed very strange phenomena. With sam format -flag 4, it still showed “match” in cIGAR string:
64NDWAAXX:5:079:19712:16612 4 hsa-miR-19b-1-5p_MIMAT0004491_Homo_sapiens_miR-19b-1-5p 45 25 14M * 0 0 CCATGACCGCTCTG 0A+>:<*A>?>A X0:
i:1 X1:i:0 MD:Z:9G4 XG:i:0 NM:i:1 XM:i:1 XN:i:1 XO:i:0 XT:A:U
Here is what is true:
[li11@ehscmplp09/bioinfo1 perlLibs]$ perl /ddn/gs1/home/li11/perlLibs/seqManip.pl /ddn/gs1/home/li11/refDB/miRBase/hsa_mature_ready_to_map.fa hsa-miR-19b-1-5p_MIMAT0004491_Homo_sapiens_miR-19b-1-5p Output: Seen sequence hsa-miR-19b-1-5p_MIMAT0004491_Homo_sapiens_miR-19b-1-5p, with seq as NNNNNNNNNNNAGTTTTGCAGGTTTGCATCCAGCNNNNNNNNNNN
And,
[li11@ehscmplp09/bioinfo1 perlLibs]$ perl searchFastqSeq.pl ~/project2013/miRNAseq/processedSeq/TQE_14_24/ES583_trimmed_1mismatch.minQS20.fastq.14_24.fastq @64NDWAAXX:5:079:19712:16612 Output: Seen sequence @64NDWAAXX:5:079:19712:16612, with seq as CCATGACCGCTCTG