Understanding the BWA SAM file format to get “perfect match” alignments

A good wiki site to start off with

The same question has been asked before. The answer? I still don’t know.

Another post on SEQanswers compares different aligners.

An explanation of the SAM tag format, including numeric tags.

Picard also provides an online SAM flag checker.

The SAM format page helps with extracting “perfect match” alignments.

Tags specific to BWA alignments are described here.

I got help from our collaborator, Florian. What follows are my own notes.

To count how many reads carry each flag value:

samtools view ES583_miRbaseMature_bwa.bam | grep -v "^@" | awk -F"\t" 'BEGIN{print "flag\toccurrences"} {a[$2]++} END{for(i in a)print i"\t"a[i]}'
flag    occurrences
4       2109951
20      1096
0       1190274
16      10244
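Each flag is a bit field defined in the SAM specification, so the four values above decompose as 0 (no bits set), 4 (0x4, unmapped), 16 (0x10, reverse strand) and 20 (0x4 + 0x10). A small bash sketch of that decoding follows; decode_flag is a made-up helper, and the bit names are modelled on the short names samtools uses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode a numeric SAM FLAG into the names of its set bits.
# Bit i has value 2^i; meanings follow the SAM specification.
decode_flag() {
  local flag=$1
  local names=(PAIRED PROPER_PAIR UNMAP MUNMAP REVERSE MREVERSE READ1 READ2 SECONDARY QCFAIL DUP SUPPLEMENTARY)
  local out=() i
  for i in "${!names[@]}"; do
    # Keep the name if bit i is set in the flag.
    if (( flag & (1 << i) )); then
      out+=("${names[$i]}")
    fi
  done
  if (( ${#out[@]} == 0 )); then
    # Flag 0: mapped, unpaired, forward strand, primary alignment.
    echo "NONE"
  else
    local IFS=,
    echo "${out[*]}"   # join the names with commas
  fi
}

# Decode the four flag values seen in the table above.
for f in 0 4 16 20; do
  printf '%s\t%s\n' "$f" "$(decode_flag "$f")"
done
```

This prints NONE, UNMAP, REVERSE and UNMAP,REVERSE for 0, 4, 16 and 20.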

Now, let’s focus on flag 0, i.e. reads mapped to the forward strand.

Scenario I, one extra base at the start of the read, aligned to an N in the reference

NNNNNNNNNNN TGAGGTAGTAGGTTGTATAGTTNNNNNNNNNNN
pppppppppppGTGAGGTAGTAGGTTGTATAGTT

With one base off (edit distance to the reference): NM:i:1
With one mismatch in the alignment: XM:i:1
With one ambiguous base in the reference: XN:i:1
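These three numbers come from the optional fields at the end of each SAM record, written as TAG:TYPE:VALUE (NM is a standard SAM tag; XM and XN are BWA-specific). Below is a minimal bash sketch for pulling one tag's value out of a record. The helper name get_tag and the Scenario I record (read name, reference name, position, mapping quality) are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of one optional tag (e.g. NM, XM, XN)
# from a single tab-separated SAM record. Optional fields start at column 12
# and look like TAG:TYPE:VALUE; nothing is printed if the tag is absent.
get_tag() {
  local record=$1 tag=$2
  printf '%s\n' "$record" | awk -F'\t' -v tag="$tag" '{
    for (i = 12; i <= NF; i++) {
      # Match the "TAG:" prefix, then skip "TAG:T:" (the type is one character).
      if (substr($i, 1, length(tag) + 1) == (tag ":")) {
        print substr($i, length(tag) + 4)
        exit
      }
    }
  }'
}

# Made-up record resembling Scenario I (one extra G over a reference N).
rec=$'read1\t0\tref1\t11\t25\t23M\t*\t0\t0\tGTGAGGTAGTAGGTTGTATAGTT\t*\tNM:i:1\tXM:i:1\tXN:i:1'
for t in NM XM XN; do
  echo "$t=$(get_tag "$rec" "$t")"
done
```

For this record the loop prints NM=1, XM=1 and XN=1.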

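Scenario II, same as I, but reported differently by BWA. This is probably because bwa index replaces each N in the reference with a random base, which at this position happened to be a T, so the extra T in the read counts as a match.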

NNNNNNNNNNN TGAGGTAGTAGGTTGTATAGTTNNNNNNNNNNN
pppppppppppGTGAGGTAGTAGGTTGTATAGTT (in scenario I)
pppppppppppTTGAGGTAGTAGGTTGTATAGTT

With no base off: NM:i:0
With no mismatch in the alignment: XM:i:0
With one ambiguous base in the reference: XN:i:1

Scenario III, same as I, but shifted one base to the right: the extra base is at the end of the read

NNNNNNNNNNN TGAGGTAGTAGGTTGTATAGTTNNNNNNNNNNN
pppppppppppGTGAGGTAGTAGGTTGTATAGTT (in scenario I)
ppppppppppp TGAGGTAGTAGGTTGTATAGTTT

With one base off: NM:i:1
With one mismatch in the alignment: XM:i:1
With one ambiguous base in the reference: XN:i:1

Scenario IV, a perfect case

NNNNNNNNNNNTGAGGTAGTAGGTTGTATAGTTNNNNNNNNNNN
pppppppppppTGAGGTAGTAGGTTGTATAGTT

With no base off: NM:i:0
With no mismatch in the alignment: XM:i:0
With no ambiguous base in the reference: XN:i:0
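Scenarios II and IV both have NM:i:0 and XM:i:0; only XN tells them apart. So a perfect-match filter built on the criteria in this post (flag 0, plus NM, XM and XN all zero) needs all three tags. Below is a minimal sketch. The function name perfect_matches is made up, and treating a record with any of the three tags missing as not perfect is my own assumption.

```shell
#!/usr/bin/env bash
# Sketch: keep only "perfect match" records, using the criteria from the
# scenarios above: FLAG 0 (mapped, forward strand) and NM:i:0, XM:i:0 and
# XN:i:0 all present. Reads header-less SAM records on stdin, prints matches.
perfect_matches() {
  awk -F'\t' '
    $2 == 0 {
      nm = xm = xn = 0   # each becomes 1 once its tag is seen with value 0
      for (i = 12; i <= NF; i++) {
        if ($i == "NM:i:0") nm = 1
        else if ($i == "XM:i:0") xm = 1
        else if ($i == "XN:i:0") xn = 1
      }
      # A missing or non-zero tag leaves its variable at 0, so the record is dropped.
      if (nm && xm && xn) print
    }'
}

# Made-up records mimicking Scenarios I, II and IV (names/positions invented).
sc1=$'r1\t0\tref1\t11\t25\t23M\t*\t0\t0\tGTGAGGTAGTAGGTTGTATAGTT\t*\tNM:i:1\tXM:i:1\tXN:i:1'
sc2=$'r2\t0\tref1\t11\t25\t23M\t*\t0\t0\tTTGAGGTAGTAGGTTGTATAGTT\t*\tNM:i:0\tXM:i:0\tXN:i:1'
sc4=$'r4\t0\tref1\t12\t25\t22M\t*\t0\t0\tTGAGGTAGTAGGTTGTATAGTT\t*\tNM:i:0\tXM:i:0\tXN:i:0'
printf '%s\n' "$sc1" "$sc2" "$sc4" | perfect_matches | cut -f1   # prints only r4
```

On real data this would be used as, for example, samtools view ES583_miRbaseMature_bwa.bam | perfect_matches > perfect.sam (the output file name is just an example).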
