Understanding BLAT and BLAT output

Here is the deal: I need to parse the “blat” output for alignments with two, three, and even four blocks to figure out exactly where the deletion is located:

Sample file format
match  TgapBases  strand  blockCount  blockSizes    qStarts     tStarts
298    9962       +       3           69,205,24,    1,70,275,   4132,14162,14368,
299    63         -       3           5,107,188,    0,5,112,    2932,2999,3107,

298    1          +       2           136,163,      1,137,      2970,3107,
213    1          -       2           172,42,       82,257,     137,310,
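As a starting point, rows like these can be read with a few lines of Python. This is a minimal sketch that assumes each row holds exactly the seven whitespace-separated columns shown above; a full PSL line from BLAT has 21 columns, so a real parser would pick these fields out by position instead. The function name `parse_psl_row` and the dictionary keys are hypothetical.

```python
def parse_int_list(field):
    """Parse a comma-separated PSL list such as '69,205,24,' (note the trailing comma)."""
    return [int(x) for x in field.split(",") if x]


def parse_psl_row(line):
    """Parse one row of the 7-column excerpt (hypothetical layout, see lead-in).

    Column order assumed: match, T gap bases, strand, block count,
    blockSizes, qStarts, tStarts.
    """
    cols = line.split()
    if len(cols) != 7:
        raise ValueError(f"expected 7 columns, got {len(cols)}")
    row = {
        "match": int(cols[0]),
        "t_gap_bases": int(cols[1]),
        "strand": cols[2],
        "block_count": int(cols[3]),
        "block_sizes": parse_int_list(cols[4]),
        "q_starts": parse_int_list(cols[5]),
        "t_starts": parse_int_list(cols[6]),
    }
    # Sanity check: each list column must have one entry per block.
    for key in ("block_sizes", "q_starts", "t_starts"):
        if len(row[key]) != row["block_count"]:
            raise ValueError(f"{key} does not have block_count entries")
    return row


row = parse_psl_row("299      63      -       3       5,107,188,      0,5,112,        2932,2999,3107,")
print(row["t_starts"])  # [2932, 2999, 3107]
```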


The 4-block case is not implemented yet.

For the 2-block, positive-strand case:

298      1       +       2       136,163,        1,137,          2970,3107,
298      1       +       2       147,152,        1,148,          2299,2447,
299      96      +       2       293,6,          1,294,          14350,14739,

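Solution is:

gap      = tStart2 - tStart1 - blockSize1
DelStart = tStart1 + blockSize1 + 1
DelEnd   = tStart1 + blockSize1 + gap   (= tStart2)
(tStarts are 0-based in PSL; DelStart and DelEnd are 1-based.)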

3107 - 2970 - 136 = 1   --> [3107, 3107]

DelStart = 2970 + 136 + 1   [3107]
DelEnd   = 2970 + 136 + 1   [3107]
# Match stops at 3106 (2970 + 136), 3107 is missing, and the next match starts at 3108


2447 - 2299 - 147 = 1   --> [2447, 2447]
DelStart = 2299 + 147 + 1   [2447]
DelEnd   = 2299 + 147 + 1   [2447]

# Match stops at 2446 (2299 + 147), 2447 is missing, and the next match starts at 2448

14739 - 14350 - 293 = 96   --> [14644, 14739]
DelStart = 14350 + 293 + 1    [14644]
DelEnd   = 14350 + 293 + 96   [14739]

# Match stops at 14643 (14350 + 293), [14644, 14739] is missing, and the next match starts at 14740
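The arithmetic above fits in a small function. This is a sketch assuming PSL's 0-based tStarts and reporting 1-based, inclusive deletion coordinates, the same convention as the worked examples; `two_block_deletion` is a hypothetical name.

```python
def two_block_deletion(t_starts, block_sizes):
    """Return (DelStart, DelEnd, length) in 1-based, inclusive coordinates.

    PSL tStarts are 0-based, so block 1 covers 1-based positions
    t_starts[0] + 1 .. t_starts[0] + block_sizes[0]. The deletion is every
    reference base between the end of block 1 and the start of block 2.
    """
    end1 = t_starts[0] + block_sizes[0]  # last aligned base of block 1 (1-based)
    gap = t_starts[1] - end1             # e.g. 3107 - 2970 - 136 = 1
    if gap <= 0:
        raise ValueError("blocks abut or overlap on the target: no deletion")
    del_start = end1 + 1                 # first missing reference base
    del_end = end1 + gap                 # equals t_starts[1]
    return del_start, del_end, gap


# The three positive-strand rows above: (tStarts, blockSizes)
for t_starts, sizes in [([2970, 3107], [136, 163]),
                        ([2299, 2447], [147, 152]),
                        ([14350, 14739], [293, 6])]:
    print(two_block_deletion(t_starts, sizes))
# (3107, 3107, 1)
# (2447, 2447, 1)
# (14644, 14739, 96)
```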

For the 2-block, negative-strand case, the calculation is the same as for the POSITIVE strand!

298      402     -      2       7,293,          0,7,    263,672,
#Query:@HWI-M01825:53:000000000-A8MJM:1:1101:16256:1461      

672 - 7 - 263 = 402  --> [264, 270]  [271, 672]

# Match starts at 264 and stops at 270 (7 bp aligned), breaks over [271, 672], and the next match starts at 673
# As a result, the break starts at 271 (= 263 + 7 + 1) and is a 402-bp deletion
Therefore:
DelStart = 263 + 7 + 1     [271]
DelEnd   = 263 + 7 + 402   [672]
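To see that the strand never enters the calculation, the same gap arithmetic can be run on this negative-strand row. For single-character strand values like these, PSL gives tStarts on the forward strand of the target and only qStarts refer to the reverse-complemented read, so this sketch (the helper name `target_gap` is hypothetical) simply does not take the strand as input.

```python
def target_gap(t_starts, block_sizes):
    """Return 1-based, inclusive (DelStart, DelEnd, length) between blocks 1 and 2.

    The strand is deliberately not an argument: tStarts are forward-strand
    target coordinates for both '+' and '-' rows.
    """
    end1 = t_starts[0] + block_sizes[0]  # last aligned target base of block 1 (1-based)
    return end1 + 1, t_starts[1], t_starts[1] - end1


# Negative-strand row: 298  402  -  2  7,293,  0,7,  263,672,
print(target_gap([263, 672], [7, 293]))  # (271, 672, 402)

# Block 1 itself covers 1-based positions 264..270
print((263 + 1, 263 + 7))                # (264, 270)
```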

BLAT result details are as follows:
----------------------------------------------------------------

Side-by-side alignment (query above, reference below)


0001 ggggagggggtgatctaaaacactctttacgccggcttctattgacttgg 0050
<<<< |||||||||||||||||||||||||||||||||||||||||||||||||| <<<<
0965 ggggagggggtgatctaaaacactctttacgccggcttctattgacttgg 0916

0051 gttaatcgtgtgaccgcggtggctggcacgaaattgaccaaccctggggt 0100
<<<< |||||||||||||||||||||||||||||||||||||||||||||||||| <<<<
0915 gttaatcgtgtgaccgcggtggctggcacgaaattgaccaaccctggggt 0866

0101 tagtatagcttagttaaactttcgtttattgctaaaggttaatcactgct 0150
<<<< |||||||||||||||||||||||||||||||||||||||||||||||||| <<<<
0865 tagtatagcttagttaaactttcgtttattgctaaaggttaatcactgct 0816

0151 gtttcccgtgggggtgtggctaggctaagcgttttgagctgcattgctgc 0200
<<<< |||||||||||||||||||||||||||||||||||||||||||||||||| <<<<
0815 gtttcccgtgggggtgtggctaggctaagcgttttgagctgcattgctgc 0766

0201 gtgcttgatgcttgtcccttttgatcgtggtgatttagagggtgaactca 0250
<<<< ||||||||||||||| |||||||||||||||||||||||||||||||||| <<<<
0765 gtgcttgatgcttgttccttttgatcgtggtgatttagagggtgaactca 0716

0251 ctggaacgggggtgcttgcatgtgtaatcttactaagagctaa 0293
<<<< ||||||||||| ||||||||||||||||||||||||||||||| <<<<
0715 ctggaacggggatgcttgcatgtgtaatcttactaagagctaa 0673

-------------------------------------------------------------------
0294 tggaaag 0300
<<<< ||||||| <<<<
0270 tggaaag 0264

-------------------------------------------------------------------
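The side-by-side view also shows how qStarts behave on the minus strand: read bases 0001 to 0293 pair with reference 0965 down to 0673, and 0294 to 0300 with 0270 down to 0264. The sketch below maps PSL qStarts back onto the original read, assuming the PSL convention that for '-' alignments qStarts are offsets into the reverse-complemented read. The read length `q_size = 300` is an assumption taken from the 0300 at the end of the display, since qSize is not among the columns shown; `query_blocks_1based` is a hypothetical helper.

```python
def query_blocks_1based(q_starts, block_sizes, strand, q_size):
    """Convert PSL qStarts to 1-based, inclusive (low, high) ranges on the original read.

    For '-' alignments the qStarts count from the start of the
    reverse-complemented read, so each block is flipped back using q_size.
    """
    ranges = []
    for q, size in zip(q_starts, block_sizes):
        if strand == "-":
            ranges.append((q_size - (q + size) + 1, q_size - q))
        else:
            ranges.append((q + 1, q + size))
    return ranges


# Row: 298  402  -  2  7,293,  0,7,  263,672,   (read length assumed to be 300)
print(query_blocks_1based([0, 7], [7, 293], "-", 300))
# [(294, 300), (1, 293)]
```

Block 1 (reference 264 to 270) is therefore read bases 294 to 300, and block 2 (reference 673 to 965) is read bases 1 to 293, which is what the display shows.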

More on negative cases:

213      1       -       2       172,42,         82,257,         137,310,
298      1       -       2       37,262,         0,37,   3069,3107,
298      222     -       2       293,6,          1,294,          11425,11940,
298      4939    -       2       239,60,         1,240,          717,5895,


310 - 172 - 137 = 1   --> [138, 309]
==> break starts at 310 (137 + 172 + 1), a 1-bp deletion
DelStart = 137 + 172 + 1   [310]
DelEnd   = 137 + 172 + 1   [310]

3107 - 37 - 3069 = 1   --> [3070, 3106]
==> break starts at 3107 (3069 + 37 + 1), a 1-bp deletion
DelStart = 3069 + 37 + 1   [3107]
DelEnd   = 3069 + 37 + 1   [3107]

11940 - 293 - 11425 = 222   --> [11426, 11718]
==> break starts at 11719 (11425 + 293 + 1), a 222-bp deletion
DelStart = 11425 + 293 + 1     [11719]
DelEnd   = 11425 + 293 + 222   [11940]

5895 - 239 - 717 = 4939   --> [718, 956]
==> break starts at 957 (717 + 239 + 1), a 4939-bp deletion
DelStart = 717 + 239 + 1      [957]
DelEnd   = 717 + 239 + 4939   [5895]
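The four negative-strand cases can be checked in one loop. In this sketch (the helper name `first_block_and_gap` is hypothetical), block 1 spans tStart + 1 to tStart + blockSize in 1-based coordinates, and the deletion runs from the next base up to the second tStart.

```python
def first_block_and_gap(t_starts, block_sizes):
    """Return (block-1 range, deletion range, deletion length), all 1-based inclusive."""
    s1 = t_starts[0] + 1               # first aligned reference base of block 1
    e1 = t_starts[0] + block_sizes[0]  # last aligned reference base of block 1
    gap = t_starts[1] - e1
    return (s1, e1), (e1 + 1, e1 + gap), gap


# (blockSizes, tStarts) from the four negative-strand rows above
rows = [
    ([172, 42], [137, 310]),
    ([37, 262], [3069, 3107]),
    ([293, 6], [11425, 11940]),
    ([239, 60], [717, 5895]),
]
for sizes, starts in rows:
    print(first_block_and_gap(starts, sizes))
# ((138, 309), (310, 310), 1)
# ((3070, 3106), (3107, 3107), 1)
# ((11426, 11718), (11719, 11940), 222)
# ((718, 956), (957, 5895), 4939)
```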


Solution is: the result was off by one base!



Solution is:

Sample read:

Hs_Mito_ref:
Case: 3 blocks, positive strand

298      9962    +       3       69,205,24,      1,70,275,       4132,14162,14368,
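With three blocks, the same rule applies to each pair of neighbouring blocks. This is a minimal sketch assuming the 0-based tStarts and 1-based output used above. Nothing in the loop is specific to three blocks, so it would also cover the 4-block case, although no 4-block row is shown here to check it against. `deletions` is a hypothetical name.

```python
def deletions(t_starts, block_sizes):
    """List every target gap between consecutive blocks as 1-based (DelStart, DelEnd, length).

    Generalises the 2-block formula: for blocks i and i+1,
    DelStart = tStart[i] + blockSize[i] + 1 and DelEnd = tStart[i+1].
    Blocks that abut on the target (gap of 0) are skipped.
    """
    out = []
    for i in range(len(t_starts) - 1):
        end_i = t_starts[i] + block_sizes[i]  # last aligned base of block i (1-based)
        gap = t_starts[i + 1] - end_i
        if gap > 0:
            out.append((end_i + 1, t_starts[i + 1], gap))
    return out


# Row: 298  9962  +  3  69,205,24,  1,70,275,  4132,14162,14368,
dels = deletions([4132, 14162, 14368], [69, 205, 24])
print(dels)                         # [(4202, 14162, 9961), (14368, 14368, 1)]
print(sum(g for _, _, g in dels))   # 9962, the T gap bases column
```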

                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

gi|251831106|ref|NC_012920.1|_ChrM                                    437   e-122
gi|251831106|ref|NC_012920.1|_ChrM                                    135   7e-32



>gi|251831106|ref|NC_012920.1|_ChrM 
          Length = 16569

 Score = 437 bits (1127), Expect = e-122
 Identities = 229/230 (100%)
 Strand = Plus / Plus

Query: 71    caatctcaattacaatatatacaccaacaaacaatgttcaaccagtaactactactaatc 130
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 14163 caatctcaattacaatatatacaccaacaaacaatgttcaaccagtaactactactaatc 14222

Query: 131   aacgcccataatcatacaaagcccccgcaccaataggatcctcccgaatcaaccctgacc 190
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 14223 aacgcccataatcatacaaagcccccgcaccaataggatcctcccgaatcaaccctgacc 14282

Query: 191   cctctccttcataaattattcagcttcctacactattaaagtttaccacaaccaccaccc 250
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 14283 cctctccttcataaattattcagcttcctacactattaaagtttaccacaaccaccaccc 14342

Query: 251   catcatactctttcacccacagcac-aatcctacctccatcgctaacccc 299
             ||||||||||||||||||||||||| ||||||||||||||||||||||||
Sbjct: 14343 catcatactctttcacccacagcaccaatcctacctccatcgctaacccc 14392


 Score = 135 bits (348), Expect = 7e-32
 Identities = 69/69 (100%)
 Strand = Plus / Plus

Query: 2    catacccccgattccgctacgaccaactcatacacctcctatgaaaaaacttcctaccac 61
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 4133 catacccccgattccgctacgaccaactcatacacctcctatgaaaaaacttcctaccac 4192

Query: 62   tcaccctag 70
            |||||||||
Sbjct: 4193 tcaccctag 4201
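To line the PSL blocks up with this BLAST-style report, each block can be converted to a 1-based Sbjct range. The report shows the first hit as a single gapped alignment over Sbjct 14163 to 14392, with a '-' in the query opposite Sbjct 14368; that corresponds to PSL blocks 2 and 3 separated by the 1-base deletion. A sketch, with a hypothetical helper name:

```python
def target_blocks_1based(t_starts, block_sizes):
    """1-based, inclusive reference (Sbjct) range of each PSL block."""
    return [(t + 1, t + size) for t, size in zip(t_starts, block_sizes)]


# Row: 298  9962  +  3  69,205,24,  1,70,275,  4132,14162,14368,
print(target_blocks_1based([4132, 14162, 14368], [69, 205, 24]))
# [(4133, 4201), (14163, 14367), (14369, 14392)]
```

Block 1 matches the second hit (Sbjct 4133 to 4201), and blocks 2 and 3 together span 14163 to 14392 with only 14368 left out.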

Another 3-block example, with both breaks longer than 15 bp:

                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

gi|251831106|ref|NC_012920.1|_ChrM                                    472   e-133
gi|251831106|ref|NC_012920.1|_ChrM                                     92   7e-19
gi|251831106|ref|NC_012920.1|_ChrM                                     14   2e+05



>gi|251831106|ref|NC_012920.1|_ChrM 
          Length = 16569

 Score = 472 bits (1219), Expect = e-133
 Identities = 244/246 (99%)
 Strand = Plus / Plus

Query: 48    caattctccgatccgtccctaacaaactaggaggcgtccttgccctattactatccatcc 107
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 15582 caattctccgatccgtccctaacaaactaggaggcgtccttgccctattactatccatcc 15641

Query: 108   tcatcctagcaataatccccatcctccatatatccaaacaacaaagcataatacttcgcc 167
             ||||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||
Sbjct: 15642 tcatcctagcaataatccccatcctccatatatccaaacaacaaagcataatatttcgcc 15701

Query: 168   cactaagccaatcactttattgactcctagccgcagacctcctcattctaacctgaatcg 227
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 15702 cactaagccaatcactttattgactcctagccgcagacctcctcattctaacctgaatcg 15761

Query: 228   gaggacaaccagtaagctacccttttaccatcattggaccagtagcatccgtactatact 287
             ||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||
Sbjct: 15762 gaggacaaccagtaagctacccttttaccatcattggacaagtagcatccgtactatact 15821

Query: 288   tcacaa 293
             ||||||
Sbjct: 15822 tcacaa 15827


 Score = 92 bits (237), Expect = 7e-19
 Identities = 47/47 (100%)
 Strand = Plus / Plus

Query: 1    catcccgatggtgcagccgctattaaaggttcgtttgttcaacgatt 47
            |||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 3004 catcccgatggtgcagccgctattaaaggttcgtttgttcaacgatt 3050


 Score = 14 bits (36), Expect = 2e+05
 Identities = 7/7 (100%)
 Strand = Plus / Plus

Query: 294   cctgtcc 300
             |||||||
Sbjct: 15885 cctgtcc 15891

Case: 3 blocks, negative strand


296      10788   -       3       113,182,5,      0,113,295, 5320,16167,16403,
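For this row it also helps to check the query side. If the read is contiguous across a break (no query gap), the break is a clean deletion relative to the reference. The sketch below (hypothetical helper `breaks`) reports both gaps. Comparing query offsets between consecutive blocks works on the minus strand too, because all the qStarts refer to the same reverse-complemented read.

```python
def breaks(q_starts, t_starts, block_sizes):
    """For each pair of consecutive blocks, report the target deletion and the query gap.

    Returns tuples (DelStart, DelEnd, target_gap, query_gap), with 1-based,
    inclusive target coordinates. query_gap == 0 means the read has no extra
    bases at the break. When target_gap is 0 there is no deletion and
    DelStart > DelEnd.
    """
    out = []
    for i in range(len(block_sizes) - 1):
        t_end = t_starts[i] + block_sizes[i]
        q_end = q_starts[i] + block_sizes[i]
        t_gap = t_starts[i + 1] - t_end
        q_gap = q_starts[i + 1] - q_end
        out.append((t_end + 1, t_starts[i + 1], t_gap, q_gap))
    return out


# Row: 296  10788  -  3  113,182,5,  0,113,295,  5320,16167,16403,
res = breaks([0, 113, 295], [5320, 16167, 16403], [113, 182, 5])
for r in res:
    print(r)
# (5434, 16167, 10734, 0)
# (16350, 16403, 54, 0)
print(sum(r[2] for r in res))  # 10788, the T gap bases column
```

These ranges agree with the hits that follow: Sbjct 5321 to 5433, then 16168 to 16349, then 16404 to 16408.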


Sequences producing significant alignments:                      (bits) Value

gi|251831106|ref|NC_012920.1|_ChrM                                    347   1e-95
gi|251831106|ref|NC_012920.1|_ChrM                                    216   2e-56
gi|251831106|ref|NC_012920.1|_ChrM                                     10   3e+06



>gi|251831106|ref|NC_012920.1|_ChrM 
          Length = 16569

 Score = 347 bits (895), Expect = 1e-95
 Identities = 179/182 (98%)
 Strand = Minus / Plus

Query: 187   ccaatccacatcaaaaccccctcctcatgcttacaagcaagtacagcaatcaaccctcaa 128
             |||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||
Sbjct: 16168 ccaatccacatcaaaaccccctccccatgcttacaagcaagtacagcaatcaaccctcaa 16227

Query: 127   ctatcacacatcaactgcaactccaaagtcacccctcacccattaggataccaacaaacc 68
             |||||||||||||||||||||||||||| ||||||||||||| |||||||||||||||||
Sbjct: 16228 ctatcacacatcaactgcaactccaaagccacccctcacccactaggataccaacaaacc 16287

Query: 67    tacccacccttaacagtacatagtacataaagccatttaccgtacatagcacattacagt 8
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 16288 tacccacccttaacagtacatagtacataaagccatttaccgtacatagcacattacagt 16347

Query: 7     ca 6
             ||
Sbjct: 16348 ca 16349


 Score = 216 bits (558), Expect = 2e-56
 Identities = 112/113 (99%)
 Strand = Minus / Plus

Query: 300  catcaccctccttaacctttacttctacctacgcctaatctactccacctcaatcacact 241
            |||||||||||||||||| |||||||||||||||||||||||||||||||||||||||||
Sbjct: 5321 catcaccctccttaacctctacttctacctacgcctaatctactccacctcaatcacact 5380

Query: 240  actccccatatctaacaacgtaaaaataaaatgacagtttgaacatacaaaac 188
            |||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 5381 actccccatatctaacaacgtaaaaataaaatgacagtttgaacatacaaaac 5433


 Score = 10 bits (25), Expect = 3e+06
 Identities = 5/5 (100%)
 Strand = Minus / Plus

Query: 5     catcc 1
             |||||
Sbjct: 16404 catcc 16408

Case: 3 blocks, negative strand (bad example)

275      13167   -   3       82,132,62,      24,106,238,     3024,3107,16405,
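Pairing each block's read range with its reference range helps explain this example. The sketch assumes a read length of 300, from the "(300 letters)" line in the output that follows, and the PSL convention that '-' qStarts count along the reverse-complemented read; `block_pairs` is a hypothetical helper.

```python
def block_pairs(q_starts, t_starts, block_sizes, strand, q_size):
    """Pair each block's read range with its 1-based reference range.

    On '-' alignments the read range is written high-to-low, the way the
    BLAST-style report prints it (e.g. Query: 276 ... 217).
    """
    pairs = []
    for q, t, size in zip(q_starts, t_starts, block_sizes):
        if strand == "-":
            q_range = (q_size - q, q_size - (q + size) + 1)
        else:
            q_range = (q + 1, q + size)
        pairs.append((q_range, (t + 1, t + size)))
    return pairs


# Row: 275  13167  -  3  82,132,62,  24,106,238,  3024,3107,16405,  (read length 300)
for p in block_pairs([24, 106, 238], [3024, 3107, 16405], [82, 132, 62], "-", 300):
    print(p)
# ((276, 195), (3025, 3106))
# ((194, 63), (3108, 3239))
# ((62, 1), (16406, 16467))
```

Read bases 195 and 194 are adjacent while the reference skips 3107, so blocks 1 and 2 show up as one gapped BLAST hit (the `n` opposite the query gap at Sbjct 3107), and read bases 277 to 300 are not aligned at all.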

Output:

BLASTN 2.2.11 [blat]

Reference:  Kent, WJ. (2002) BLAT - The BLAST-like alignment tool

Query= @HWI-M01825:53:000000000-A8MJM:1:1101:23159:10038
         (300 letters)

Database: ../../../reference/hs_mit_seq.fasta 
           1 sequences; 16,569 total letters

Searching.done
                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

gi|251831106|ref|NC_012920.1|_ChrM                                    405   e-113
gi|251831106|ref|NC_012920.1|_ChrM                                    122   6e-28



>gi|251831106|ref|NC_012920.1|_ChrM 
          Length = 16569

 Score = 405 bits (1044), Expect = e-113
 Identities = 213/215 (99%)
 Strand = Minus / Plus

Query: 276  attaaaggttcgtttgttcaacgattaaagtcctacgtgatctgagttcagaccggagta 217
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 3025 attaaaggttcgtttgttcaacgattaaagtcctacgtgatctgagttcagaccggagta 3084

Query: 216  atccaggtcggtttctatctac-ttcaaattcctccctgtacgaaaggacaagagaaata 158
            |||||||||||||||||||||| |||||||||||||||||||||||||||||||||||||
Sbjct: 3085 atccaggtcggtttctatctacnttcaaattcctccctgtacgaaaggacaagagaaata 3144

Query: 157  aggcctacttcacaaagcgccttcccccgtaaatgatatcatctcaacttagcattatac 98
            |||||||||||||||||||||||||||||||||||||||||||||||||||| |||||||
Sbjct: 3145 aggcctacttcacaaagcgccttcccccgtaaatgatatcatctcaacttagtattatac 3204

Query: 97   ccacacccacccaagaacagggtttgttaagatgg 63
            |||||||||||||||||||||||||||||||||||
Sbjct: 3205 ccacacccacccaagaacagggtttgttaagatgg 3239


 Score = 122 bits (315), Expect = 6e-28
 Identities = 62/62 (100%)
 Strand = Minus / Plus

Query: 62    tcctccgtgaaatcaatatcccgcacaagagtgctactctcctcgctccgggcccataac 3
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 16406 tcctccgtgaaatcaatatcccgcacaagagtgctactctcctcgctccgggcccataac 16465

Query: 2     ac 1
             ||
Sbjct: 16466 ac 16467
