Here is the deal: I need to parse BLAT output with two, three, and even four blocks to figure out exactly where each deletion is located.
Sample file format
match  T gap bases  strand  block count  blockSizes   qStarts    tStarts
298    9962         +       3            69,205,24,   1,70,275,  4132,14162,14368,
299    63           -       3            5,107,188,   0,5,112,   2932,2999,3107,
298    1            +       2            136,163,     1,137,     2970,3107,
213    1            -       2            172,42,      82,257,    137,310,
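A minimal sketch of reading one data row of the reduced format above. The class name `BlatRow` and the helper names are made up for illustration, and the sketch assumes exactly seven whitespace-separated fields per row, as in the sample.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BlatRow:
    """One row of the reduced BLAT table (hypothetical container)."""
    match: int
    t_gap_bases: int
    strand: str
    block_count: int
    block_sizes: List[int]
    q_starts: List[int]
    t_starts: List[int]


def parse_int_list(field: str) -> List[int]:
    """Turn a comma-terminated list such as '69,205,24,' into [69, 205, 24]."""
    return [int(x) for x in field.split(",") if x]


def parse_row(line: str) -> BlatRow:
    """Parse one whitespace-separated data row (assumes seven fields)."""
    match, t_gap, strand, count, sizes, q_starts, t_starts = line.split()
    row = BlatRow(int(match), int(t_gap), strand, int(count),
                  parse_int_list(sizes), parse_int_list(q_starts),
                  parse_int_list(t_starts))
    # Sanity check: every list should hold exactly block_count entries.
    for values in (row.block_sizes, row.q_starts, row.t_starts):
        if len(values) != row.block_count:
            raise ValueError(f"expected {row.block_count} blocks in {line!r}")
    return row


row = parse_row("298 9962 + 3 69,205,24, 1,70,275, 4132,14162,14368,")
print(row.strand, row.block_sizes, row.t_starts)
```

The trailing comma in each list is dropped by skipping empty pieces after the split, so the same helper works for 2-, 3- and 4-block rows.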
The 4-block case is not implemented yet.
For 2-block positive strand case
298 1 + 2 136,163, 1,137, 2970,3107,
298 1 + 2 147,152, 1,148, 2299,2447,
299 96 + 2 293,6, 1,294, 14350,14739,
Solution is:
3107 - 2970 - 136 = 1 --> [3107, 3107]
DelStart = 2970 + 136 + 1 [3107]
DelEnd   = 2970 + 136 + 1 [3107]
# Match stops at 3106 (2970 + 136), with 3107 missing, another match starts at 3108

2447 - 2299 - 147 = 1 --> [2447, 2447]
DelStart = 2299 + 147 + 1 [2447]
DelEnd   = 2299 + 147 + 1 [2447]
# Match stops at 2446 (2299 + 147), with 2447 missing, another match starts at 2448

14739 - 14350 - 293 = 96 --> [14644, 14739]
DelStart = 14350 + 293 + 1 [14644]
DelEnd   = 14350 + 293 + 96 [14739]
# Match stops at 14643 (14350 + 293), with [14644-14739] missing, another match starts at 14740
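The arithmetic above can be sketched as a small function. The name `two_block_deletion` is made up here. It takes only blockSizes and tStarts, because the calculation above does not use the strand or qStarts.

```python
def two_block_deletion(block_sizes, t_starts):
    """Return (del_start, del_end, length) for a 2-block row.

    Coordinates follow the notes above: the first block ends at
    tStarts[0] + blockSizes[0], and the deletion starts one base later.
    """
    first_end = t_starts[0] + block_sizes[0]   # last matched base of block 1
    length = t_starts[1] - first_end           # target bases skipped
    return first_end + 1, first_end + length, length


examples = [
    ([136, 163], [2970, 3107]),
    ([147, 152], [2299, 2447]),
    ([293, 6], [14350, 14739]),
]
results = [two_block_deletion(sizes, starts) for sizes, starts in examples]
for r in results:
    print(r)
# (3107, 3107, 1)
# (2447, 2447, 1)
# (14644, 14739, 96)
```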
For the 2-block negative strand case, the calculation is the same as for the POSITIVE strand!
298 402 - 2 7,293, 0,7, 263,672,
#Query:@HWI-M01825:53:000000000-A8MJM:1:1101:16256:1461
672 - 7 - 263 = 402 --> [264, 270] [271, 672]
# Match starts at 264 and stops at 270 with 7 bps aligned, breaks at [271, 672], match starts again at 673
# As a result ==> break starts at 271 (= 263 + 7 + 1), with a 402 bp long deletion
Therefore:
DelStart: 263 + 7 + 1 [271]
DelEnd: 263 + 7 + 402 [672]
BLAT result details are as follows:
----------------------------------------------------------------
Side by Side Alignment
0001 ggggagggggtgatctaaaacactctttacgccggcttctattgacttgg 0050
<<<< |||||||||||||||||||||||||||||||||||||||||||||||||| <<<<
0965 ggggagggggtgatctaaaacactctttacgccggcttctattgacttgg 0916
0051 gttaatcgtgtgaccgcggtggctggcacgaaattgaccaaccctggggt 0100
<<<< |||||||||||||||||||||||||||||||||||||||||||||||||| <<<<
0915 gttaatcgtgtgaccgcggtggctggcacgaaattgaccaaccctggggt 0866
0101 tagtatagcttagttaaactttcgtttattgctaaaggttaatcactgct 0150
<<<< |||||||||||||||||||||||||||||||||||||||||||||||||| <<<<
0865 tagtatagcttagttaaactttcgtttattgctaaaggttaatcactgct 0816
0151 gtttcccgtgggggtgtggctaggctaagcgttttgagctgcattgctgc 0200
<<<< |||||||||||||||||||||||||||||||||||||||||||||||||| <<<<
0815 gtttcccgtgggggtgtggctaggctaagcgttttgagctgcattgctgc 0766
0201 gtgcttgatgcttgtcccttttgatcgtggtgatttagagggtgaactca 0250
<<<< ||||||||||||||| |||||||||||||||||||||||||||||||||| <<<<
0765 gtgcttgatgcttgttccttttgatcgtggtgatttagagggtgaactca 0716
0251 ctggaacgggggtgcttgcatgtgtaatcttactaagagctaa 0293
<<<< ||||||||||| ||||||||||||||||||||||||||||||| <<<<
0715 ctggaacggggatgcttgcatgtgtaatcttactaagagctaa 0673
-------------------------------------------------------------------
0294 tggaaag 0300
<<<< ||||||| <<<<
0270 tggaaag 0264
-------------------------------------------------------------------
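In the alignment above, the 7-base block with qStart 0 shows up at read positions 294-300, and the 293-base block with qStart 7 at positions 1-293. That matches the PSL convention that minus-strand qStarts are counted on the reverse-complemented query. A sketch of the conversion follows. The function name is made up, and the read length of 300 is an assumption taken from the alignment, since it is not a column in the reduced table.

```python
def forward_query_range(q_start, size, q_size, strand):
    """Return the 1-based inclusive (start, end) of a block on the original read."""
    if strand == "-":
        # Minus-strand qStarts count from the start of the reverse-complemented read.
        return q_size - (q_start + size) + 1, q_size - q_start
    return q_start + 1, q_start + size


Q_SIZE = 300  # assumption: read length, read off the alignment (0001..0300)
ranges = [forward_query_range(q, s, Q_SIZE, "-")
          for q, s in zip([0, 7], [7, 293])]
print(ranges)  # [(294, 300), (1, 293)]
```

Only the query coordinates flip. The target coordinates, and therefore DelStart and DelEnd, are computed the same way on both strands.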
More on negative cases:
213 1 - 2 172,42, 82,257, 137,310,
298 1 - 2 37,262, 0,37, 3069,3107,
298 222 - 2 293,6, 1,294, 11425,11940,
298 4939 - 2 239,60, 1,240, 717,5895,

310 - 172 - 137 = 1 --> [138, 309] ==> break starts at 310 (137 + 172 + 1) with 1 bp long deletion
DelStart: 137 + 172 + 1 [310]
DelEnd: 137 + 172 + 1 [310]

3107 - 37 - 3069 = 1 --> [3070, 3106] ==> break starts at 3107 (3069 + 37 + 1) with 1 bp long deletion
DelStart: 3069 + 37 + 1 [3107]
DelEnd: 3069 + 37 + 1 [3107]

11940 - 293 - 11425 = 222 --> [11426, 11939] ==> break starts at 11719 (11425 + 293 + 1) with 222 bps long deletion
DelStart: 11425 + 293 + 1 [11719]
DelEnd: 11425 + 293 + 222 [11940]

5895 - 239 - 717 = 4939 --> [718, 5894] ==> break starts at 957 (717 + 239 + 1) with 4939 bps long deletion
DelStart: 717 + 239 + 1 [957]
DelEnd: 717 + 239 + 4939 [5895]
Solution: the result was off by one base!
Sample read:
Hs_Mito_ref:
Case: 3-blocks positive strand
298 9962 + 3 69,205,24, 1,70,275, 4132,14162,14368,
Score E
Sequences producing significant alignments: (bits) Value
gi|251831106|ref|NC_012920.1|_ChrM 437 e-122
gi|251831106|ref|NC_012920.1|_ChrM 135 7e-32
>gi|251831106|ref|NC_012920.1|_ChrM
Length = 16569
Score = 437 bits (1127), Expect = e-122
Identities = 229/230 (100%)
Strand = Plus / Plus
Query: 71 caatctcaattacaatatatacaccaacaaacaatgttcaaccagtaactactactaatc 130
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 14163 caatctcaattacaatatatacaccaacaaacaatgttcaaccagtaactactactaatc 14222
Query: 131 aacgcccataatcatacaaagcccccgcaccaataggatcctcccgaatcaaccctgacc 190
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 14223 aacgcccataatcatacaaagcccccgcaccaataggatcctcccgaatcaaccctgacc 14282
Query: 191 cctctccttcataaattattcagcttcctacactattaaagtttaccacaaccaccaccc 250
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 14283 cctctccttcataaattattcagcttcctacactattaaagtttaccacaaccaccaccc 14342
Query: 251 catcatactctttcacccacagcac-aatcctacctccatcgctaacccc 299
||||||||||||||||||||||||| ||||||||||||||||||||||||
Sbjct: 14343 catcatactctttcacccacagcaccaatcctacctccatcgctaacccc 14392
Score = 135 bits (348), Expect = 7e-32
Identities = 69/69 (100%)
Strand = Plus / Plus
Query: 2 catacccccgattccgctacgaccaactcatacacctcctatgaaaaaacttcctaccac 61
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 4133 catacccccgattccgctacgaccaactcatacacctcctatgaaaaaacttcctaccac 4192
Query: 62 tcaccctag 70
|||||||||
Sbjct: 4193 tcaccctag 4201
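For rows with more than two blocks, the same two-block arithmetic can be applied to each pair of neighbouring blocks. A sketch, with a made-up function name, run on the 3-block positive row above:

```python
def deletions(block_sizes, t_starts):
    """Return a list of (del_start, del_end, length), one per target gap."""
    found = []
    for i in range(len(t_starts) - 1):
        prev_end = t_starts[i] + block_sizes[i]   # last matched base of block i
        length = t_starts[i + 1] - prev_end       # target bases skipped
        if length > 0:
            found.append((prev_end + 1, prev_end + length, length))
    return found


dels = deletions([69, 205, 24], [4132, 14162, 14368])
print(dels)  # [(4202, 14162, 9961), (14368, 14368, 1)]
```

The two lengths add up to 9962, the T gap value of that row. The 1-base gap at 14368 lines up with the `-` in the Query line of the first alignment, and the first block ending at 4201 agrees with the second alignment.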
Another 3-block example with both breaks greater than 15
Score E
Sequences producing significant alignments: (bits) Value
gi|251831106|ref|NC_012920.1|_ChrM 472 e-133
gi|251831106|ref|NC_012920.1|_ChrM 92 7e-19
gi|251831106|ref|NC_012920.1|_ChrM 14 2e+05
>gi|251831106|ref|NC_012920.1|_ChrM
Length = 16569
Score = 472 bits (1219), Expect = e-133
Identities = 244/246 (99%)
Strand = Plus / Plus
Query: 48 caattctccgatccgtccctaacaaactaggaggcgtccttgccctattactatccatcc 107
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 15582 caattctccgatccgtccctaacaaactaggaggcgtccttgccctattactatccatcc 15641
Query: 108 tcatcctagcaataatccccatcctccatatatccaaacaacaaagcataatacttcgcc 167
||||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||
Sbjct: 15642 tcatcctagcaataatccccatcctccatatatccaaacaacaaagcataatatttcgcc 15701
Query: 168 cactaagccaatcactttattgactcctagccgcagacctcctcattctaacctgaatcg 227
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 15702 cactaagccaatcactttattgactcctagccgcagacctcctcattctaacctgaatcg 15761
Query: 228 gaggacaaccagtaagctacccttttaccatcattggaccagtagcatccgtactatact 287
||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||
Sbjct: 15762 gaggacaaccagtaagctacccttttaccatcattggacaagtagcatccgtactatact 15821
Query: 288 tcacaa 293
||||||
Sbjct: 15822 tcacaa 15827
Score = 92 bits (237), Expect = 7e-19
Identities = 47/47 (100%)
Strand = Plus / Plus
Query: 1 catcccgatggtgcagccgctattaaaggttcgtttgttcaacgatt 47
|||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 3004 catcccgatggtgcagccgctattaaaggttcgtttgttcaacgatt 3050
Score = 14 bits (36), Expect = 2e+05
Identities = 7/7 (100%)
Strand = Plus / Plus
Query: 294 cctgtcc 300
|||||||
Sbjct: 15885 cctgtcc 15891
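No table row is given for this example, but the breaks can also be read off the Query/Sbjct coordinates of the three alignments. A rough sketch, assuming each hit is reduced by hand to a tuple of 1-based inclusive coordinates (the tuple layout and function name are illustrative only):

```python
def breaks_between_hsps(hsps):
    """hsps: (q_from, q_to, s_from, s_to) tuples, 1-based inclusive, Plus/Plus.

    Returns (start, end, length) for every stretch of the subject that lies
    between two consecutive hits.
    """
    ordered = sorted(hsps, key=lambda h: h[2])    # order hits along the subject
    found = []
    for left, right in zip(ordered, ordered[1:]):
        start, end = left[3] + 1, right[2] - 1
        if end >= start:
            found.append((start, end, end - start + 1))
    return found


hsps = [
    (48, 293, 15582, 15827),   # Score = 472 bits
    (1, 47, 3004, 3050),       # Score = 92 bits
    (294, 300, 15885, 15891),  # Score = 14 bits
]
breaks = breaks_between_hsps(hsps)
print(breaks)  # [(3051, 15581, 12531), (15828, 15884, 57)]
```

Both breaks come out longer than 15 bases, which agrees with the heading of this example.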
Case: 3-blocks negative strand
296 10788 - 3 113,182,5, 0,113,295, 5320,16167,16403,
Sequences producing significant alignments: (bits) Value
gi|251831106|ref|NC_012920.1|_ChrM 347 1e-95
gi|251831106|ref|NC_012920.1|_ChrM 216 2e-56
gi|251831106|ref|NC_012920.1|_ChrM 10 3e+06
>gi|251831106|ref|NC_012920.1|_ChrM
Length = 16569
Score = 347 bits (895), Expect = 1e-95
Identities = 179/182 (98%)
Strand = Minus / Plus
Query: 187 ccaatccacatcaaaaccccctcctcatgcttacaagcaagtacagcaatcaaccctcaa 128
|||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||
Sbjct: 16168 ccaatccacatcaaaaccccctccccatgcttacaagcaagtacagcaatcaaccctcaa 16227
Query: 127 ctatcacacatcaactgcaactccaaagtcacccctcacccattaggataccaacaaacc 68
|||||||||||||||||||||||||||| ||||||||||||| |||||||||||||||||
Sbjct: 16228 ctatcacacatcaactgcaactccaaagccacccctcacccactaggataccaacaaacc 16287
Query: 67 tacccacccttaacagtacatagtacataaagccatttaccgtacatagcacattacagt 8
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 16288 tacccacccttaacagtacatagtacataaagccatttaccgtacatagcacattacagt 16347
Query: 7 ca 6
||
Sbjct: 16348 ca 16349
Score = 216 bits (558), Expect = 2e-56
Identities = 112/113 (99%)
Strand = Minus / Plus
Query: 300 catcaccctccttaacctttacttctacctacgcctaatctactccacctcaatcacact 241
|||||||||||||||||| |||||||||||||||||||||||||||||||||||||||||
Sbjct: 5321 catcaccctccttaacctctacttctacctacgcctaatctactccacctcaatcacact 5380
Query: 240 actccccatatctaacaacgtaaaaataaaatgacagtttgaacatacaaaac 188
|||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 5381 actccccatatctaacaacgtaaaaataaaatgacagtttgaacatacaaaac 5433
Score = 10 bits (25), Expect = 3e+06
Identities = 5/5 (100%)
Strand = Minus / Plus
Query: 5 catcc 1
|||||
Sbjct: 16404 catcc 16408
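One cheap sanity check, assuming the "T gap bases" column is the total number of skipped target bases: the per-gap lengths should add up to it. A sketch with made-up helper names, using the 3-block negative row above and the 3-block negative row from the sample table:

```python
def gap_lengths(block_sizes, t_starts):
    """Length of the target gap after each block except the last."""
    return [t_starts[i + 1] - (t_starts[i] + block_sizes[i])
            for i in range(len(t_starts) - 1)]


def t_gap_consistent(t_gap_bases, block_sizes, t_starts):
    """True if the gap lengths add up to the T gap bases column."""
    return sum(gap_lengths(block_sizes, t_starts)) == t_gap_bases


gaps = gap_lengths([113, 182, 5], [5320, 16167, 16403])
print(gaps)  # [10734, 54]
print(t_gap_consistent(10788, [113, 182, 5], [5320, 16167, 16403]))  # True
print(t_gap_consistent(63, [5, 107, 188], [2932, 2999, 3107]))       # True
```

The block boundaries also agree with the Sbjct ranges above: 5321-5433, 16168-16349 and 16404-16408, leaving 5434-16167 and 16350-16403 uncovered.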
Case: 3-blocks negative strand — bad example
275 13167 - 3 82,132,62, 24,106,238, 3024,3107,16405,
Output:
BLASTN 2.2.11 [blat]
Reference: Kent, WJ. (2002) BLAT - The BLAST-like alignment tool
Query= @HWI-M01825:53:000000000-A8MJM:1:1101:23159:10038
(300 letters)
Database: ../../../reference/hs_mit_seq.fasta
1 sequences; 16,569 total letters
Searching.done
Score E
Sequences producing significant alignments: (bits) Value
gi|251831106|ref|NC_012920.1|_ChrM 405 e-113
gi|251831106|ref|NC_012920.1|_ChrM 122 6e-28
>gi|251831106|ref|NC_012920.1|_ChrM
Length = 16569
Score = 405 bits (1044), Expect = e-113
Identities = 213/215 (99%)
Strand = Minus / Plus
Query: 276 attaaaggttcgtttgttcaacgattaaagtcctacgtgatctgagttcagaccggagta 217
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 3025 attaaaggttcgtttgttcaacgattaaagtcctacgtgatctgagttcagaccggagta 3084
Query: 216 atccaggtcggtttctatctac-ttcaaattcctccctgtacgaaaggacaagagaaata 158
|||||||||||||||||||||| |||||||||||||||||||||||||||||||||||||
Sbjct: 3085 atccaggtcggtttctatctacnttcaaattcctccctgtacgaaaggacaagagaaata 3144
Query: 157 aggcctacttcacaaagcgccttcccccgtaaatgatatcatctcaacttagcattatac 98
|||||||||||||||||||||||||||||||||||||||||||||||||||| |||||||
Sbjct: 3145 aggcctacttcacaaagcgccttcccccgtaaatgatatcatctcaacttagtattatac 3204
Query: 97 ccacacccacccaagaacagggtttgttaagatgg 63
|||||||||||||||||||||||||||||||||||
Sbjct: 3205 ccacacccacccaagaacagggtttgttaagatgg 3239
Score = 122 bits (315), Expect = 6e-28
Identities = 62/62 (100%)
Strand = Minus / Plus
Query: 62 tcctccgtgaaatcaatatcccgcacaagagtgctactctcctcgctccgggcccataac 3
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 16406 tcctccgtgaaatcaatatcccgcacaagagtgctactctcctcgctccgggcccataac 16465
Query: 2 ac 1
||
Sbjct: 16466 ac 16467
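One possible reading of why this row is a bad example: its first gap is a single base at 3107, exactly where the Sbjct line shows an `n` opposite the `-` in the Query, so it may not be a real deletion. A sketch of filtering such short gaps out follows. The cutoff of 15 is borrowed from the "both breaks greater than 15" heading earlier and is an assumption, not a rule stated in these notes.

```python
MIN_DELETION_LENGTH = 15  # assumption: keep only gaps longer than this


def deletions(block_sizes, t_starts):
    """Return (del_start, del_end, length) for every target gap."""
    found = []
    for i in range(len(t_starts) - 1):
        prev_end = t_starts[i] + block_sizes[i]
        length = t_starts[i + 1] - prev_end
        if length > 0:
            found.append((prev_end + 1, prev_end + length, length))
    return found


def long_deletions(block_sizes, t_starts, min_length=MIN_DELETION_LENGTH):
    """Keep only the gaps strictly longer than min_length."""
    return [d for d in deletions(block_sizes, t_starts) if d[2] > min_length]


all_gaps = deletions([82, 132, 62], [3024, 3107, 16405])
kept = long_deletions([82, 132, 62], [3024, 3107, 16405])
print(all_gaps)  # [(3107, 3107, 1), (3240, 16405, 13166)]
print(kept)      # [(3240, 16405, 13166)]
```

With the filter applied, the row is left with one deletion, 3240-16405, which matches the two Sbjct ranges in the output above (3025-3239 and 16406-16467).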