Here is the deal: I need to parse BLAT output with two, three, and even four blocks to figure out exactly where the deletion is located.
Sample file format:

match  T gap bases  strand  block count  blockSizes   qStarts     tStarts
298    9962         +       3            69,205,24,   1,70,275,   4132,14162,14368,
299    63           -       3            5,107,188,   0,5,112,    2932,2999,3107,
298    1            +       2            136,163,     1,137,      2970,3107,
213    1            -       2            172,42,      82,257,     137,310,

The 4-block case is yet to be implemented.
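The rows above hold seven columns taken from BLAT's PSL output: matches, T gap bases (tBaseInsert), strand, blockCount, blockSizes, qStarts and tStarts. Here is a minimal Python sketch for reading one such trimmed row. It assumes each input line contains exactly these seven whitespace-separated fields, whereas a full PSL line has 21. `BlatRow` and `parse_row` are hypothetical names and are not part of any BLAT tooling.

```python
from typing import NamedTuple


class BlatRow(NamedTuple):
    """The seven columns used in this post (a subset of a full PSL line)."""
    matches: int
    t_gap_bases: int        # PSL tBaseInsert: bases inserted in the target
    strand: str             # "+" or "-"
    block_count: int
    block_sizes: list[int]
    q_starts: list[int]     # 0-based
    t_starts: list[int]     # 0-based, on the plus strand of the target


def _int_list(field: str) -> list[int]:
    # BLAT writes comma-terminated lists such as "69,205,24,"
    return [int(x) for x in field.rstrip(",").split(",")]


def parse_row(line: str) -> BlatRow:
    f = line.split()
    row = BlatRow(int(f[0]), int(f[1]), f[2], int(f[3]),
                  _int_list(f[4]), _int_list(f[5]), _int_list(f[6]))
    n = row.block_count
    # All three block lists must have one entry per block.
    if not (len(row.block_sizes) == len(row.q_starts) == len(row.t_starts) == n):
        raise ValueError(f"block lists do not match block count: {line!r}")
    return row


for line in ["298 9962 + 3 69,205,24, 1,70,275, 4132,14162,14368,",
             "213 1 - 2 172,42, 82,257, 137,310,"]:
    print(parse_row(line))
```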
For the 2-block positive-strand case:
298 1 + 2 136,163, 1,137, 2970,3107,
298 1 + 2 147,152, 1,148, 2299,2447,
299 96 + 2 293,6, 1,294, 14350,14739,
Solution is:
3107 - 2970 - 136 = 1  --> [3107, 3107]
DelStart = 2970 + 136 + 1 [3107]
DelEnd   = 2970 + 136 + 1 [3107]
# Match stops at 3106 (2970 + 136), with 3107 missing; another match starts at 3108

2447 - 2299 - 147 = 1  --> [2447, 2447]
DelStart = 2299 + 147 + 1 [2447]
DelEnd   = 2299 + 147 + 1 [2447]
# Match stops at 2446 (2299 + 147), with 2447 missing; another match starts at 2448

14739 - 14350 - 293 = 96  --> [14644, 14739]
DelStart = 14350 + 293 + 1 [14644]
DelEnd   = 14350 + 293 + 96 [14739]
# Match stops at 14643 (14350 + 293), with [14644-14739] missing; another match starts at 14740
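The arithmetic above can be written as a small function. This sketch follows the worked examples and assumes that tStarts are BLAT's 0-based starts and that DelStart and DelEnd should be 1-based inclusive positions. `two_block_deletion` is a hypothetical helper.

```python
def two_block_deletion(t_starts: list[int], block_sizes: list[int]) -> tuple[int, int]:
    """Return (DelStart, DelEnd), 1-based and inclusive, between two blocks.

    tStarts are 0-based, so block 1 covers 1-based positions
    t_starts[0] + 1 .. t_starts[0] + block_sizes[0].
    """
    match_end = t_starts[0] + block_sizes[0]   # last matched base of block 1
    gap = t_starts[1] - match_end              # e.g. 3107 - 2970 - 136 = 1
    if gap < 1:
        raise ValueError("no target gap between the two blocks")
    return match_end + 1, match_end + gap      # DelEnd equals t_starts[1]


print(two_block_deletion([2970, 3107], [136, 163]))   # (3107, 3107)
print(two_block_deletion([2299, 2447], [147, 152]))   # (2447, 2447)
print(two_block_deletion([14350, 14739], [293, 6]))   # (14644, 14739)
```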
For the 2-block negative-strand case, the calculation is the same as for the positive strand!
298 402 - 2 7,293, 0,7, 263,672,
#Query: @HWI-M01825:53:000000000-A8MJM:1:1101:16256:1461

672 - 7 - 263 = 402  --> [264, 270] [271, 672]
# Match starts at 264 and stops at 270 with 7 bp aligned, then breaks over [271, 672]; the next match starts at 673
# As a result, the break starts at 271 (= 263 + 7 + 1), with a 402 bp deletion

Therefore:
DelStart = 263 + 7 + 1 [271]
DelEnd   = 263 + 7 + 402 [672]

BLAT result details:
----------------------------------------------------------------
Side by Side Alignment

0001 ggggagggggtgatctaaaacactctttacgccggcttctattgacttgg 0050
<<<< |||||||||||||||||||||||||||||||||||||||||||||||||| <<<<
0965 ggggagggggtgatctaaaacactctttacgccggcttctattgacttgg 0916

0051 gttaatcgtgtgaccgcggtggctggcacgaaattgaccaaccctggggt 0100
<<<< |||||||||||||||||||||||||||||||||||||||||||||||||| <<<<
0915 gttaatcgtgtgaccgcggtggctggcacgaaattgaccaaccctggggt 0866

0101 tagtatagcttagttaaactttcgtttattgctaaaggttaatcactgct 0150
<<<< |||||||||||||||||||||||||||||||||||||||||||||||||| <<<<
0865 tagtatagcttagttaaactttcgtttattgctaaaggttaatcactgct 0816

0151 gtttcccgtgggggtgtggctaggctaagcgttttgagctgcattgctgc 0200
<<<< |||||||||||||||||||||||||||||||||||||||||||||||||| <<<<
0815 gtttcccgtgggggtgtggctaggctaagcgttttgagctgcattgctgc 0766

0201 gtgcttgatgcttgtcccttttgatcgtggtgatttagagggtgaactca 0250
<<<< ||||||||||||||| |||||||||||||||||||||||||||||||||| <<<<
0765 gtgcttgatgcttgttccttttgatcgtggtgatttagagggtgaactca 0716

0251 ctggaacgggggtgcttgcatgtgtaatcttactaagagctaa 0293
<<<< ||||||||||| ||||||||||||||||||||||||||||||| <<<<
0715 ctggaacggggatgcttgcatgtgtaatcttactaagagctaa 0673
-------------------------------------------------------------------
0294 tggaaag 0300
<<<< ||||||| <<<<
0270 tggaaag 0264
-------------------------------------------------------------------
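For a "-" strand hit, PSL still reports tStarts on the plus strand of the target in ascending order. Only qStarts refer to the reverse-complemented read, which is why the same arithmetic applies. The sketch below uses a hypothetical helper `target_blocks_1based` to convert the blocks to 1-based intervals. It reproduces the [264, 270] block, the 673-965 span seen in the side-by-side alignment, and the [271, 672] deletion.

```python
def target_blocks_1based(t_starts: list[int], block_sizes: list[int]) -> list[tuple[int, int]]:
    """Turn 0-based PSL tStarts into 1-based inclusive (start, end) intervals."""
    return [(s + 1, s + n) for s, n in zip(t_starts, block_sizes)]


blocks = target_blocks_1based([263, 672], [7, 293])
print(blocks)                        # [(264, 270), (673, 965)]

# The deletion is everything strictly between the two matched intervals.
del_start, del_end = blocks[0][1] + 1, blocks[1][0] - 1
print(del_start, del_end, del_end - del_start + 1)   # 271 672 402
```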
More on negative cases:
213 1 - 2 172,42, 82,257, 137,310,
298 1 - 2 37,262, 0,37, 3069,3107,
298 222 - 2 293,6, 1,294, 11425,11940,
298 4939 - 2 239,60, 1,240, 717,5895,

310 - 172 - 137 = 1  --> [138, 309]
==> break starts at 310 (137 + 172 + 1) with a 1 bp deletion
DelStart = 137 + 172 + 1 [310]
DelEnd   = 137 + 172 + 1 [310]

3107 - 37 - 3069 = 1  --> [3070, 3106]
==> break starts at 3107 (3069 + 37 + 1) with a 1 bp deletion
DelStart = 3069 + 37 + 1 [3107]
DelEnd   = 3069 + 37 + 1 [3107]

11940 - 293 - 11425 = 222  --> [11426, 11718]
==> break starts at 11719 (11425 + 293 + 1) with a 222 bp deletion
DelStart = 11425 + 293 + 1 [11719]
DelEnd   = 11425 + 293 + 222 [11940]

5895 - 239 - 717 = 4939  --> [718, 956]
==> break starts at 957 (717 + 239 + 1) with a 4939 bp deletion
DelStart = 717 + 239 + 1 [957]
DelEnd   = 717 + 239 + 4939 [5895]
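In a 2-block row, the whole T gap bases value belongs to the single break, so that column can serve as a consistency check. Below is a sketch of that check. The name `check_two_block` is hypothetical. For each row above it reports the matched block 1 and the deletion, and it raises an error when the computed length disagrees with the column. For example, it gives (11426, 11718) as block 1 of the 222 bp row. The check does not apply to rows with three or more blocks, where the column is a sum over several breaks.

```python
def check_two_block(line: str) -> tuple[tuple[int, int], tuple[int, int]]:
    """Return (block-1 interval, deletion interval), both 1-based inclusive,
    after checking the deletion length against the "T gap bases" column."""
    f = line.split()
    t_gap = int(f[1])
    sizes = [int(x) for x in f[4].rstrip(",").split(",")]
    t_starts = [int(x) for x in f[6].rstrip(",").split(",")]
    block1 = (t_starts[0] + 1, t_starts[0] + sizes[0])   # matched bases
    deletion = (block1[1] + 1, t_starts[1])              # (DelStart, DelEnd)
    length = deletion[1] - deletion[0] + 1
    if length != t_gap:
        raise ValueError(f"gap {length} != T gap bases {t_gap}: {line!r}")
    return block1, deletion


rows = [
    "213 1 - 2 172,42, 82,257, 137,310,",
    "298 1 - 2 37,262, 0,37, 3069,3107,",
    "298 222 - 2 293,6, 1,294, 11425,11940,",
    "298 4939 - 2 239,60, 1,240, 717,5895,",
]
for r in rows:
    print(check_two_block(r))
```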
Solution: there was a one-base offset!
Solution is:
Sample read:
Hs_Mito_ref:
Case: 3-blocks positive strand
298 9962 + 3 69,205,24, 1,70,275, 4132,14162,14368,

Sequences producing significant alignments:   Score (bits)   E Value
gi|251831106|ref|NC_012920.1|_ChrM            437            e-122
gi|251831106|ref|NC_012920.1|_ChrM            135            7e-32

>gi|251831106|ref|NC_012920.1|_ChrM
          Length = 16569

 Score = 437 bits (1127), Expect = e-122
 Identities = 229/230 (100%)
 Strand = Plus / Plus

Query: 71    caatctcaattacaatatatacaccaacaaacaatgttcaaccagtaactactactaatc 130
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 14163 caatctcaattacaatatatacaccaacaaacaatgttcaaccagtaactactactaatc 14222

Query: 131   aacgcccataatcatacaaagcccccgcaccaataggatcctcccgaatcaaccctgacc 190
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 14223 aacgcccataatcatacaaagcccccgcaccaataggatcctcccgaatcaaccctgacc 14282

Query: 191   cctctccttcataaattattcagcttcctacactattaaagtttaccacaaccaccaccc 250
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 14283 cctctccttcataaattattcagcttcctacactattaaagtttaccacaaccaccaccc 14342

Query: 251   catcatactctttcacccacagcac-aatcctacctccatcgctaacccc 299
             ||||||||||||||||||||||||| ||||||||||||||||||||||||
Sbjct: 14343 catcatactctttcacccacagcaccaatcctacctccatcgctaacccc 14392

 Score = 135 bits (348), Expect = 7e-32
 Identities = 69/69 (100%)
 Strand = Plus / Plus

Query: 2     catacccccgattccgctacgaccaactcatacacctcctatgaaaaaacttcctaccac 61
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 4133  catacccccgattccgctacgaccaactcatacacctcctatgaaaaaacttcctaccac 4192

Query: 62    tcaccctag 70
             |||||||||
Sbjct: 4193  tcaccctag 4201
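Applying the same rule pairwise gives every break in a multi-block row, which would also cover the 4-block case mentioned at the top. The sketch below assumes that tStarts are ascending, as PSL reports them. `deletions` is a hypothetical name. For this row it finds the large break plus a 1 bp gap at 14368. That gap agrees with the single "-" that BLAST shows in the query opposite Sbjct 14368, and the two lengths add up to the 9962 in the T gap bases column.

```python
def deletions(t_starts: list[int], block_sizes: list[int]) -> list[tuple[int, int]]:
    """All target gaps between consecutive blocks, 1-based inclusive.

    Works for any block count, assuming tStarts are ascending (as PSL gives them).
    """
    gaps = []
    for i in range(len(t_starts) - 1):
        start = t_starts[i] + block_sizes[i] + 1   # DelStart
        end = t_starts[i + 1]                      # DelEnd
        if end >= start:                           # adjacent blocks leave no target gap
            gaps.append((start, end))
    return gaps


gaps = deletions([4132, 14162, 14368], [69, 205, 24])
print(gaps)                                        # [(4202, 14162), (14368, 14368)]
print(sum(e - s + 1 for s, e in gaps))             # 9962, the "T gap bases" column
```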
Another 3-block example, with both breaks longer than 15 bp:
Sequences producing significant alignments:   Score (bits)   E Value
gi|251831106|ref|NC_012920.1|_ChrM            472            e-133
gi|251831106|ref|NC_012920.1|_ChrM            92             7e-19
gi|251831106|ref|NC_012920.1|_ChrM            14             2e+05

>gi|251831106|ref|NC_012920.1|_ChrM
          Length = 16569

 Score = 472 bits (1219), Expect = e-133
 Identities = 244/246 (99%)
 Strand = Plus / Plus

Query: 48    caattctccgatccgtccctaacaaactaggaggcgtccttgccctattactatccatcc 107
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 15582 caattctccgatccgtccctaacaaactaggaggcgtccttgccctattactatccatcc 15641

Query: 108   tcatcctagcaataatccccatcctccatatatccaaacaacaaagcataatacttcgcc 167
             ||||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||
Sbjct: 15642 tcatcctagcaataatccccatcctccatatatccaaacaacaaagcataatatttcgcc 15701

Query: 168   cactaagccaatcactttattgactcctagccgcagacctcctcattctaacctgaatcg 227
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 15702 cactaagccaatcactttattgactcctagccgcagacctcctcattctaacctgaatcg 15761

Query: 228   gaggacaaccagtaagctacccttttaccatcattggaccagtagcatccgtactatact 287
             ||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||
Sbjct: 15762 gaggacaaccagtaagctacccttttaccatcattggacaagtagcatccgtactatact 15821

Query: 288   tcacaa 293
             ||||||
Sbjct: 15822 tcacaa 15827

 Score = 92 bits (237), Expect = 7e-19
 Identities = 47/47 (100%)
 Strand = Plus / Plus

Query: 1     catcccgatggtgcagccgctattaaaggttcgtttgttcaacgatt 47
             |||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 3004  catcccgatggtgcagccgctattaaaggttcgtttgttcaacgatt 3050

 Score = 14 bits (36), Expect = 2e+05
 Identities = 7/7 (100%)
 Strand = Plus / Plus

Query: 294   cctgtcc 300
             |||||||
Sbjct: 15885 cctgtcc 15891
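This example is shown only as BLAST-style HSPs, so the breaks can be read off the Sbjct coordinates directly. Sort the HSPs by position and take whatever lies between them. In this sketch, `breaks_from_hsps` is hypothetical, and its `min_len=16` parameter encodes "longer than 15 bp" as an assumed cut-off. This approach cannot see gaps inside a single HSP, such as the 1 bp gap in the previous example.

```python
def breaks_from_hsps(subject_ranges: list[tuple[int, int]],
                     min_len: int = 16) -> list[tuple[int, int]]:
    """Gaps between BLAST HSPs on the subject, 1-based inclusive.

    subject_ranges holds the (first, last) Sbjct coordinates of each HSP.
    min_len=16 encodes "longer than 15 bp" (an assumed cut-off).
    """
    ranges = sorted((min(a, b), max(a, b)) for a, b in subject_ranges)
    breaks = []
    for (_, end1), (start2, _) in zip(ranges, ranges[1:]):
        start, end = end1 + 1, start2 - 1
        if end - start + 1 >= min_len:   # overlapping HSPs give a negative length
            breaks.append((start, end))
    return breaks


# Sbjct ranges of the three HSPs above, in the order BLAST reported them
print(breaks_from_hsps([(15582, 15827), (3004, 3050), (15885, 15891)]))
# [(3051, 15581), (15828, 15884)]
```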
Case: 3-blocks negative strand
296 10788 - 3 113,182,5, 0,113,295, 5320,16167,16403,

Sequences producing significant alignments:   Score (bits)   E Value
gi|251831106|ref|NC_012920.1|_ChrM            347            1e-95
gi|251831106|ref|NC_012920.1|_ChrM            216            2e-56
gi|251831106|ref|NC_012920.1|_ChrM            10             3e+06

>gi|251831106|ref|NC_012920.1|_ChrM
          Length = 16569

 Score = 347 bits (895), Expect = 1e-95
 Identities = 179/182 (98%)
 Strand = Minus / Plus

Query: 187   ccaatccacatcaaaaccccctcctcatgcttacaagcaagtacagcaatcaaccctcaa 128
             |||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||
Sbjct: 16168 ccaatccacatcaaaaccccctccccatgcttacaagcaagtacagcaatcaaccctcaa 16227

Query: 127   ctatcacacatcaactgcaactccaaagtcacccctcacccattaggataccaacaaacc 68
             |||||||||||||||||||||||||||| ||||||||||||| |||||||||||||||||
Sbjct: 16228 ctatcacacatcaactgcaactccaaagccacccctcacccactaggataccaacaaacc 16287

Query: 67    tacccacccttaacagtacatagtacataaagccatttaccgtacatagcacattacagt 8
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 16288 tacccacccttaacagtacatagtacataaagccatttaccgtacatagcacattacagt 16347

Query: 7     ca 6
             ||
Sbjct: 16348 ca 16349

 Score = 216 bits (558), Expect = 2e-56
 Identities = 112/113 (99%)
 Strand = Minus / Plus

Query: 300   catcaccctccttaacctttacttctacctacgcctaatctactccacctcaatcacact 241
             |||||||||||||||||| |||||||||||||||||||||||||||||||||||||||||
Sbjct: 5321  catcaccctccttaacctctacttctacctacgcctaatctactccacctcaatcacact 5380

Query: 240   actccccatatctaacaacgtaaaaataaaatgacagtttgaacatacaaaac 188
             |||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 5381  actccccatatctaacaacgtaaaaataaaatgacagtttgaacatacaaaac 5433

 Score = 10 bits (25), Expect = 3e+06
 Identities = 5/5 (100%)
 Strand = Minus / Plus

Query: 5     catcc 1
             |||||
Sbjct: 16404 catcc 16408
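On the query side, "-" strand qStarts count along the reverse complement of the read. The sketch below maps them back to read coordinates. It assumes a read length (qSize) of 300, because the HSPs run from Query 300 down to Query 1. The qSize column is not part of the trimmed rows, and `query_blocks_minus` is a hypothetical helper. Its output matches the Query ranges of the three HSPs (300-188, 187-6 and 5-1). The target-side breaks follow the same rule as on the positive strand.

```python
def query_blocks_minus(q_starts: list[int], block_sizes: list[int],
                       q_size: int) -> list[tuple[int, int]]:
    """Map '-' strand qStarts back onto the original read.

    For '-' hits, PSL qStarts count along the reverse complement of the read,
    so 0-based position p there is 1-based read position q_size - p. Each
    block is returned as (high, low), the order BLAST prints for Minus / Plus.
    """
    return [(q_size - s, q_size - s - n + 1) for s, n in zip(q_starts, block_sizes)]


# q_size=300 is an assumption taken from the read length seen in the HSPs.
print(query_blocks_minus([0, 113, 295], [113, 182, 5], q_size=300))
# [(300, 188), (187, 6), (5, 1)]
```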
Case: 3-blocks negative strand (bad example)
275 13167 - 3 82,132,62, 24,106,238, 3024,3107,16405,

Output:

BLASTN 2.2.11 [blat]

Reference: Kent, WJ. (2002) BLAT - The BLAST-like alignment tool

Query= @HWI-M01825:53:000000000-A8MJM:1:1101:23159:10038
         (300 letters)

Database: ../../../reference/hs_mit_seq.fasta
           1 sequences; 16,569 total letters

Searching.done

Sequences producing significant alignments:   Score (bits)   E Value
gi|251831106|ref|NC_012920.1|_ChrM            405            e-113
gi|251831106|ref|NC_012920.1|_ChrM            122            6e-28

>gi|251831106|ref|NC_012920.1|_ChrM
          Length = 16569

 Score = 405 bits (1044), Expect = e-113
 Identities = 213/215 (99%)
 Strand = Minus / Plus

Query: 276   attaaaggttcgtttgttcaacgattaaagtcctacgtgatctgagttcagaccggagta 217
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 3025  attaaaggttcgtttgttcaacgattaaagtcctacgtgatctgagttcagaccggagta 3084

Query: 216   atccaggtcggtttctatctac-ttcaaattcctccctgtacgaaaggacaagagaaata 158
             |||||||||||||||||||||| |||||||||||||||||||||||||||||||||||||
Sbjct: 3085  atccaggtcggtttctatctacnttcaaattcctccctgtacgaaaggacaagagaaata 3144

Query: 157   aggcctacttcacaaagcgccttcccccgtaaatgatatcatctcaacttagcattatac 98
             |||||||||||||||||||||||||||||||||||||||||||||||||||| |||||||
Sbjct: 3145  aggcctacttcacaaagcgccttcccccgtaaatgatatcatctcaacttagtattatac 3204

Query: 97    ccacacccacccaagaacagggtttgttaagatgg 63
             |||||||||||||||||||||||||||||||||||
Sbjct: 3205  ccacacccacccaagaacagggtttgttaagatgg 3239

 Score = 122 bits (315), Expect = 6e-28
 Identities = 62/62 (100%)
 Strand = Minus / Plus

Query: 62    tcctccgtgaaatcaatatcccgcacaagagtgctactctcctcgctccgggcccataac 3
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 16406 tcctccgtgaaatcaatatcccgcacaagagtgctactctcctcgctccgggcccataac 16465

Query: 2     ac 1
             ||
Sbjct: 16466 ac 16467
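In this example BLAST merged the first two BLAT blocks into one HSP (Sbjct 3025-3239) across the 1 bp target gap at 3107. If the goal is to keep only the large break, one option is to merge PSL blocks that are separated by a small gap before computing deletions. The sketch below uses an assumed threshold of 15 bp, and `merge_small_gaps` is a hypothetical helper. It reproduces the two Sbjct ranges that BLAST reports. The merged interval includes the small gap itself.

```python
def merge_small_gaps(t_starts: list[int], block_sizes: list[int],
                     max_gap: int = 15) -> list[tuple[int, int]]:
    """Merge consecutive target blocks separated by <= max_gap bases.

    Returns 1-based inclusive intervals and assumes ascending tStarts.
    max_gap=15 is an assumed threshold, not something BLAT defines.
    """
    merged: list[tuple[int, int]] = []
    for s, n in zip(t_starts, block_sizes):
        start, end = s + 1, s + n
        # Gap between the previous interval's end and this block's start.
        if merged and start - merged[-1][1] - 1 <= max_gap:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


print(merge_small_gaps([3024, 3107, 16405], [82, 132, 62]))
# [(3025, 3239), (16406, 16467)]
```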