Today, I registered with mitomap and started to learn how to use it.
It is quite interesting that deletions often occur at direct repeats, e.g.:
DelJunction   DelSize   RepeatLocation            RepeatType
10058:14593   -4534     10059-10063/14593-14597   D, 5/5
10169:14435   -4265     10161-10165/14424-14428   D, 5/5
In the first case, the deletion breaks at 10058, right before the first repeat (10059-10063), and rejoins at 14593, right where the second “direct repeat” (14593-14597) starts. As a result, one copy of the repeat (10059-10063) disappears.
In the second case, the deletion breaks at 10169, after the first repeat (10161-10165), and rejoins at 14435, so the deleted segment contains the second “direct repeat”. As a result, one copy of the repeat (14424-14428) disappears.
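The reasoning in the two cases can be checked mechanically. This is a minimal sketch that assumes a MITOMAP junction `a:b` means positions `a` and `b` are the last retained bases on either side, so the deleted segment is `a+1 .. b-1` (1-based, inclusive). That assumption reproduces the DelSize values in the table (4534 and 4265). The function names are hypothetical.

```python
def deleted_interval(junction):
    """Deleted segment (1-based, inclusive) for a junction such as '10058:14593'.

    Assumption: both junction positions are retained bases, so the deletion is a+1 .. b-1.
    """
    a, b = (int(x) for x in junction.split(":"))
    return a + 1, b - 1


def deleted_copies(junction, repeat_location):
    """Repeat copies (format '10059-10063/14593-14597') lying wholly inside the deletion."""
    start, end = deleted_interval(junction)
    lost = []
    for piece in repeat_location.split("/"):
        r_start, r_end = (int(x) for x in piece.split("-"))
        if start <= r_start and r_end <= end:
            lost.append((r_start, r_end))
    return lost


table = [("10058:14593", "10059-10063/14593-14597"),
         ("10169:14435", "10161-10165/14424-14428")]
for junction, repeats in table:
    start, end = deleted_interval(junction)
    print(junction, "size:", end - start + 1, "lost copy:", deleted_copies(junction, repeats))
# 10058:14593 size: 4534 lost copy: [(10059, 10063)]
# 10169:14435 size: 4265 lost copy: [(14424, 14428)]
```

Under this convention the first repeat copy is lost in the first case and the second copy in the second case, matching the description.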
So, a better understanding of these scenarios helps in implementing the analysis.
There is a Perl module called MitoMaster, developed by the MITOMAP team. It turns out to be a good tool.
Now, for our mitochondrial project, we can search for possible “repeats” that flank a deletion.
For a given deletion junction, e.g. 10058:14593:
- Case I: search up to 15 bp after the break starts (10059-10074) against up to 15 bp right after the break ends (starting at 14593)
- Case II: search up to 15 bp before the break starts (10043-10058) against up to 15 bp before the break ends (14578-14593)
- For each case, the window size ("number of repeat bases") = min(15 bp, lengthOfDel)
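The two-case search can be sketched as follows. This assumes the reference is a plain Python string addressed with 1-based inclusive coordinates, and uses the same junction convention as the table (deleted segment `a+1 .. b-1`). Two simplifications are assumptions on my part. First, each window is exactly `n = min(15, deletion length)` bases, so the boundaries may differ by one base from the ranges listed above. Second, the repeat is taken as the longest exact common substring of the two windows, not a scored local alignment. `find_flanking_repeats` and the toy sequence are hypothetical.

```python
def subseq(ref, start, end):
    """1-based, inclusive slice of ref; positions below 1 are clamped to 1."""
    return ref[max(start, 1) - 1:end]


def longest_common_substring(s, t):
    """Return (length, offset_in_s, offset_in_t) of the longest exact common substring."""
    best_len, best_i, best_j = 0, 0, 0
    prev = [0] * (len(t) + 1)
    for i in range(1, len(s) + 1):
        cur = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            if s[i - 1] == t[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_i, best_j = cur[j], i, j
        prev = cur
    return best_len, best_i - best_len, best_j - best_len


def find_flanking_repeats(ref, a, b, max_len=15):
    """Search both cases for a repeat flanking junction a:b.

    Returns {case: (repeat_length, first_copy_start, second_copy_start)}, 1-based.
    """
    n = min(max_len, b - a - 1)  # window size = min(15 bp, deletion length)
    windows = {
        # Case I: bases right after the break starts vs. right after it ends
        "I": ((a + 1, a + n), (b, b + n - 1)),
        # Case II: bases right before the break starts vs. right before it ends
        "II": ((a - n + 1, a), (b - n, b - 1)),
    }
    result = {}
    for case, ((s1, e1), (s2, e2)) in windows.items():
        w1, w2 = subseq(ref, s1, e1), subseq(ref, s2, e2)
        length, off1, off2 = longest_common_substring(w1, w2)
        result[case] = (length, max(s1, 1) + off1, max(s2, 1) + off2)
    return result


# Toy reference: repeat GGATC at 6-10 and 17-21, junction 5:17 deletes 6-16
toy = "ACGTA" + "GGATC" + "TTTTTT" + "GGATC" + "CACACA"
print(find_flanking_repeats(toy, 5, 17)["I"])  # (5, 6, 17)
```

An exact-match search misses repeats with a mismatch, which is one reason to prefer a local alignment for real reads.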
I got help from tcoffee with the Perl scripts for local alignment.
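For reference, this is a textbook Smith-Waterman local alignment with a linear gap penalty. It is not the tcoffee script mentioned above. The scoring values (match +2, mismatch -1, gap -2) are arbitrary assumptions chosen for illustration.

```python
def smith_waterman(s, t, match=2, mismatch=-1, gap=-2):
    """Local alignment of s and t.

    Returns (score, aligned_s, aligned_t, start_in_s, start_in_t), starts 0-based.
    """
    rows, cols = len(s) + 1, len(t) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero cell (start of the local alignment)
    i, j = best_pos
    a1, a2 = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch):
            a1.append(s[i - 1]); a2.append(t[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            a1.append(s[i - 1]); a2.append("-"); i -= 1
        else:
            a1.append("-"); a2.append(t[j - 1]); j -= 1
    return best, "".join(reversed(a1)), "".join(reversed(a2)), i, j


print(smith_waterman("GGATCTTTTTT", "GGATCCACACA"))  # (10, 'GGATC', 'GGATC', 0, 0)
```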
While working on this project, I experienced MUCH hassle with blat and its output, especially in determining the “deletion” start and stop. Part of the problem is the mix of “zero”-based and “one”-based coordinates across different systems. Therefore, I created a separate post, which I hope will help me get to the bottom of these issues.
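These helpers can keep the two conventions straight. blat's PSL output, like BED, uses 0-based, half-open coordinates, so a block with 0-based start `s` and size `n` covers the 1-based, closed interval `s+1 .. s+n`. The helper names are hypothetical. The example uses the first target block of the first read shown below (tStart 148, size 50).

```python
def zero_half_open_to_one_closed(start0, end0):
    """0-based, half-open [start0, end0) -> 1-based, closed [start1, end1]."""
    return start0 + 1, end0


def one_closed_to_zero_half_open(start1, end1):
    """1-based, closed [start1, end1] -> 0-based, half-open [start0, end0)."""
    return start1 - 1, end1


def psl_block_to_one_based(block_start0, block_size):
    """A PSL block (0-based start, size) as a 1-based, closed interval."""
    return zero_half_open_to_one_closed(block_start0, block_start0 + block_size)


print(psl_block_to_one_based(148, 50))  # (149, 198)
```

The length is the same in both conventions: `end0 - start0` for half-open, `end1 - start1 + 1` for closed.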
I found a very interesting alignment in G451E.
In this directory: /ddn/gs1/home/li11/project2014/Copeland/IlluminaData/blatApp/gapHeatMap/
I issued this command:
awk -F'\t' -v OFS='\t' '$18 == 3 && $8 == 16054 { print $1, $8, $9, $10, $18, $19, $20, $21 }' G451E_R1.psl
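The same filter can be written in Python, which makes the PSL column numbers explicit. In the awk command, $1 is matches, $8 is tBaseInsert, $9 is strand, $10 is qName, $18 is blockCount, and $19 to $21 are blockSizes, qStarts and tStarts. This is a sketch only. Lines whose first field is not an integer are skipped, including the header blat writes unless run with -noHead. `read_psl`, `select` and `print_selected` are hypothetical names.

```python
# PSL column names in file order; awk's $N is the N-th name (1-based)
PSL_FIELDS = [
    "matches", "misMatches", "repMatches", "nCount",
    "qNumInsert", "qBaseInsert", "tNumInsert", "tBaseInsert",
    "strand", "qName", "qSize", "qStart", "qEnd",
    "tName", "tSize", "tStart", "tEnd",
    "blockCount", "blockSizes", "qStarts", "tStarts",
]


def read_psl(lines):
    """Yield one dict per PSL record, skipping header or other non-record lines."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < len(PSL_FIELDS) or not fields[0].isdigit():
            continue
        yield dict(zip(PSL_FIELDS, fields))


def select(records, block_count=3, t_base_insert=16054):
    """Same condition as the awk command: $18 == 3 && $8 == 16054."""
    for rec in records:
        if int(rec["blockCount"]) == block_count and int(rec["tBaseInsert"]) == t_base_insert:
            yield rec


def print_selected(path):
    """Print the same eight columns as the awk command, tab-separated."""
    keep = ["matches", "tBaseInsert", "strand", "qName",
            "blockCount", "blockSizes", "qStarts", "tStarts"]
    with open(path) as handle:
        for rec in select(read_psl(handle)):
            print("\t".join(rec[k] for k in keep))

# Usage: print_selected("G451E_R1.psl")
```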
I found:
matches  tBaseInsert  strand  qName                                              blockCount  blockSizes   qStarts       tStarts
108      16054        -       @HWI-M01825:53:000000000-A8MJM:1:1101:17663:13886  3           50,4,55,     191,241,245,  148,200,16256,
295      16054        -       @HWI-M01825:53:000000000-A8MJM:1:1117:6130:11630   3           69,4,227,    0,69,73,      129,200,16256,
295      16054        +       @HWI-M01825:53:000000000-A8MJM:1:2117:21755:9298   3           178,4,118,   0,178,182,    20,200,16256,
Based on my deletion detection rule:
#First one:
diff1 = 200 - (148 + 4) = 48
diff2 = 16256 - (200 + 50) = 16006
totalDelLength = 16006 + 48 = 16054
if (diff1 > 15) ==> deletion is [153, 200]: DelStart = (148 + 4) + 1, DelEnd = 200
if (diff2 > 15) ==> deletion is [16055, 16256]: DelStart = 16006 + 48 + 1, DelEnd = 16256

#Second one:
diff1 = 200 - (129 + 4) = 67
diff2 = 16256 - (200 + 69) = 15987
totalDelLength = 15987 + 67 = 16054
if (diff1 > 15) ==> deletion is [134, 200]: DelStart = (129 + 4) + 1, DelEnd = 200
if (diff2 > 15) ==> deletion is [16055, 16256]: DelStart = 15987 + 67 + 1, DelEnd = 16256

#Third one:
diff1 = 200 - (20 + 4) = 176
diff2 = 16256 - (200 + 178) = 15878
totalDelLength = 15878 + 176 = 16054
if (diff1 > 15) ==> deletion is [25, 200]: DelStart = (20 + 4) + 1, DelEnd = 200
if (diff2 > 15) ==> deletion is [16055, 16256]: DelStart = 15878 + 176 + 1, DelEnd = 16256
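For comparison, the target gaps can also be computed the standard PSL way. The gap after block k is `tStarts[k+1] - (tStarts[k] + blockSizes[k])`, so each gap subtracts the size of the block that precedes it. The gap occupies 0-based `[tStarts[k] + blockSizes[k], tStarts[k+1])`, which is 1-based `tStarts[k] + blockSizes[k] + 1 .. tStarts[k+1]`. Applied to the three reads above, this gives the same two gaps for every read: 199-200 (2 bp) and 205-16256 (16052 bp). The total (16054) and the end positions (200 and 16256) agree with the rule above. The individual gap lengths and start positions differ, because that rule uses the second block's size in diff1 and the first block's size in diff2. `target_gaps` is a hypothetical name.

```python
def target_gaps(block_sizes, t_starts):
    """Target gaps between consecutive PSL blocks.

    block_sizes and t_starts are PSL's comma-terminated lists, e.g. '50,4,55,'.
    Each gap is (length, first_base_1based, last_base_1based).
    """
    sizes = [int(x) for x in block_sizes.split(",") if x]
    starts = [int(x) for x in t_starts.split(",") if x]
    gaps = []
    for k in range(len(starts) - 1):
        block_end0 = starts[k] + sizes[k]  # 0-based, exclusive end of block k
        length = starts[k + 1] - block_end0
        if length > 0:
            # 0-based half-open [block_end0, starts[k+1]) as 1-based closed
            gaps.append((length, block_end0 + 1, starts[k + 1]))
    return gaps


reads = [("50,4,55,", "148,200,16256,"),
         ("69,4,227,", "129,200,16256,"),
         ("178,4,118,", "20,200,16256,")]
for sizes, starts in reads:
    gaps = target_gaps(sizes, starts)
    print(gaps, "total:", sum(g[0] for g in gaps))
    # each read prints: [(2, 199, 200), (16052, 205, 16256)] total: 16054
```

Applying the `> 15` threshold from the rule above would keep only the 16052-bp gap for each read.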