Learning CPAN

When I tried to install GD module, I had error saying lsdgp 2.0.28 or higher is required.

Frank helped me with a command “yum install GD” 🙂

To get some modules work, sometimes one has to take care of the dependencies. i.e. Font::TTF

Circos: an information aesthetic for comparative genomics

At Steve’s lab meeting, I came across with this paper, published on Genome Research. It seems to be a good piece of software with full length documentation. Circos has been used in many scientific publications. However, to make it work, it is not trivial.

I am trying to document what I have and hopefully, I can make it working in my hand. Fingers crossed 🙂

Since I am the root on SeqBig, I am testing the installation on this machine. Really, the goal would be if I could create a virtualmachine so that I can give it to whoever wants to use it.

Step 1. Get all the CPAN modules installed. Not trivial though..

Honestly, it did not work out as well as I expected. Therefore, ONLY way I made it work is using my virtual machine.

Here is another related page that has more information.

In fact, there are many groups who are consistently searching for tools for draw a circular genome results like circos, i.e. a paper published on Bioinformatics call circleator does similar plots also.

MARCO transcription factor study

Okay, I am trying to help a researcher for her project involving the MARCO genes. She was interested in the transcription binding site upstream of this gene.

I asked my college and got his help with ARE domain analysis. And, I also found this post done by Dave Tang ??

It is very helpful document.

To plot a sequence logo, researchers in the this community tend to use weblogo to plot

Understand the blat and blat output

Here is the deal, I need to parse the “blat” output with two, three, and even four blocks to figure out where the deletion is exactly located:

Sample file format
match    T gap   strand          block   blockSizes      qStarts          tStarts
         bases                   count
298      9962    +       3       69,205,24,      1,70,275,       4132,14162,14368,
299      63      -       3       5,107,188,      0,5,112,        2932,2999,3107,

298      1       +       2       136,163,        1,137,          2970,3107,
213      1       -       2       172,42,         82,257,         137,310,


4-blocks are yet to be implemented in the future

For 2-block positive strand case

298      1       +       2       136,163,        1,137,          2970,3107,
298      1       +       2       147,152,        1,148,          2299,2447,
299      96      +       2       293,6,          1,294,          14350,14739,

Solution is:

3107 - 2970 - 136  = 1   --> [3107,  3107] 

DelStart =  2970 + 136 + 1 [3107]
DelEnd:  =  2970 + 136 + 1 [3107]
# Match stops at 3106 (2970 + 136), with 3107 missing, another match starts at 3108


2447 - 2299 - 147  = 1   --> [2447,  2447] 
DelStart =  2299 + 147 + 1 [2447]
DelEnd:  =  2299 + 147 + 1 [2447]

# Match stops at 2446 (2299 + 147), with 2447 missing, another match starts at 2448

14739 - 14350 - 293  = 96   --> [14644,  14739] 
DelStart =  14350 + 293 + 1   [14644]
DelEnd:  =  14350 + 293 + 96  [14739]

# Match stops at 14643 (14350 + 293 ), with [14644-14739] missing, another match starts at 14740

For 2-block negative strand case, same as POSITIVE strand !!!

298      402     -      2       7,293,          0,7,    263,672,
#Query:@HWI-M01825:53:000000000-A8MJM:1:1101:16256:1461      

672 - 7 - 263 = 402  --> [264, 270]  [271, 672] 

# Match starts at 264 and stops at 270 with 7 bps aligned, breaks [271, 672] , 
match start at 673
# As a result ==> break starts at 271 (= 263 + 7 + 1), with 402 long deletion
Therefore:
DelStart: 263 + 7 + 1 [271]
DelEnd:   263 + 7 +  402  [672] 

BLAT results details as following:
----------------------------------------------------------------

Side by Side Alignment


0001 ggggagggggtgatctaaaacactctttacgccggcttctattgacttgg 0050
<<<< |||||||||||||||||||||||||||||||||||||||||||||||||| <<<<
0965 ggggagggggtgatctaaaacactctttacgccggcttctattgacttgg 0916

0051 gttaatcgtgtgaccgcggtggctggcacgaaattgaccaaccctggggt 0100
<<<< |||||||||||||||||||||||||||||||||||||||||||||||||| <<<<
0915 gttaatcgtgtgaccgcggtggctggcacgaaattgaccaaccctggggt 0866

0101 tagtatagcttagttaaactttcgtttattgctaaaggttaatcactgct 0150
<<<< |||||||||||||||||||||||||||||||||||||||||||||||||| <<<<
0865 tagtatagcttagttaaactttcgtttattgctaaaggttaatcactgct 0816

0151 gtttcccgtgggggtgtggctaggctaagcgttttgagctgcattgctgc 0200
<<<< |||||||||||||||||||||||||||||||||||||||||||||||||| <<<<
0815 gtttcccgtgggggtgtggctaggctaagcgttttgagctgcattgctgc 0766

0201 gtgcttgatgcttgtcccttttgatcgtggtgatttagagggtgaactca 0250
<<<< ||||||||||||||| |||||||||||||||||||||||||||||||||| <<<<
0765 gtgcttgatgcttgttccttttgatcgtggtgatttagagggtgaactca 0716

0251 ctggaacgggggtgcttgcatgtgtaatcttactaagagctaa 0293
<<<< ||||||||||| ||||||||||||||||||||||||||||||| <<<<
0715 ctggaacggggatgcttgcatgtgtaatcttactaagagctaa 0673

-------------------------------------------------------------------
0294 tggaaag 0300
<<<< ||||||| <<<<
0270 tggaaag 0264

-------------------------------------------------------------------

More on negative cases:

213      1       -       2       172,42,         82,257,         137,310,
298      1       -       2       37,262,         0,37,   3069,3107,
298      222     -       2       293,6,          1,294,          11425,11940,
298      4939    -       2       239,60,         1,240,          717,5895,


310   -  172 -137     =  1   -->  [138, 309]     
==> break starts at 310   (137   + 172 + 1 ) with 1 bp long deletion
DelStart: 137   + 172 + 1 [310]
DelEnd:   137   + 172 + 1 [310]

3107  - 37  -3069     =  1   -->  [3070, 3106]   
==> break starts at 3107  (3069  + 37  + 1) with 1 bp long deletion
DelStart: 3069 + 37  + 1 [3107]
DelEnd:   3069 + 37  + 1 [3107]

11940 - 293 - 11425   = 222  -->  [11426, 11939] 
==> braak starts at 11719 (11425 + 293 + 1) with 222 bps long deletion
DelStart: 11425 + 293 + 1    [11719]
DelEnd:   11425 + 293  + 222 [11940]

5895  - 239 - 717     = 4939 -->  [718,   5894 ] 
==> break starts at 957   (717   + 239 + 1) with 4939 bps long deletion
DelStart: 717 + 239 + 1    [957]
DelEnd:   717 + 239  + 4939 [5895]


Solution is: there was one-base off !!!



Solution is:

Sample read:

Hs_Mito_ref:
Case: 3-blocks positive strand

298      9962    +       3       69,205,24,      1,70,275,       4132,14162,14368,

                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

gi|251831106|ref|NC_012920.1|_ChrM                                    437   e-122
gi|251831106|ref|NC_012920.1|_ChrM                                    135   7e-32



>gi|251831106|ref|NC_012920.1|_ChrM 
          Length = 16569

 Score = 437 bits (1127), Expect = e-122
 Identities = 229/230 (100%)
 Strand = Plus / Plus

Query: 71    caatctcaattacaatatatacaccaacaaacaatgttcaaccagtaactactactaatc 130
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 14163 caatctcaattacaatatatacaccaacaaacaatgttcaaccagtaactactactaatc 14222

Query: 131   aacgcccataatcatacaaagcccccgcaccaataggatcctcccgaatcaaccctgacc 190
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 14223 aacgcccataatcatacaaagcccccgcaccaataggatcctcccgaatcaaccctgacc 14282

Query: 191   cctctccttcataaattattcagcttcctacactattaaagtttaccacaaccaccaccc 250
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 14283 cctctccttcataaattattcagcttcctacactattaaagtttaccacaaccaccaccc 14342

Query: 251   catcatactctttcacccacagcac-aatcctacctccatcgctaacccc 299
             ||||||||||||||||||||||||| ||||||||||||||||||||||||
Sbjct: 14343 catcatactctttcacccacagcaccaatcctacctccatcgctaacccc 14392


 Score = 135 bits (348), Expect = 7e-32
 Identities = 69/69 (100%)
 Strand = Plus / Plus

Query: 2    catacccccgattccgctacgaccaactcatacacctcctatgaaaaaacttcctaccac 61
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 4133 catacccccgattccgctacgaccaactcatacacctcctatgaaaaaacttcctaccac 4192

Query: 62   tcaccctag 70
            |||||||||
Sbjct: 4193 tcaccctag 4201

Another 3 block examples with both breaks greater than 15

                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

gi|251831106|ref|NC_012920.1|_ChrM                                    472   e-133
gi|251831106|ref|NC_012920.1|_ChrM                                     92   7e-19
gi|251831106|ref|NC_012920.1|_ChrM                                     14   2e+05



>gi|251831106|ref|NC_012920.1|_ChrM 
          Length = 16569

 Score = 472 bits (1219), Expect = e-133
 Identities = 244/246 (99%)
 Strand = Plus / Plus

Query: 48    caattctccgatccgtccctaacaaactaggaggcgtccttgccctattactatccatcc 107
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 15582 caattctccgatccgtccctaacaaactaggaggcgtccttgccctattactatccatcc 15641

Query: 108   tcatcctagcaataatccccatcctccatatatccaaacaacaaagcataatacttcgcc 167
             ||||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||
Sbjct: 15642 tcatcctagcaataatccccatcctccatatatccaaacaacaaagcataatatttcgcc 15701

Query: 168   cactaagccaatcactttattgactcctagccgcagacctcctcattctaacctgaatcg 227
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 15702 cactaagccaatcactttattgactcctagccgcagacctcctcattctaacctgaatcg 15761

Query: 228   gaggacaaccagtaagctacccttttaccatcattggaccagtagcatccgtactatact 287
             ||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||
Sbjct: 15762 gaggacaaccagtaagctacccttttaccatcattggacaagtagcatccgtactatact 15821

Query: 288   tcacaa 293
             ||||||
Sbjct: 15822 tcacaa 15827


 Score = 92 bits (237), Expect = 7e-19
 Identities = 47/47 (100%)
 Strand = Plus / Plus

Query: 1    catcccgatggtgcagccgctattaaaggttcgtttgttcaacgatt 47
            |||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 3004 catcccgatggtgcagccgctattaaaggttcgtttgttcaacgatt 3050


 Score = 14 bits (36), Expect = 2e+05
 Identities = 7/7 (100%)
 Strand = Plus / Plus

Query: 294   cctgtcc 300
             |||||||
Sbjct: 15885 cctgtcc 15891

Case: 3-blocks negative strand


296      10788   -       3       113,182,5,      0,113,295, 5320,16167,16403,


Sequences producing significant alignments:                      (bits) Value

gi|251831106|ref|NC_012920.1|_ChrM                                    347   1e-95
gi|251831106|ref|NC_012920.1|_ChrM                                    216   2e-56
gi|251831106|ref|NC_012920.1|_ChrM                                     10   3e+06



>gi|251831106|ref|NC_012920.1|_ChrM 
          Length = 16569

 Score = 347 bits (895), Expect = 1e-95
 Identities = 179/182 (98%)
 Strand = Minus / Plus

Query: 187   ccaatccacatcaaaaccccctcctcatgcttacaagcaagtacagcaatcaaccctcaa 128
             |||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||
Sbjct: 16168 ccaatccacatcaaaaccccctccccatgcttacaagcaagtacagcaatcaaccctcaa 16227

Query: 127   ctatcacacatcaactgcaactccaaagtcacccctcacccattaggataccaacaaacc 68
             |||||||||||||||||||||||||||| ||||||||||||| |||||||||||||||||
Sbjct: 16228 ctatcacacatcaactgcaactccaaagccacccctcacccactaggataccaacaaacc 16287

Query: 67    tacccacccttaacagtacatagtacataaagccatttaccgtacatagcacattacagt 8
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 16288 tacccacccttaacagtacatagtacataaagccatttaccgtacatagcacattacagt 16347

Query: 7     ca 6
             ||
Sbjct: 16348 ca 16349


 Score = 216 bits (558), Expect = 2e-56
 Identities = 112/113 (99%)
 Strand = Minus / Plus

Query: 300  catcaccctccttaacctttacttctacctacgcctaatctactccacctcaatcacact 241
            |||||||||||||||||| |||||||||||||||||||||||||||||||||||||||||
Sbjct: 5321 catcaccctccttaacctctacttctacctacgcctaatctactccacctcaatcacact 5380

Query: 240  actccccatatctaacaacgtaaaaataaaatgacagtttgaacatacaaaac 188
            |||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 5381 actccccatatctaacaacgtaaaaataaaatgacagtttgaacatacaaaac 5433


 Score = 10 bits (25), Expect = 3e+06
 Identities = 5/5 (100%)
 Strand = Minus / Plus

Query: 5     catcc 1
             |||||
Sbjct: 16404 catcc 16408

Case: 3-blocks negative strand — bad example

275      13167   -   3       82,132,62,      24,106,238,     3024,3107,16405,

Output:

BLASTN 2.2.11 [blat]

Reference:  Kent, WJ. (2002) BLAT - The BLAST-like alignment tool

Query= @HWI-M01825:53:000000000-A8MJM:1:1101:23159:10038
         (300 letters)

Database: ../../../reference/hs_mit_seq.fasta 
           1 sequences; 16,569 total letters

Searching.done
                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

gi|251831106|ref|NC_012920.1|_ChrM                                    405   e-113
gi|251831106|ref|NC_012920.1|_ChrM                                    122   6e-28



>gi|251831106|ref|NC_012920.1|_ChrM 
          Length = 16569

 Score = 405 bits (1044), Expect = e-113
 Identities = 213/215 (99%)
 Strand = Minus / Plus

Query: 276  attaaaggttcgtttgttcaacgattaaagtcctacgtgatctgagttcagaccggagta 217
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 3025 attaaaggttcgtttgttcaacgattaaagtcctacgtgatctgagttcagaccggagta 3084

Query: 216  atccaggtcggtttctatctac-ttcaaattcctccctgtacgaaaggacaagagaaata 158
            |||||||||||||||||||||| |||||||||||||||||||||||||||||||||||||
Sbjct: 3085 atccaggtcggtttctatctacnttcaaattcctccctgtacgaaaggacaagagaaata 3144

Query: 157  aggcctacttcacaaagcgccttcccccgtaaatgatatcatctcaacttagcattatac 98
            |||||||||||||||||||||||||||||||||||||||||||||||||||| |||||||
Sbjct: 3145 aggcctacttcacaaagcgccttcccccgtaaatgatatcatctcaacttagtattatac 3204

Query: 97   ccacacccacccaagaacagggtttgttaagatgg 63
            |||||||||||||||||||||||||||||||||||
Sbjct: 3205 ccacacccacccaagaacagggtttgttaagatgg 3239


 Score = 122 bits (315), Expect = 6e-28
 Identities = 62/62 (100%)
 Strand = Minus / Plus

Query: 62    tcctccgtgaaatcaatatcccgcacaagagtgctactctcctcgctccgggcccataac 3
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 16406 tcctccgtgaaatcaatatcccgcacaagagtgctactctcctcgctccgggcccataac 16465

Query: 2     ac 1
             ||
Sbjct: 16466 ac 16467