Allele specific alignment pipeline (ASAP)

ASAP is software from the Babraham Bioinformatics Group for aligning NGS short reads to both parental genomes. A good reference can be found here.

The software can be downloaded here

A few things to keep in mind:

1. It only supports bowtie1, at least for now. The Bowtie indexes for C3H and BL6 are located at:
/ddn/gs1/project/mousemeth/reference/C3H_NISC_Jan2012/Aug2012/indexBowtie/
/ddn/gs1/project/mousemeth/reference/BL6_NISC_Jan2012/Aug2012/indexBowtie/

2. Oddly, it needs to write its output back to the directory where the raw files are stored.
Therefore, I had to make symbolic links:
/ddn/gs1/home/li11/project2012/ASE/ASAP/filteredFastq/

3. Here is a sample command:

/ddn/gs1/home/li11/tools/ASAP/ASAP_v0.1.2/ASAP --solexa-quals -n 2 -l 40 --chunkmbs 8192 \
  --genome_1 /ddn/gs1/project/mousemeth/reference/C3H_NISC_Jan2012/Aug2012/ \
  --index_1 /ddn/gs1/project/mousemeth/reference/C3H_NISC_Jan2012/Aug2012/indexBowtie/NISC.C3H \
  --genome_2 /ddn/gs1/project/mousemeth/reference/BL6_NISC_Jan2012/Aug2012/ \
  --index_2 /ddn/gs1/project/mousemeth/reference/BL6_NISC_Jan2012/Aug2012/indexBowtie/NISC.BL6 \
  /ddn/gs1/home/li11/project2012/ASE/ASAP/filteredFastq/C3_M_3.trimfilter.1.fastq \
  /ddn/gs1/home/li11/project2012/ASE/ASAP/filteredFastq/C3_M_3.trimfilter.2.fastq

The problem is that no alignments were found at all!

Let’s try a different approach and align against mm9 and C3H instead. The command is:


/ddn/gs1/home/li11/tools/ASAP/ASAP_v0.1.2/ASAP --solexa-quals -n 2 -l 40 --chunkmbs 8192 \
  --genome_1 /ddn/gs1/project/mousemeth/reference/mm9/indexBowtie_mm9assembly/ \
  --index_1 /ddn/gs1/project/mousemeth/reference/mm9/indexBowtie_mm9assembly/mm9 \
  --genome_2 /ddn/gs1/project/mousemeth/reference/C3H_NISC_Jan2012/Aug2012/ \
  --index_2 /ddn/gs1/project/mousemeth/reference/C3H_NISC_Jan2012/Aug2012/indexBowtie/NISC.C3H \
  /ddn/gs1/home/li11/project2012/ASE/ASAP/filteredFastq/C3_M_3.trimfilter.1.fastq \
  /ddn/gs1/home/li11/project2012/ASE/ASAP/filteredFastq/C3_M_3.trimfilter.2.fastq
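The two ASAP runs above differ only in the genome/index pairs, so the command can be assembled by a small helper. This is only a sketch: build_asap_command is a hypothetical name of my own (not part of ASAP), it uses just the flags and paths that appear in these notes, and it prints the command instead of running it.

```python
import shlex

# Paths copied from the notes above.
ASAP = "/ddn/gs1/home/li11/tools/ASAP/ASAP_v0.1.2/ASAP"
REF = "/ddn/gs1/project/mousemeth/reference"
FASTQ_DIR = "/ddn/gs1/home/li11/project2012/ASE/ASAP/filteredFastq"


def build_asap_command(genome_1, index_1, genome_2, index_2, reads_1, reads_2):
    """Return the ASAP argument list (hypothetical helper, not part of ASAP)."""
    return [
        ASAP,
        "--solexa-quals", "-n", "2", "-l", "40", "--chunkmbs", "8192",
        "--genome_1", genome_1,
        "--index_1", index_1,
        "--genome_2", genome_2,
        "--index_2", index_2,
        reads_1,
        reads_2,
    ]


# The mm9 versus C3H run from the notes.
mm9_vs_c3h = build_asap_command(
    genome_1=f"{REF}/mm9/indexBowtie_mm9assembly/",
    index_1=f"{REF}/mm9/indexBowtie_mm9assembly/mm9",
    genome_2=f"{REF}/C3H_NISC_Jan2012/Aug2012/",
    index_2=f"{REF}/C3H_NISC_Jan2012/Aug2012/indexBowtie/NISC.C3H",
    reads_1=f"{FASTQ_DIR}/C3_M_3.trimfilter.1.fastq",
    reads_2=f"{FASTQ_DIR}/C3_M_3.trimfilter.2.fastq",
)

# Print a shell-safe version of the command for inspection; nothing is executed.
print(" ".join(shlex.quote(arg) for arg in mm9_vs_c3h))
```

Keeping the arguments in a list makes it easy to swap genome pairs (for example C3H versus BL6) without retyping the long paths.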

Mouse methylome SNP study

An email from Dr. David Fargo:

Jianying,

In addition to the polyphen2/annovar (or other SNP annotator) analysis could you count/label those SNPs that occur at:

1) C or G
2) CpG
3) The X of XCpGX
4) Those C or G SNPs that change to T or A and would thus potentially confound methyl-status calling.
5) If others have good ideas…

I would think a color scheme for the browser might be useful given the above categories (e.g. CpG SNPs are red… XCpGX SNPs are blue… whatever looks good).

Thank you,
David
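A minimal Python sketch of how the categories in David's email could be assigned to a single SNP. Everything here is my own assumption and not an agreed method: the function name classify_snp, the label strings, the 0-based position into a forward-strand reference string, and in particular my reading of "the X of XCpGX" as the base immediately before or after a CpG.

```python
def classify_snp(seq, pos, alt):
    """Label one SNP given the reference sequence, 0-based position, and alt base.

    Hypothetical helper; the labels and the XCpGX interpretation are assumptions.
    """
    seq = seq.upper()
    alt = alt.upper()
    ref = seq[pos]
    labels = set()

    # 1) The reference base is C or G.
    if ref in "CG":
        labels.add("C_or_G")

    # 2) The SNP hits either base of a CpG dinucleotide.
    c_of_cpg = ref == "C" and seq[pos + 1:pos + 2] == "G"
    g_of_cpg = ref == "G" and pos > 0 and seq[pos - 1] == "C"
    if c_of_cpg or g_of_cpg:
        labels.add("CpG")

    # 3) The SNP is the X of XCpGX: directly before or directly after a CpG.
    before_cpg = seq[pos + 1:pos + 3] == "CG"
    after_cpg = pos >= 2 and seq[pos - 2:pos] == "CG"
    if before_cpg or after_cpg:
        labels.add("CpG_flank")

    # 4) C>T or G>A changes, which could confound methyl-status calling.
    if (ref, alt) in {("C", "T"), ("G", "A")}:
        labels.add("C>T_or_G>A")

    return labels


# Toy reference: the CpG sits at positions 2-3, flanked by T at 1 and T at 4.
toy = "ATCGTA"
for pos, alt in [(2, "T"), (1, "A"), (4, "C"), (0, "G")]:
    print(pos, alt, sorted(classify_snp(toy, pos, alt)))
```

A SNP can carry several labels at once (for example in a CGCG run a base is both inside one CpG and flanking another), so a browser color scheme would need a priority order among the categories.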

A CpG island table was provided by Ty Wang; the first few lines look like this:

#chrom	chromStart	chromEnd	length	CpG	G+C%	Obs/Exp
chr1	3521462	3522094	632	31	0.550632911	0.651329787
chr1	3659714	3663655	3941	193	0.550114184	0.650088247
chr1	4075259	4076022	763	38	0.557011796	0.650031387
chr1	4481713	4485373	3660	191	0.55	0.690346447
chr1	4486431	4487760	1329	67	0.556809631	0.650542466
chr1	4546994	4547544	550	36	0.550909091	0.86379897

However, the lengths in the table are one base short relative to the coordinates. Here is the email:

Jianying,

When I predicted the CpG islands from BL6 and C3H a few months ago, I used the genome assembly from the directory /ddn/gs1/project/mousemeth/DNAseq/jthomas/NISC_C3H_BL6_assemblies_Jan_2012/working/fasta_files/.

You are right that the length of CpG island in the table is one base short than its actual size. I am aware of that too. It’s the direct output from the software CgiHunter, and it’s consistent with the UCSC CpG Islands track which is also one base short. So I didn’t adjust the length to avoid conflict with UCSC. For your analysis, I suggest to use the coordinates instead of the length. Thanks!

Ty

From: Li, Jianying (NIH/NIEHS) [C]
Sent: Friday, November 30, 2012 11:42 AM
To: Wang, Tianyuan (NIH/NIEHS) [C]
Cc: Fargo, David (NIH/NIEHS) [E]; Bushel, Pierre (NIH/NIEHS) [E]
Subject: RE: CpG Island and RepeatMasker

Ty,

I have a couple of quick clarification questions.

Firstly, When you predicted the CpG on BL6 and C3H, you used the newly assembled genome from NISC, but not the “mm9-based BL6 and C3H”, correct?

Secondly, the length of CpG island is one base short than the region. i.e. in C3H,
chr1 3490650 3491282 632

The length should be 633 (we have to count both ends ), am I right? It will matter when I search for flanking region (in David’s later email xCpGx). In this case, the coordinates for two Xs are 3490649 and 3491283. Please advise whether my understanding is right.

Thank you for your attention!

Jianying
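The arithmetic in my question can be written out as a short Python check. It assumes, as in my email, that both chromStart and chromEnd are inclusive positions; if the table instead used a half-open convention (start excluded from the count), the reported length would already equal end minus start and the flanking positions would shift accordingly. The function names are my own.

```python
def inclusive_length(start, end):
    """Number of bases when both start and end are counted."""
    return end - start + 1


def flanking_positions(start, end):
    """Positions of the two X bases immediately outside the region."""
    return start - 1, end + 1


# The C3H example from the email: chr1 3490650 3491282, reported length 632.
start, end, reported = 3490650, 3491282, 632

print(end - start == reported)         # True: the table stores end - start
print(inclusive_length(start, end))    # 633
print(flanking_positions(start, end))  # (3490649, 3491283)
```

This matches Ty's advice: work from the coordinates and derive the length, not the other way round.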

Configure MacBook Pro

MacBook Pro basic

Here is the list of software installed on a MacBook Pro

Working with python

The Mac comes with Python 2.7; the python3 command invokes Python 3.7
Download and install PyCharm (education version) 

It is best practice to have a Python virtual environment set up properly

I need to install virtualenv with the command

pip3 install --user virtualenv

Then “python3 -m virtualenv venv” will work
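For comparison, an environment can also be created from inside Python. Note that this sketch swaps in the standard library's venv module, not the virtualenv package installed above, and it builds the environment in a temporary directory purely for demonstration. The helper name make_env is my own.

```python
import tempfile
import venv
from pathlib import Path


def make_env(parent_dir, name="venv"):
    """Create a virtual environment under parent_dir using the stdlib venv module."""
    env_dir = Path(parent_dir) / name
    # with_pip=False keeps the demo fast; use with_pip=True for a usable environment.
    venv.create(env_dir, with_pip=False)
    return env_dir


# Demonstration in a throwaway directory that is deleted afterwards.
with tempfile.TemporaryDirectory() as tmp:
    env_dir = make_env(tmp)
    created = (env_dir / "pyvenv.cfg").is_file()

print("environment created:", created)
```

For real work I would point make_env at the project directory and pass with_pip=True, which is roughly what “python3 -m venv venv” does on the command line.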

Need Anaconda

How to get Anaconda for Mac
Install with the graphical installer
Install more software (although I had already installed PyCharm)
I am installing R
Spyder is already installed, but Spyder uses Kite for code completion
One benefit of getting Anaconda is that my Python gets bumped up to Python 3

MacBookPro shortcut key combinations:

Here is a useful video covering a few important shortcut key combos.
The most comprehensive list is the one from Apple.

Get latest R installed from source

Install Ensembl VM with Oracle VirtualBox

Still struggling with the “shared folder” problem. The link was not very clear.

Now, let’s take a look at the MAMP installation on OS X 10.7.5; this is a good link.

To work as root, one needs to run “sudo bash”; give it a try.