Mouse methylome SNP study

An email from Dr. David Fargo:

Jianying,

In addition to the polyphen2/annovar (or other SNP annotator) analysis, could you count/label those SNPs that occur at:

1) C or G
2) CpG
3) The X of XCpGX
4) Those C or G SNPs that change to T or A and would thus potentially confound methyl-status calling.
5) If others have good ideas…

I would think a color scheme for the browser might be useful given the above categories (e.g. CpG SNPs are red… XCpGX SNPs are blue… whatever looks good).

Thank you,
David
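As a rough starting point for the categories in David's list, the sketch below labels a single SNP given a stretch of plus-strand reference sequence around it. It is a minimal sketch under several assumptions: the reference is available as a string, positions are 0-based Python indices, and item 4 is read as the two changes that mimic bisulfite conversion (C→T on the forward strand, G→A on the reverse strand). The names classify_snp and CATEGORY_COLORS, the category priority (CpG, then XCpGX, then C/G) and the colors themselves are hypothetical choices for illustration; the colors are written as "R,G,B" strings, the form used in the itemRgb column of a BED9 track.

```python
from collections import Counter

# Hypothetical browser colors as "R,G,B" strings (the itemRgb form in BED9).
CATEGORY_COLORS = {
    "CpG": "255,0,0",        # red
    "XCpGX": "0,0,255",      # blue
    "C/G": "0,128,0",        # green
    "other": "128,128,128",  # gray
}

# Changes that look like bisulfite conversion of an unmethylated C:
# C->T on the forward strand, G->A on the reverse strand (assumed reading of item 4).
BISULFITE_LIKE = {("C", "T"), ("G", "A")}


def classify_snp(seq: str, pos: int, alt: str) -> dict:
    """Label one SNP.

    seq -- plus-strand reference sequence around the SNP
    pos -- 0-based index of the SNP within seq
    alt -- alternate allele (one base)
    """
    seq = seq.upper()
    ref, alt = seq[pos], alt.upper()
    at_cg = ref in ("C", "G")
    # The SNP is the C or the G of a CpG dinucleotide.
    in_cpg = (ref == "C" and seq[pos + 1:pos + 2] == "G") or (
        ref == "G" and pos > 0 and seq[pos - 1] == "C"
    )
    # The SNP is an X directly next to a CpG (X followed by CG, or CG followed by X).
    flanks_cpg = seq[pos + 1:pos + 3] == "CG" or (pos >= 2 and seq[pos - 2:pos] == "CG")
    if in_cpg:
        category = "CpG"
    elif flanks_cpg:
        category = "XCpGX"
    elif at_cg:
        category = "C/G"
    else:
        category = "other"
    return {
        "C/G": at_cg,
        "CpG": in_cpg,
        "XCpGX": flanks_cpg,
        "bisulfite_like": (ref, alt) in BISULFITE_LIKE,
        "category": category,
    }


if __name__ == "__main__":
    ref_seq = "AACGTTC"  # toy reference, not real data
    snps = [(2, "T"), (1, "G"), (6, "A"), (4, "C")]  # hypothetical (pos, alt) pairs
    counts = Counter()
    for pos, alt in snps:
        labels = classify_snp(ref_seq, pos, alt)
        counts[labels["category"]] += 1
        print(pos, alt, labels["category"], CATEGORY_COLORS[labels["category"]])
    print(dict(counts))
```

A base can satisfy several labels at once (the second C in CGCG is both part of a CpG and next to one), so the returned dictionary keeps every flag for counting, while "category" picks just one for coloring.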

A CpG island file was provided by Ty Wang; its first few lines look like this:

#chrom	chromStart	chromEnd	length	CpG	G+C%	Obs/Exp
chr1	3521462	3522094	632	31	0.550632911	0.651329787
chr1	3659714	3663655	3941	193	0.550114184	0.650088247
chr1	4075259	4076022	763	38	0.557011796	0.650031387
chr1	4481713	4485373	3660	191	0.55	0.690346447
chr1	4486431	4487760	1329	67	0.556809631	0.650542466
chr1	4546994	4547544	550	36	0.550909091	0.86379897
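Because the length column turns out to be one base short of the span implied by the coordinates (see the emails below), a small reader that recomputes lengths from chromStart and chromEnd may be handy. This sketch assumes the file is tab-separated with a single "#"-prefixed header line, as in the excerpt above; read_islands and inclusive_length are hypothetical helper names, and inclusive_length follows the count-both-ends reading discussed in the emails.

```python
import csv
import io

# The first three rows of the table above, tab-separated as in the original file.
SAMPLE = (
    "#chrom\tchromStart\tchromEnd\tlength\tCpG\tG+C%\tObs/Exp\n"
    "chr1\t3521462\t3522094\t632\t31\t0.550632911\t0.651329787\n"
    "chr1\t3659714\t3663655\t3941\t193\t0.550114184\t0.650088247\n"
    "chr1\t4075259\t4076022\t763\t38\t0.557011796\t0.650031387\n"
)


def read_islands(handle):
    """Yield one dict per island; the header line starts with '#'."""
    lines = (line for line in handle if line.strip())
    header = next(lines).lstrip("#").rstrip("\r\n").split("\t")
    for row in csv.reader(lines, delimiter="\t"):
        rec = dict(zip(header, row))
        for key in ("chromStart", "chromEnd", "length"):
            rec[key] = int(rec[key])
        yield rec


def inclusive_length(rec):
    """Island length counting both chromStart and chromEnd as island bases."""
    return rec["chromEnd"] - rec["chromStart"] + 1


if __name__ == "__main__":
    for rec in read_islands(io.StringIO(SAMPLE)):
        # Stored length vs. coordinate difference vs. both-ends-inclusive length
        print(rec["chrom"], rec["length"],
              rec["chromEnd"] - rec["chromStart"], inclusive_length(rec))
```

For the rows shown, the stored length always equals chromEnd - chromStart, so the inclusive length is one larger (633 instead of 632 for the first island).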

However, the length column was one base off relative to the coordinates. Here is the email:

Jianying,

When I predicted the CpG islands from BL6 and C3H a few months ago, I used the genome assembly from the directory /ddn/gs1/project/mousemeth/DNAseq/jthomas/NISC_C3H_BL6_assemblies_Jan_2012/working/fasta_files/.

You are right that the length of the CpG island in the table is one base shorter than its actual size. I am aware of that too. It’s the direct output from the software CgiHunter, and it’s consistent with the UCSC CpG Islands track, which is also one base short. So I didn’t adjust the length, to avoid conflict with UCSC. For your analysis, I suggest using the coordinates instead of the length. Thanks!

Ty

From: Li, Jianying (NIH/NIEHS) [C]
Sent: Friday, November 30, 2012 11:42 AM
To: Wang, Tianyuan (NIH/NIEHS) [C]
Cc: Fargo, David (NIH/NIEHS) [E]; Bushel, Pierre (NIH/NIEHS) [E]
Subject: RE: CpG Island and RepeatMasker

Ty,

I have a couple of quick clarification questions.

First, when you predicted the CpG islands on BL6 and C3H, you used the newly assembled genome from NISC, not the “mm9-based BL6 and C3H”, correct?

Second, the length of each CpG island is one base shorter than the region, e.g., in C3H:
chr1 3490650 3491282 632

The length should be 633 (we have to count both ends), am I right? It will matter when I search for the flanking region (the xCpGx in David’s later email). In this case, the coordinates of the two Xs are 3490649 and 3491283. Please advise whether my understanding is right.

Thank you for your attention!

Jianying
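The arithmetic in Jianying's question can be written down directly. This is a minimal sketch assuming chromStart and chromEnd are both bases inside the island (a closed interval), which is the reading proposed in the email; the function names and the width parameter, which generalizes the single X to a flanking window of several bases, are hypothetical.

```python
def island_length(start: int, end: int) -> int:
    """Length when start and end are both island bases (count both ends)."""
    return end - start + 1


def flanking_regions(start: int, end: int, width: int = 1):
    """Closed intervals covering `width` bases on each side of the island.

    width=1 gives the two single X positions just outside the island.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    left = (start - width, start - 1)
    right = (end + 1, end + width)
    return left, right


if __name__ == "__main__":
    # The C3H island quoted in the email: chr1 3490650 3491282
    print(island_length(3490650, 3491282))     # 633
    print(flanking_regions(3490650, 3491282))  # ((3490649, 3490649), (3491283, 3491283))
```

With width=1 each window collapses to a single base, reproducing the positions 3490649 and 3491283 from the email.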
