An email from Dr. David Fargo:
Jianying,
In addition to the polyphen2/annovar (or other SNP annotator) analysis could you count/label those SNPs that occur at:
1) C or G
2) CpG
3) The X of XCpGX
4) Those C or G SNPs that change to T or A and would thus potentially confound methyl-status calling.
5) If others have good ideas…I would think a color scheme for the browser might be useful given the above categories (e.g. CpG SNPs are red… XCpGX SNPs are blue… whatever looks good).
Thank you,
David
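Categories 1 to 4 in David's list can be sketched as a small labeling function. This is only an illustration, not the pipeline actually used: the function name `classify_snp`, the label strings, and the 0-based indexing into a plain reference string are all assumptions. Category 4 is read literally here (reference C or G, alternate T or A); a stricter reading that keeps only C>T and G>A may be what was intended.

```python
def classify_snp(seq, pos, alt):
    """Label a SNP at 0-based index `pos` of reference string `seq`.

    Hypothetical helper: label names and conventions are illustrative only.
    """
    seq = seq.upper()
    alt = alt.upper()
    ref = seq[pos]
    labels = set()

    # 1) SNP falls on a C or a G
    if ref in "CG":
        labels.add("C_or_G")

    # 2) SNP falls inside a CpG dinucleotide (C followed by G, or G preceded by C)
    c_of_cpg = ref == "C" and seq[pos + 1:pos + 2] == "G"
    g_of_cpg = ref == "G" and pos >= 1 and seq[pos - 1] == "C"
    if c_of_cpg or g_of_cpg:
        labels.add("CpG")

    # 3) SNP is an X of XCpGX: the base right before or right after a CG
    before_cpg = seq[pos + 1:pos + 3] == "CG"
    after_cpg = pos >= 2 and seq[pos - 2:pos] == "CG"
    if before_cpg or after_cpg:
        labels.add("XCpGX")

    # 4) C or G changing to T or A (literal reading of the request)
    if ref in "CG" and alt in "TA":
        labels.add("C_or_G_to_T_or_A")

    return labels


reference = "ACGTACGA"
print(classify_snp(reference, 1, "T"))  # the C of the first CpG
print(classify_snp(reference, 0, "G"))  # the base just before that CpG
```

Returning a set lets one SNP carry several labels at once, which a browser color scheme could then map to colors in whatever priority order looks good.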
A CpG island table was provided by Ty Wang; its first few lines look like this:
#chrom  chromStart  chromEnd  length  CpG  G+C%         Obs/Exp
chr1    3521462     3522094   632     31   0.550632911  0.651329787
chr1    3659714     3663655   3941    193  0.550114184  0.650088247
chr1    4075259     4076022   763     38   0.557011796  0.650031387
chr1    4481713     4485373   3660    191  0.55         0.690346447
chr1    4486431     4487760   1329    67   0.556809631  0.650542466
chr1    4546994     4547544   550     36   0.550909091  0.86379897
However, the length column is one base shorter than the coordinates imply. Here is the email:
Jianying,
When I predicted the CpG islands from BL6 and C3H a few months ago, I used the genome assembly from the directory /ddn/gs1/project/mousemeth/DNAseq/jthomas/NISC_C3H_BL6_assemblies_Jan_2012/working/fasta_files/.
You are right that the length of each CpG island in the table is one base shorter than its actual size. I am aware of that too. It’s the direct output from the software CgiHunter, and it’s consistent with the UCSC CpG Islands track, which is also one base short. So I didn’t adjust the length, to avoid conflict with UCSC. For your analysis, I suggest using the coordinates instead of the length. Thanks!
Ty
From: Li, Jianying (NIH/NIEHS) [C]
Sent: Friday, November 30, 2012 11:42 AM
To: Wang, Tianyuan (NIH/NIEHS) [C]
Cc: Fargo, David (NIH/NIEHS) [E]; Bushel, Pierre (NIH/NIEHS) [E]
Subject: RE: CpG Island and RepeatMasker
Ty,
I have a couple of quick clarification questions.
First, when you predicted the CpG islands on BL6 and C3H, you used the newly assembled genome from NISC, not the “mm9-based BL6 and C3H”, correct?
Second, the length of each CpG island is one base shorter than the region, e.g. in C3H:
chr1 3490650 3491282 632
The length should be 633 (we have to count both ends), am I right? It will matter when I search for the flanking regions (the XCpGX in David’s later email). In this case, the coordinates for the two Xs are 3490649 and 3491283. Please advise whether my understanding is right.
Thank you for your attention!
Jianying
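The arithmetic in this exchange can be checked with a short sketch. It assumes, as the emails do, that both chromStart and chromEnd are counted as part of the island; the function names `inclusive_length` and `flank_positions` are hypothetical and introduced only for illustration.

```python
def inclusive_length(start, end):
    """Number of bases from start to end when both ends are counted."""
    return end - start + 1


def flank_positions(start, end):
    """Coordinates of the single bases immediately outside the region."""
    return start - 1, end + 1


# The C3H example from the email: chr1 3490650 3491282 632
start, end, reported = 3490650, 3491282, 632
print(end - start == reported)       # the table's length is end - start
print(inclusive_length(start, end))  # 633, counting both ends
print(flank_positions(start, end))   # (3490649, 3491283)

# The same relation holds for the rows of the CpG island table shown earlier.
rows = [
    (3521462, 3522094, 632),
    (3659714, 3663655, 3941),
    (4075259, 4076022, 763),
]
for s, e, length in rows:
    assert e - s == length
    assert inclusive_length(s, e) == length + 1
```

Working from the coordinates rather than the length column, as Ty suggests, keeps the flank calculation independent of the off-by-one in the table.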