Tackle the problem with methylKit

It seems like a promising R package for this.

Installation needs a little extra care. Here are the steps in detail:


#Dependencies
install.packages("data.table")
source("http://www.bioconductor.org/biocLite.R")
biocLite("GenomicRanges")

#Installation
download.file("http://methylkit.googlecode.com/files/methylKit_0.5.7.tar.gz",destfile="methylKit_0.5.7.tar.gz")
install.packages("methylKit_0.5.7.tar.gz",repos=NULL,type="source")
unlink("methylKit_0.5.7.tar.gz")

Well, three hurdles were blocking me:

1. I couldn't get the CpG island annotation for mouse from UCSC when I followed the instructions:

For CpG island annotation, select "Expression and Regulation" from the "group" drop-down menu. Following that, select "CpG islands" from the "track" drop-down menu. Select "BED - browser extensible data" for the "output format". Click "get output" and on the following page click "get BED" without changing any options. Save the output as a text file.
2. Our .sam file was not properly sorted. One extra step is needed:

grep -v '^[[:space:]]*@' raw.sam | sort -k3,3 -k4,4n > sorted.sam

3. It turns out that this "sorting" takes up so much of the /tmp/ space that none of our servers is set up to handle it, except wine. Thanks go to Frank, who gave me temporary access to wine2. Both servers work with this just fine.
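One way to keep the sort from filling /tmp is to point sort's temporary files at a roomier disk with the -T option (available in GNU and BSD sort). The sketch below does that, and it also keeps the header lines at the top of the output instead of dropping them, which is an addition to the command above. The scratch directory and file names are placeholders, and the tiny raw.sam is made up only so the snippet runs on its own.

```shell
#!/bin/sh
# Sketch: sort SAM records by reference name and position while keeping
# sort's temporary files out of /tmp. All paths here are placeholders.
set -eu

workdir=$(mktemp -d)          # stand-in for a directory on a large disk
scratch="$workdir/sort_tmp"   # hypothetical scratch location for sort
mkdir -p "$scratch"

# Tiny made-up SAM file (tab-separated) so the example is self-contained.
printf '@HD\tVN:1.0\n@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:chr2\tLN:1000\n' > "$workdir/raw.sam"
printf 'r1\t0\tchr2\t50\nr2\t0\tchr1\t200\nr3\t0\tchr1\t30\n' >> "$workdir/raw.sam"

{
  # Header lines (starting with @) go first, untouched.
  grep '^@' "$workdir/raw.sam"
  # Alignment records: sort by column 3 (reference name), then by
  # column 4 (position, numeric). -T sets where sort writes temp files.
  grep -v '^[[:space:]]*@' "$workdir/raw.sam" \
    | LC_ALL=C sort -t "$(printf '\t')" -T "$scratch" -k3,3 -k4,4n
} > "$workdir/sorted.sam"

cat "$workdir/sorted.sam"
```

On a real file, the scratch directory should sit on a disk with free space of at least the size of the input, since sort spills intermediate runs there.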

It turns out that we have three separate libraries, and we have both raw and de-duplicated .sam files. So first I need to combine those three files before anything else can be done. There are a few possible solutions.

Basically, I need to generate deduplicated, merged-library, Picard-sorted and reordered SAM files for each animal.

Let’s try samtools

  • Convert .sam to .bam: samtools view -bS -o out.bam in.sam
  • Sort .bam file: samtools sort out.bam out.sorted (legacy samtools 0.1.x takes an output prefix and writes out.sorted.bam)
  • Merge multiple sorted .bam files: samtools merge merged.bam 1.out.sorted.bam 2.out.sorted.bam 3.out.sorted.bam
  • Sort merged .bam file: samtools sort merged.bam merged.sorted

Now, let's take a look at Picard tools

  • Convert .sam to .bam
  • Merge multiple .bam files
  • Sort them

What about plain Unix commands? Here is a quite simple Unix solution:

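  • Remove headers from the second and third files; the first file keeps its header (tail -n +N starts printing at line N, so these skip the first 41 and 50 lines)
    tail -n +42 B6_M_1.L2x4.mm9.raw_bismark_deduplicated.sam > temp_B6_M_1.L2x4.mm9.raw_bismark_deduplicated.sam
    tail -n +51 B6_M_1.L3x13.mm9.raw_bismark_deduplicated.sam > temp_B6_M_1.L3x13.mm9.raw_bismark_deduplicated.sam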

  • Concatenate them
    cat B6_M_1.L1x2.mm9.raw_bismark_deduplicated.sam temp_B6_M_1.L2x4.mm9.raw_bismark_deduplicated.sam temp_B6_M_1.L3x13.mm9.raw_bismark_deduplicated.sam > B6_M_1.merged.sam
  • Sort and convert them with Picard tools (first to a coordinate-sorted .sam, then to an indexed .bam)
    java -jar picard-tools-1.42/SortSam.jar INPUT=hello.sam OUTPUT=hello.sorted.sam CREATE_INDEX=false SO=coordinate COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=7500000 TMP_DIR=.
    java -jar picard-tools-1.42/SortSam.jar INPUT=hello.sorted.sam OUTPUT=hello.sorted.bam CREATE_INDEX=true SO=coordinate COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=7500000 TMP_DIR=.
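
The hard-coded offsets in the tail commands (42 and 51) depend on the exact header length of each file. A slightly more robust variant of the same idea, sketched below, drops every line starting with @ from the second and later files instead of counting lines. It assumes, as the steps above do, that the first file's header is valid for all three libraries (same reference, same @SQ lines). The merge_sam helper and the lib*.sam files are made up for the example.

```shell
#!/bin/sh
# Sketch: concatenate several SAM files, keeping only the first file's header.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Three tiny made-up SAM files with headers of different lengths.
printf '@HD\tVN:1.0\n@SQ\tSN:chr1\tLN:1000\nr1\t0\tchr1\t10\n' > lib1.sam
printf '@HD\tVN:1.0\n@SQ\tSN:chr1\tLN:1000\n@PG\tID:x\nr2\t0\tchr1\t20\n' > lib2.sam
printf '@SQ\tSN:chr1\tLN:1000\nr3\t0\tchr1\t5\n' > lib3.sam

# Hypothetical helper: merge_sam first.sam [more.sam ...]  (writes to stdout)
merge_sam() {
  cat "$1"                 # first file: header and records as they are
  shift
  for f in "$@"; do
    # Remaining files: records only. "|| true" keeps set -e from aborting
    # when a file contains no records (grep exits with status 1 then).
    grep -v '^@' "$f" || true
  done
}

merge_sam lib1.sam lib2.sam lib3.sam > merged.sam
cat merged.sam
```

The merged file is not sorted, so it would still need the Picard SortSam step afterwards.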