Learning the SRA Toolkit

The SRA Toolkit turns out to be less than straightforward, so it deserves some special attention.

Adam has a lot of experience with the SRA Toolkit and often has to try several versions before one works. He therefore keeps a collection of fastq-dump versions and kindly shares it with the group:

/ddn/gs1/home/burkholderab/bin/fastq-dump.2.0.3
/ddn/gs1/home/burkholderab/bin/fastq-dump.2.1.0
/ddn/gs1/home/burkholderab/bin/fastq-dump.2.1.12
/ddn/gs1/home/burkholderab/bin/fastq-dump.2.1.2
/ddn/gs1/home/burkholderab/bin/fastq-dump.2.1.6
/ddn/gs1/home/burkholderab/bin/fastq-dump.2.3.1
/ddn/gs1/home/burkholderab/bin/fastq-dump.2.3.2

Details about fastq-dump options

In the end, different versions of fastq-dump were needed for different runs, e.g.

nohup /ddn/gs1/home/burkholderab/bin/fastq-dump.2.1.12   SRR1343625.sra &
nohup /ddn/gs1/home/burkholderab/bin/fastq-dump.2.1.12   SRR1343624.sra &

nohup /ddn/gs1/home/burkholderab/bin/fastq-dump.2.3.1  SRR036616.sra &
nohup /ddn/gs1/home/burkholderab/bin/fastq-dump.2.3.1  SRR036617.sra &
nohup /ddn/gs1/home/burkholderab/bin/fastq-dump.2.3.1  SRR036622.sra &
nohup /ddn/gs1/home/burkholderab/bin/fastq-dump.2.3.1  SRR036623.sra &

nohup /ddn/gs1/home/burkholderab/bin/fastq-dump.2.3.1  SRR1343624.sra &
nohup /ddn/gs1/home/burkholderab/bin/fastq-dump.2.3.1  SRR1343625.sra &

nohup /ddn/gs1/home/burkholderab/bin/fastq-dump.2.3.1  --split-files  SRR1448790.sra &
nohup /ddn/gs1/home/burkholderab/bin/fastq-dump.2.3.1  --split-files  SRR1448791.sra &

Next, I need to figure out how to dump ERR## files

ERR375885.sra
/ddn/gs1/home/burkholderab/bin/fastq-dump.2.0.3  -X 5 -Z  ERR375885.sra
/ddn/gs1/home/burkholderab/bin/fastq-dump.2.1.0  -X 5 -Z  ERR375885.sra
/ddn/gs1/home/burkholderab/bin/fastq-dump.2.1.12  -X 5 -Z  ERR375885.sra
/ddn/gs1/home/burkholderab/bin/fastq-dump.2.1.2  -X 5 -Z  ERR375885.sra
/ddn/gs1/home/burkholderab/bin/fastq-dump.2.1.6  -X 5 -Z  ERR375885.sra
/ddn/gs1/home/burkholderab/bin/fastq-dump.2.3.1  -X 5 -Z  ERR375885.sra
/ddn/gs1/home/burkholderab/bin/fastq-dump.2.3.2  -X 5 -Z  ERR375885.sra
fastq-dump.2.4.2  -X 5 -Z  ERR375885.sra
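
The version-by-version trial above can also be scripted. The following is a minimal Python sketch, not a tested recipe: it assumes the versioned binaries sit in the shared directory listed earlier, and the helper names (build_command, first_working_version) are made up for illustration. It relies only on the same -X (limit on spots dumped) and -Z (write to stdout) options used in the commands above.

```python
import subprocess

# Assumption: the versioned binaries live in the shared directory listed above.
BIN_DIR = "/ddn/gs1/home/burkholderab/bin"
VERSIONS = ["2.0.3", "2.1.0", "2.1.2", "2.1.6", "2.1.12", "2.3.1", "2.3.2"]


def build_command(version, sra_file, max_spots=5):
    """Return the argv list for a quick test dump of the first few spots to stdout."""
    binary = f"{BIN_DIR}/fastq-dump.{version}"
    return [binary, "-X", str(max_spots), "-Z", sra_file]


def first_working_version(sra_file, versions=VERSIONS):
    """Return the first version that exits with status 0 and prints output, else None."""
    for version in versions:
        cmd = build_command(version, sra_file)
        try:
            result = subprocess.run(cmd, capture_output=True, text=True)
        except OSError:
            # Binary missing or not executable on this machine: skip it.
            continue
        if result.returncode == 0 and result.stdout.strip():
            return version
    return None
```

Calling first_working_version("ERR375885.sra") returns the version string of the first binary that succeeds, or None if none of them does. Checking for non-empty output as well as the exit status is a cautious choice, since an exit status of 0 alone does not show that any reads were dumped.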

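First, let’s understand what “ERR” stands for.

Found it in the SRA documentation: the ERR prefix marks a run accession issued by EBI (ENA), just as SRR marks one issued by NCBI.
It turns out to be easier to download the files directly from EBI.
Then I need to validate the sequences at NCBI.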

Comparing Perl and R for implementing FDR control

Here, I am trying to implement the Benjamini-Hochberg procedure.

Wikipedia provides a write-up.

The original paper was published in 1995.

Perl has a module on CPAN, Multtest, that implements what the R function p.adjust does.

The implementation in R is as follows:

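        # p is the vector of raw p-values (no missing values assumed)
        lp <- length(p); n <- lp
        i <- lp:1L                          # ranks, largest p-value first
        o <- order(p, decreasing = TRUE)    # sort p from largest to smallest
        ro <- order(o)                      # permutation that restores the input order
        pmin(1, cummin(n/i * p[o]))[ro]     # step-up adjustment, capped at 1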

Given the example p-values from the original paper, R produces:

pval <- c(0.0001, 0.0004, 0.0019, 0.0095, 0.0201, 0.0278, 0.0298, 0.0344, 0.0459, 0.3240, 0.4262, 0.5719, 0.6528, 0.7590, 1)
p.adjust(pval, "BH")
 [1] 0.00150000 0.00300000 0.00950000 0.03562500 0.06030000 0.06385714 0.06385714 0.06450000
 [9] 0.07650000 0.48600000 0.58118182 0.71487500 0.75323077 0.81321429 1.00000000

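Therefore, with the FDR controlled at 0.05, the four smallest p-values are declared significant: the adjusted p(4) = 0.0356 < 0.05, while the fifth and all later adjusted values exceed 0.05.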
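
To make the step-up logic explicit, here is a small Python sketch of the same adjustment. It is an illustration rather than a port of p.adjust or of the Perl module: the function name bh_adjust is made up, missing values are not handled, and the tenth example p-value is taken as 0.3240, as listed in the 1995 paper.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    # Indices that sort the p-values from largest to smallest.
    order = sorted(range(n), key=lambda k: pvalues[k], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0  # starting at 1 also caps every adjusted value at 1
    for position, idx in enumerate(order):
        rank = n - position  # rank of this p-value in ascending order
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Example p-values from Benjamini and Hochberg (1995); tenth value taken as 0.3240.
pval = [0.0001, 0.0004, 0.0019, 0.0095, 0.0201, 0.0278, 0.0298, 0.0344,
        0.0459, 0.3240, 0.4262, 0.5719, 0.6528, 0.7590, 1.0]
adj = bh_adjust(pval)
n_rejected = sum(a <= 0.05 for a in adj)
print(n_rejected)  # 4
```

Walking from the largest p-value down while carrying a running minimum plays the role of cummin in the R code: each p-value is scaled by n divided by its rank, and the running minimum keeps the adjusted values monotone.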

Dispersion estimation

When dealing with count data, one consideration is dispersion. Here is a good post on regression models for count data in R

Here is a link on dispersion with a few kernels

A comment based on McCullagh and Nelder, together with an online course from Penn State, explains the theory. There is also a dedicated R session here

Estimating the dispersion is as simple as fitting a GLM with an appropriately chosen link function
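
As a rough illustration of what such an estimate looks like, the Python sketch below covers only the simplest case: an intercept-only Poisson model, where the fitted mean is just the sample mean, so no fitting routine is needed. The dispersion is then the Pearson chi-square statistic divided by the residual degrees of freedom, which is the same quantity a quasi-Poisson fit reports. The function name and the counts are invented for the example, and the sketch assumes at least two counts with a positive mean.

```python
def pearson_dispersion(counts):
    """Pearson dispersion estimate for an intercept-only Poisson model."""
    n = len(counts)
    mu = sum(counts) / n  # fitted mean of the intercept-only model
    pearson_chi2 = sum((y - mu) ** 2 / mu for y in counts)
    return pearson_chi2 / (n - 1)  # residual df: n observations minus one parameter


# Hypothetical counts, invented purely for illustration.
counts = [0, 1, 2, 3, 10, 0, 5, 3]
dispersion = pearson_dispersion(counts)
print(round(dispersion, 3))  # 3.619
```

A value close to 1 is what a Poisson model expects; a value well above 1, as here, points to overdispersion. With covariates in the model, the fitted means would come from the GLM fit instead of the sample mean.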

Support for Steven Kleeberger

Here is a list of areas where I can help out:

1. Microarray database handling (CEBS)
2. MARCO project: the transcription factor binding site analysis led by Jacqui is a well-studied and well-understood project. There are some challenges that need to be sorted out.
3. Steve is interested in Circos, an application widely regarded as a work of art
4. Diesel exposure study
5. A project involving mitochondrial sequencing, currently handled by Zach and being passed on to Kirsten