Monthly Archives: September 2014
Samseq
This is a competition to our EPIGseq, but it is very intuitive that researchers have shifted their focus from microarray to RNAseq. I am learning how it works and read the research paper by Li & Tibshirani
There is a rough but very interesting webportal in Japan, which lists all the R packages/functions at CRAN, R release, and Bioconductor
Comparing perl and R for implementing the FDR
Here, I am trying to implement Benjamini
Wikipedia provides the write up
The Original paper published in 1995
Perl has a module: Multtest on cpan, it implemented what implemented in an R module: p.adjust
Implementation in R is as following:
i <- lp:1L o <- order(p, decreasing = TRUE) ro <- order(o) pmin(1, cummin(n/i * p[o]))[ro]
Given a sequence of example p-values in the original paper, R produces:
pval <- c(0.0001 ,0.0004, 0.0019, 0.0095, 0.0201, 0.0278, 0.0298, 0.0344, 0.0459, .03240, 0.4262, 0.5719, 0.6528, 0.7590, 1) p.adjust (pval, "BH") [1] 0.00150000 0.00300000 0.00950000 0.03562500 0.05733333 0.05733333 0.05733333 0.05733333 [9] 0.06885000 0.05733333 0.58118182 0.71487500 0.75323077 0.81321429 1.00000000
Therefore, FDR is control at 0.05. p(4) = 0.0356 < 0.05
Dispersion estimation
When dealing with count data, one consideration is the dispersion. Here is a good post on modeling count data regression in R
Here is a link on dispersion with a few kernels
A comment based on McCullagh and Nelder, with a online course from pennstate explains on the theoretical level. With a special dedication to an R session here
To estimate the dispersion, it is as simple as fit a GLM model with link function chosen accordingly