Learning ggplot

The ggplot manual is available.

Goal 1: ggplot without grid lines (someone has a post on this)
Goal 2: Remove the legend
Goal 3: Add x/y labels
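
Goals 1 to 3 can be sketched together as follows. This is only a minimal example, assuming the ggplot2 package is installed; the built-in mtcars data set and the label text are stand-ins I picked for illustration, not project data.

```r
library(ggplot2)

# mtcars is a built-in data set, used here only as a stand-in
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  theme_bw() +
  theme(panel.grid = element_blank(),   # Goal 1: no grid lines
        legend.position = "none") +     # Goal 2: no legend
  labs(x = "Weight (1000 lbs)",         # Goal 3: axis labels
       y = "Miles per gallon")
```

Printing p (or calling print(p) inside a script) draws the figure.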

Goal 4: Plot SNP panels with multiple samples

I did some research and found a post that is very helpful and can be used as a starting point.

Goal 5: Save a vector figure as .emf

p <- ggplot(someObject)
ggsave("someFigure.emf", p)

Working with EMMA

It has been a while, but I need to make sure that EMMA works for Dr. Kleeberger’s project

Understand the population genetics

A hands-on tutorial is here.

Understand the Mixed model

plink provides many good tools.

Implementation with R — I encountered some hurdles with R programming and would like to document them here before I forget.

If fctr.cols holds the names of your factor columns, convert them to
character with

X[fctr.cols] <- lapply(X[fctr.cols], as.character)
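
Here is a small self-contained sketch of that conversion. The data frame and its column names are made up for illustration, and stringsAsFactors = TRUE is set explicitly because recent R versions no longer create factors by default.

```r
# Toy data frame; the column names are hypothetical
X <- data.frame(id = 1:3,
                strain = c("A", "B", "A"),
                sex = c("F", "M", "F"),
                stringsAsFactors = TRUE)

# Detect the factor columns instead of typing their names by hand
fctr.cols <- names(X)[sapply(X, is.factor)]

# lapply returns one list element per column, so this also works
# when there is only a single factor column
X[fctr.cols] <- lapply(X[fctr.cols], as.character)

str(X)
```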

 

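distances <- matrix(1:25, nrow=5, ncol=5)

Now, any of the following gives a 5 x 5 matrix of zeros:

apply(distances, c(1, 2), function(x) 0)   # returns a new matrix
distances[] <- 0L                          # overwrites distances in place
distances*0                                # returns a new matrix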

Viewing/reporting the results

I found a very useful and handy help doc on Manhattan Plot
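
As a reminder of what goes into such a plot, here is a rough base R sketch. The p-values are simulated purely for illustration (they are not results from any analysis), and the column names chr, pos and p are my own choice.

```r
set.seed(1)
# Simulated results for illustration only: 3 chromosomes, 200 SNPs each
gwas <- data.frame(chr = rep(1:3, each = 200),
                   pos = rep(seq(1, 2e6, length.out = 200), times = 3),
                   p   = runif(600))

# Offset each chromosome so positions run continuously along the x axis
chr.len <- tapply(gwas$pos, gwas$chr, max)
offset  <- c(0, cumsum(chr.len)[-length(chr.len)])
gwas$x  <- gwas$pos + offset[gwas$chr]

# Alternate colours by chromosome and plot -log10(p)
plot(gwas$x, -log10(gwas$p),
     col = c("grey30", "steelblue")[gwas$chr %% 2 + 1],
     pch = 20, xlab = "Genomic position", ylab = "-log10(p)")

# Bonferroni-corrected significance threshold at alpha = 0.05
abline(h = -log10(0.05 / nrow(gwas)), lty = 2)
```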

GWAS using EMMA

A good note on using EMMA

Or, I can get EMMAX from Ed Buckler’s GAPIT bundle.
FastMap has moved to a new location.
Where is snpster??

Facts about EMMA

  • Kang, H. extended EMMA to EMMAX.
  • Zhou presented the EMMA approach in GEMMA. It seems to be very similar to EMMA, without citing EMMA’s work.
  • GenABEL seems to be quite popular in the community.

FastMap note

I need to take a note on this

Help Heather with PCA

Hi Jianying:

I hope you are doing well and having a nice week so far.  I hate to
 bother you with this question, but depending on whether I will have 
the data to do so, I may be running a principal component analysis 
on some past data from the lab.  I need to speak with Dr. Kleeberger
tomorrow morning so that I can clarify some things, but if possible,
would you possibly be available to meet sometime tomorrow afternoon or 
anytime on Friday so that I can discuss the principal component 
analysis?  I just want to be sure I do it correctly and I would feel 
better knowing I had your expertise.  If you do not have the time, 
though, please do not worry and know that I understand.
Thank you for your consideration, regardless.

Sincerely,
Heather

Well, I found a very good lecture note from PennState on Principal Component Analysis
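
For my own reference, a minimal PCA in base R looks roughly like this. The built-in iris measurements stand in for the lab data, which I do not have yet, so treat it as a sketch rather than the analysis itself.

```r
# iris is a built-in data set, used only as a stand-in for lab data
X <- iris[, 1:4]

# Center and scale each variable so that units do not dominate the result
pca <- prcomp(X, center = TRUE, scale. = TRUE)

summary(pca)          # proportion of variance explained by each component
head(pca$x[, 1:2])    # sample scores on the first two components
pca$rotation[, 1:2]   # variable loadings on the first two components
```

Scaling matters when the variables are measured in different units; without it, the variable with the largest variance tends to drive the first component.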

Getting all the nuts and bolts ready

I am trying to create a workable environment for data analysis. The following are on my list to be installed on my Windows machine:

Anaconda (maybe conda also)

I installed Anaconda (v1.4.0) at the user level.

Jupyter

I followed the link and installed Jupyter.
I can launch "jupyter notebook" and see the web page on localhost:8888,
but I do not see a coding environment available.


Zipline

Getting zipline installed takes a little work, so I tested it in a separate conda environment. zipline only works with Python 2.7!

C:\Users\li11>conda create -n ForZipline python=2.7 biopython
C:\Users\li11>activate ForZipline
(ForZipline) C:\Users\li11>conda install -c Quantopian zipline


Using jupyter

Thanks go to our system admin and Anaconda. Both Python 2 and Python 3 are now installed on my Windows machines, and I can launch Jupyter directly. From now on, I will stick with Jupyter as my IDE (integrated development environment).

R basics

In this post, I will document the basics of R and R programming.

Scenario 1, useful R introduction manuals and websites

	
  • CRAN
  • Thomas Girke -- UCR

    Scenario 2, differentiating matrix and data frame

    A matrix is a two-dimensional data structure. All the elements of a matrix must be of the same type (numeric, logical, character, complex).
    A data frame combines features of matrices and lists. In fact, we can think of a data frame as a rectangular list, that is, a list in which all items have the same length. The items of the list serve as the columns of the data frame, so every item within a particular column has to be of the same type. However, different columns can be of different types.
    Matrix -- Dataframe
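
The difference is easy to check at the console. The short sketch below uses made-up values; the variable names are only for illustration.

```r
m <- matrix(1:6, nrow = 2)
class(m)
typeof(m)      # one type shared by every element

# Mixing types in a matrix coerces every element to a common type
m2 <- cbind(1:2, c("a", "b"))
typeof(m2)

# A data frame keeps one type per column
df <- data.frame(n = 1:2, s = c("a", "b"), stringsAsFactors = FALSE)
sapply(df, class)

# Underneath, a data frame is a list of equal-length columns
is.list(df)

# Converting back to a matrix coerces everything to character again
as.matrix(df)
```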
    

    Scenario 3, how to combine two matrices by row names

    It seems like an easy and legitimate question, but not everyone knows the answer. I found one pretty decent solution:

    
    # Column-bind two matrices, aligning rows by row name.
    # Rows missing from one input are filled with NA.
    cbind.fill <- function(x, y) {
      xrn <- rownames(x)
      yrn <- rownames(y)
      xcn <- colnames(x)
      ycn <- colnames(y)
      if (is.null(xrn) || is.null(yrn) || is.null(xcn) || is.null(ycn))
        stop("NULL rownames or colnames")
      rn <- union(xrn, yrn)
      z <- matrix(NA, nrow = length(rn), ncol = length(xcn) + length(ycn),
                  dimnames = list(rn, c(xcn, ycn)))
      # Position of each output row in x and in y (NA where the row is absent)
      idx <- match(rn, xrn)
      idy <- match(rn, yrn)
      z[!is.na(idx), seq_along(xcn)] <- x[na.omit(idx), , drop = FALSE]
      z[!is.na(idy), length(xcn) + seq_along(ycn)] <- y[na.omit(idy), , drop = FALSE]
      z
    }
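
For comparison, base R's merge() can do the same alignment if the matrices are first converted to data frames. This is an alternative sketch, not the solution from the post I found; the two small matrices are made up for illustration.

```r
x <- matrix(1:4, nrow = 2, dimnames = list(c("a", "b"), c("x1", "x2")))
y <- matrix(5:8, nrow = 2, dimnames = list(c("b", "c"), c("y1", "y2")))

# by = "row.names" joins on the row names;
# all = TRUE keeps unmatched rows and fills the gaps with NA
m <- merge(as.data.frame(x), as.data.frame(y), by = "row.names", all = TRUE)

# merge() puts the names in a column called Row.names;
# move them back to the row names and return to a matrix
z <- as.matrix(m[, -1])
rownames(z) <- as.character(m$Row.names)
z
```

The merge() route is shorter but goes through data frames, so it is probably the slower choice for large matrices.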
    
    

    Scenario 4, I want to have a thorough note on the apply function in R

    There was a simple question on R's apply function. Although it seems quite straightforward, it causes a lot of confusion. Therefore, I decided to write a thorough document on it.
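
Until that document is written, this short sketch covers the cases that trip people up most often: the MARGIN argument of apply, and the difference between lapply, sapply and vapply. The inputs are toy values.

```r
m <- matrix(1:6, nrow = 2)   # 2 rows, 3 columns

apply(m, 1, sum)                        # MARGIN = 1: one result per row
apply(m, 2, sum)                        # MARGIN = 2: one result per column
apply(m, c(1, 2), function(v) v * 10)   # MARGIN = c(1, 2): element by element

lst <- list(a = 1:3, b = 4:6)
lapply(lst, mean)               # always returns a list
sapply(lst, mean)               # simplifies to a named vector when it can
vapply(lst, mean, numeric(1))   # like sapply, but the result type is checked
```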


    Scenario 5, Invoking R from the command line

    Here is a good example of invoking R from the Linux command line.
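
A minimal script of my own along those lines is sketched below. The file name sum_args.R and the helper sum_args are hypothetical; Rscript and commandArgs are part of base R.

```r
# sum_args.R (hypothetical file name)
# Run from a shell with:  Rscript sum_args.R 1 2 3

# Convert character arguments to numbers and add them up
sum_args <- function(args) sum(as.numeric(args))

# trailingOnly = TRUE drops R's own options and keeps only the user's arguments
args <- commandArgs(trailingOnly = TRUE)
if (length(args) > 0) {
  cat("Sum:", sum_args(args), "\n")
}
```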

    Scenario 6, Use R to perform clustering and then produce a heatmap

    Good example-by-mannheimia
    Example of savvi by Earl Glynn
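
A bare-bones version of the same workflow in base R is sketched below. The built-in mtcars data set stands in for real data, and the linkage method and k = 3 are arbitrary choices for illustration.

```r
# Built-in mtcars data as a stand-in; scale columns so no variable dominates
mat <- scale(as.matrix(mtcars))

# Hierarchical clustering of the rows on Euclidean distance
hc <- hclust(dist(mat), method = "average")
groups <- cutree(hc, k = 3)   # k = 3 is an arbitrary choice
table(groups)

# Passing Rowv fixes the row dendrogram to the clustering above;
# scale = "none" because the matrix is already scaled
heatmap(mat, Rowv = as.dendrogram(hc), scale = "none")
```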