Getting all the bolts and nuts ready

I am trying to create a workable environment for data analysis. Here is what I need to install on my Windows machine:

Anaconda (maybe conda also)

I installed Anaconda (v1.4.0) at the user level.

Jupyter

I followed the link and installed Jupyter.
I can launch "jupyter notebook" and see the web page at localhost:8888,
but I do not see a coding environment available.


Zipline

Getting zipline installed takes a bit of work, so I tested it in a conda "sub-environment" (a separate conda environment). zipline only works with Python 2.7!

C:\Users\li11>conda create -n ForZipline python=2.7 biopython
C:\Users\li11>activate ForZipline
(ForZipline) C:\Users\li11>conda install -c Quantopian zipline


Using Jupyter

Thanks go to our system admin and Anaconda: both Python 2 and Python 3 are now installed on my Windows machines, and I can launch Jupyter directly. From now on, I will stick with Jupyter as my IDE (integrated development environment).

R basic

In this post, I will document the basics of R and R programming.

Scenario 1, useful R introduction manuals and websites

	
  • CRAN
  • Thomas Girke -- USC

    Scenario 2, differentiating matrix and data frame

    A matrix is a two-dimensional data structure. All the elements of a matrix must be of the same type (numeric, logical, character, complex).
    A data frame combines features of matrices and lists. In fact, we can think of a data frame as a rectangular list, that is, a list in which all items have the same length. The items of the list serve as the columns of the data frame, so every item within a particular column has to be of the same type. However, different columns can be of different types.
    Matrix -- Dataframe
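To make the two rules concrete, here is a small Python sketch. The function names `check_matrix` and `check_data_frame` are hypothetical, and a Python dict of lists stands in for an R data frame. Note that R itself coerces mixed values to a common type (for example, `cbind(1, "a")` gives a character matrix) rather than raising an error; the sketch raises so the rule is visible.

```python
def check_matrix(rows):
    """Return True if rows form a valid matrix: every row has the same
    length and every element of the whole matrix has the same type."""
    if len({len(row) for row in rows}) > 1:
        raise ValueError("rows have different lengths")
    types = {type(value) for row in rows for value in row}
    if len(types) > 1:
        raise TypeError("a matrix holds one type, found: "
                        + ", ".join(sorted(t.__name__ for t in types)))
    return True


def check_data_frame(columns):
    """Return True if a dict of name -> column list is a 'rectangular list':
    all columns have the same length and each column holds a single type,
    while different columns may hold different types."""
    if len({len(values) for values in columns.values()}) > 1:
        raise ValueError("columns have different lengths")
    for name, values in columns.items():
        types = {type(value) for value in values}
        if len(types) > 1:
            raise TypeError(f"column {name!r} mixes types: "
                            + ", ".join(sorted(t.__name__ for t in types)))
    return True
```

For example, `check_data_frame({"gene": ["a", "b"], "count": [1, 2]})` passes because each column is internally consistent, while `check_matrix([[1, "a"]])` fails because a matrix allows only one type overall.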
    

    Scenario 3, how to combine two matrices by rownames

    It seems like an easy and legitimate question, but not everyone knows the answer. I found one pretty decent solution:

    
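    # Column-bind two matrices by matching rownames (an outer join on rows).
    # Rows present in only one matrix get NA in the other matrix's columns.
    cbind.fill <- function(x, y){
      xrn <- rownames(x)
      yrn <- rownames(y)
      rn <- union(xrn, yrn)            # all rownames, x's order first
      xcn <- colnames(x)
      ycn <- colnames(y)
      if(is.null(xrn) || is.null(yrn) || is.null(xcn) || is.null(ycn))
        stop("NULL rownames or colnames")
      z <- matrix(NA, nrow=length(rn), ncol=length(xcn)+length(ycn))
      rownames(z) <- rn
      colnames(z) <- c(xcn, ycn)
      idx <- match(rn, xrn)            # row position in x, NA if absent
      z[!is.na(idx), seq_along(xcn)] <- x[na.omit(idx), , drop=FALSE]
      idy <- match(rn, yrn)            # row position in y, NA if absent
      z[!is.na(idy), length(xcn)+seq_along(ycn)] <- y[na.omit(idy), , drop=FALSE]
      return(z)
    }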
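The same outer-join-by-rowname idea, sketched in Python for comparison. This is not a line-by-line translation of the R function: each table is assumed to be a dict mapping row name to a list of values, `None` stands in for `NA`, and `cbind_fill` is a hypothetical helper name.

```python
def cbind_fill(x, x_cols, y, y_cols):
    """Column-bind two row-named tables, matching rows by name (outer join).

    x and y map row name -> list of values, one value per column in
    x_cols / y_cols. Rows found in only one table are padded with None,
    the closest Python stand-in for R's NA.
    """
    if not x_cols or not y_cols:
        raise ValueError("both tables need column names")
    # Union of row names: x's rows in order, then the rows only y has,
    # the same order R's union(rownames(x), rownames(y)) produces.
    rows = list(x) + [name for name in y if name not in x]
    x_missing = [None] * len(x_cols)
    y_missing = [None] * len(y_cols)
    combined = {
        name: list(x.get(name, x_missing)) + list(y.get(name, y_missing))
        for name in rows
    }
    return list(x_cols) + list(y_cols), combined
```

With `x = {"a": [1, 2], "b": [3, 4]}` and `y = {"b": [5], "c": [6]}`, row "b" gets values from both tables, while rows "a" and "c" are padded with `None`.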
    
    

    Scenario 4, I want to have a thorough note on apply function in R

    There was a simple question about R's apply function. Although it seems quite straightforward, it causes a lot of confusion. Therefore, I decided to write a thorough note on it.
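As a starting point: in R, `apply(X, 1, FUN)` calls `FUN` on each row and `apply(X, 2, FUN)` calls it on each column, so `apply(m, 2, sum)` gives the same numbers as `colSums(m)`. The hypothetical Python helper below mimics only that MARGIN behavior on a list-of-rows matrix; R's extra step of simplifying vector results into a matrix is not modeled.

```python
def apply_margin(matrix, margin, func):
    """Rough analogue of R's apply(X, MARGIN, FUN) for a list-of-rows matrix.

    margin=1 calls func on each row, margin=2 on each column, as in R.
    Only scalar-returning functions are modeled here.
    """
    if margin == 1:
        return [func(list(row)) for row in matrix]
    if margin == 2:
        # zip(*matrix) walks the matrix column by column
        return [func(list(column)) for column in zip(*matrix)]
    raise ValueError("margin must be 1 (rows) or 2 (columns)")
```

For `m = [[1, 2, 3], [4, 5, 6]]`, `apply_margin(m, 1, sum)` gives the row sums and `apply_margin(m, 2, sum)` the column sums.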


    Scenario 5, Invoking R from the command line

    Here is a good example of invoking R from the Linux command line.
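To call R from another script rather than an interactive shell, you can use `Rscript -e`, which runs an expression and passes any trailing arguments through to `commandArgs(trailingOnly = TRUE)`. Here is a minimal Python sketch, assuming `Rscript` is on the PATH. `rscript_command` and `run_r` are hypothetical names.

```python
import shutil
import subprocess


def rscript_command(expr, *args):
    """Build the argv for running one R expression with Rscript -e.

    Trailing args are visible inside R via commandArgs(trailingOnly = TRUE).
    """
    return ["Rscript", "-e", expr, *args]


def run_r(expr, *args):
    """Run the expression and return R's standard output.

    Returns None when Rscript is not on the PATH instead of raising.
    """
    if shutil.which("Rscript") is None:
        return None
    result = subprocess.run(rscript_command(expr, *args),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

The shell equivalent of `run_r("cat(1 + 1)")` is `Rscript -e 'cat(1 + 1)'`.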

    Scenario 6, Use R to perform clustering and then produce a heatmap

    A good example by mannheimia
    Example of savvi by Earl Glynn
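In R, the usual route is `dist()` to compute distances, `hclust()` to cluster, and `heatmap()`, which by default reorders rows and columns by `hclust()` dendrograms. To show what that reordering does, here is a plain-Python sketch of agglomerative clustering that returns the row order a heatmap would use. It uses average linkage for simplicity, whereas R's `hclust()` defaults to complete linkage, and its leaf order may differ from R's dendrogram ordering. `cluster_order` is a hypothetical name.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length rows."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def cluster_order(rows):
    """Average-linkage agglomerative clustering of rows.

    Returns row indices in leaf order: similar rows end up next to
    each other, which is how a clustered heatmap draws them.
    """
    clusters = [[i] for i in range(len(rows))]  # every row starts alone

    def linkage(c1, c2):
        # average distance over all cross-cluster pairs
        total = sum(euclidean(rows[i], rows[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # find the closest pair of clusters and merge them
        best_d, best_a, best_b = None, None, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best_d is None or d < best_d:
                    best_d, best_a, best_b = d, a, b
        merged = clusters[best_a] + clusters[best_b]
        clusters = [c for k, c in enumerate(clusters)
                    if k not in (best_a, best_b)] + [merged]
    return clusters[0] if clusters else []
```

For example, `cluster_order([[0, 0], [10, 10], [1, 0], [10, 13]])` places rows 0 and 2 next to each other and rows 1 and 3 next to each other.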
    

    Installing MiKTeX on a Windows 7 64-bit machine

    All these help docs deserve the credit:

    First post I encountered
    Detail installation instructions
    

    Working on the site visit report reminded me of the MiKTeX tools I used a year ago, and I am relearning them through this process. First of all, I would like to acknowledge the URLs that helped me:

    HowToTex is very helpful, but I could not install/use the packages.
    This one looks pretty professional.
    The math course at Kansas State University is primarily about typesetting math formulas.
    I started with this dummy skeleton; at least it is a working example for a beginner.
    A self-study manual.
    Another online LaTeX book is here.
    

    Using MongoDB for LINCS L1000

    L1000CDS uses a MongoDB schema to store all of its information.

    Installation

    1. Install MongoDB. Please select release 2.4.14 for download. The
    MongoDB files on this page are not compatible with newer versions
    of MongoDB.

    2. Download the LINCS_L1000_CD.tar.gz file and unzip it.

    3. Place all the LINCS_L1000_CD.x files in the unzipped folder
    into the MongoDB database folder.

    4. Start mongod.

    5. Open a mongo shell and switch to the LINCS_L1000_CD db. The LINCScloud
    collection in the db stores all the processed LINCS L1000 data
    downloaded from LINCS cloud. The GSE70138 collection contains all
    the processed LINCS L1000 data downloaded from GEO.
    The two collections do not overlap.

    6. Optionally, download and install Robomongo,
    which provides a nice GUI to browse the data.

    7. Refer to the MongoDB documentation for query specifications
    and drivers for different languages.
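You can also do step 5 from Python. The sketch below assumes PyMongo is installed and mongod is listening on the default localhost:27017. The helper names are hypothetical. Recent PyMongo releases do not support a 2.4 server, so you may need an older PyMongo release; check the PyMongo compatibility table.

```python
LINCS_DB = "LINCS_L1000_CD"
LINCS_COLLECTIONS = ("LINCScloud", "GSE70138")


def open_lincs_db(host="localhost", port=27017):
    """Connect to mongod and return the LINCS_L1000_CD database handle.

    PyMongo is imported here so the rest of the module works without it.
    """
    from pymongo import MongoClient
    return MongoClient(host, port)[LINCS_DB]


def peek_collections(db, names=LINCS_COLLECTIONS):
    """Return one sample document per collection (None if it is empty)."""
    return {name: db[name].find_one() for name in names}
```

Usage: `print(peek_collections(open_lincs_db()))` shows one document from each of the two collections.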
    

    Import and export between MongoDB and .json files

    mongoimport and mongoexport load JSON files into, and dump them out of, a MongoDB database. See:
    import-export
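By default, both tools work with one JSON document per line; `mongoimport --jsonArray` accepts a single JSON array instead. For example: `mongoimport --db test --collection grades --file grades.json` and `mongoexport --db test --collection grades --out grades.json`, where the database and collection names are placeholders. Here is a small Python sketch for writing and reading that one-document-per-line format. The function names are hypothetical.

```python
import json


def write_for_mongoimport(docs, out):
    """Write documents one JSON object per line, the layout mongoimport
    reads by default (a single JSON array needs --jsonArray instead)."""
    for doc in docs:
        out.write(json.dumps(doc) + "\n")


def read_mongoexport(lines):
    """Parse mongoexport's default output: one JSON document per line.

    ObjectIds appear as Extended JSON such as {"$oid": "..."} and are
    left as plain dicts here. Blank lines are skipped.
    """
    return [json.loads(line) for line in lines if line.strip()]
```

Usage: `with open("grades.json") as fh: docs = read_mongoexport(fh)`.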
    

    For Jacqui, temporarily stored here

    epigquickstart_508

    I have MongoDB installed on my Windows 10 desktop and on Linux (CentOS 7). To install L1000CDS on my Linux and Windows desktops:

    Download the two files from the download page.
    On Linux, copy both files to /tmp/mongo/:

    cpcd-gse70138.metadata.json

    and

    cpcd-gse70138.bson

    mongorestore -d databasename /tmp/mongo/
    mongorestore -d databasename -c collectionname /path/to/collectionname.bson

    In my case, I tested both ways and both worked:

    mongorestore -d LINCS_L1000_CD -c cpc2014 /path/to/cpcd-gse70138.bson

    mongorestore -d LINCS_L1000_CD /home/li11/LINCS-L1000/

    Although both ways work, the second method (without a collection name) produced a collection called “cpcd-gse70138”. This makes queries cumbersome, because I have to deal with the little dash in the name. Naming the collection explicitly gives more control:

    mongorestore -d LINCS_L1000_CD -c cpc2014 /home/li11/LINCS-L1000/cpcd-gse70138.bson
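A dashed name still works, but it is awkward. In the mongo shell you have to write `db.getCollection("cpcd-gse70138")` or `db["cpcd-gse70138"]`, because `db.cpcd-gse70138` would be parsed as a subtraction. To avoid that, here is a Python sketch that derives a dash-free collection name from the file name and builds the explicit `-d`/`-c` command. The helpers are hypothetical; only the `mongorestore -d ... -c ... file.bson` form comes from the notes above.

```python
import re
from pathlib import Path


def safe_collection_name(bson_path):
    """Turn a .bson file name into a dash-free collection name.

    Every character that is not a letter, digit or underscore becomes "_",
    so "cpcd-gse70138.bson" yields "cpcd_gse70138".
    """
    return re.sub(r"\W", "_", Path(bson_path).stem)


def mongorestore_cmd(db, bson_path, collection=None):
    """Build the argv that restores one .bson file into a named collection."""
    name = collection or safe_collection_name(bson_path)
    return ["mongorestore", "-d", db, "-c", name, str(bson_path)]
```

Pass the result to `subprocess.run(..., check=True)` to execute it. Giving `collection="cpc2014"` reproduces the command above.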

    Advanced usage of R with RMySQL

    R called inside other applications

    One thing that occasionally bothers me is how to refer to a variable inside a function call. It became more prominent when I tried to pass an R variable into an RMySQL query. Here is the deal:

    An example in RMySQL from RBlog

    # Load the package
    library(RMySQL)
    
    # Set up a connection to your database management system.
    # I'm using the public MySQL server for the UCSC genome browser (no password)
    mychannel <- dbConnect(MySQL(), user="genome", host="genome-mysql.cse.ucsc.edu")
    
    # Function to make it easier to query 
    query <- function(...) dbGetQuery(mychannel, ...)
    
    # Get the UCSC gene name, start and end sites for the first 10 genes on Chromosome 12
    query("SELECT name, chrom, txStart, txEnd FROM mm9.knownGene WHERE chrom='chr12' LIMIT 10;")
    
    

    My own database

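    # mydb: an open dbConnect(MySQL(), ...) connection to my own database (set up elsewhere)
    mirList <- c("Adrenal", "Brainstem")
    numOfTis <- length(mirList)
    query <- function(...) dbGetQuery(mydb, ...)
    query("select rnMiR as mir, tissue as tis from TisSpeMiRs where tissue in (\"Adrenal\", \"Brainstem\") and ctrCode = 2 limit 2 ;") ## this works
    query("select rnMiR as mir, tissue as tis from TisSpeMiRs where tissue in eval(mirList) and ctrCode = 2 limit 2 ;") ## this does NOT work!!
    # eval(mirList) is only text inside the SQL string; R never substitutes it.
    # Building the quoted list in R first does work:
    inList <- paste0("'", mirList, "'", collapse = ", ")
    query(paste0("select rnMiR as mir, tissue as tis from TisSpeMiRs where tissue in (", inList, ") and ctrCode = 2 limit 2 ;"))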
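In general, the list values must be generated into the SQL, not written as an R expression inside the string. In Python's DB-API, the cleaner approach is parameter binding with one placeholder per value. The sketch below uses the standard library's `sqlite3` as a stand-in for MySQL; with a MySQL driver such as PyMySQL, the placeholder is `%s` instead of `?`. The table and column names come from the query above, and `tissue_query` is a hypothetical helper.

```python
import sqlite3


def tissue_query(conn, tissues, ctr_code=2, limit=2):
    """Fetch (mir, tissue) rows for a list of tissues using bound parameters.

    One "?" placeholder is generated per tissue, so the tissue names are
    passed as data and never pasted into the SQL text by hand.
    """
    if not tissues:
        return []  # MySQL rejects an empty "IN ()"
    placeholders = ", ".join("?" for _ in tissues)  # "?, ?" for two tissues
    sql = ("SELECT rnMiR AS mir, tissue AS tis FROM TisSpeMiRs "
           f"WHERE tissue IN ({placeholders}) AND ctrCode = ? LIMIT ?")
    return conn.execute(sql, [*tissues, ctr_code, limit]).fetchall()
```

Usage: `tissue_query(conn, ["Adrenal", "Brainstem"])` plays the role of the working R query, but works for a list of any length.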
    

    Well, some “expert” posted about advanced R usage.
    A note on single vs. double quotes is here.

    Learning step 1

    	
  • Option 1

      min.v <- 5
      max.v <- 10
      cat("You entered ", "\"", min.v, " ", max.v, "\"", sep="")

  • Option 2

      cat("You entered ", '"', min.v, " ", max.v, '"', sep="")

  • Option 3

      options(useFancyQuotes=FALSE)
      cat("You entered ", dQuote(paste(min.v, max.v)), sep="")

  • Option 4

      options(useFancyQuotes=FALSE)
      cat("You entered (", dQuote(paste(min.v, max.v)), ")", sep="")
      cat("You entered (", paste(dQuote(min.v), dQuote(max.v), sep = ","), ")", sep="")
      num <- c(min.v, max.v)
      cat("You entered (", dQuote(paste(num)), ")", sep="")
      cat("You entered (", paste(num, collapse = ","), ")", sep="")
      cat("You entered (", paste(dQuote(paste(num)), collapse=","), ")", sep="")

    Learning step 2

    MongoDB with Java and Python

    To work on the MongoDB project with Java, a few things need to be set up:

    • Install MongoDB
    • Make sure Maven is installed

    It turns out that I stuck with Mongo + Python, which gives me the opportunity to sharpen my Python skills.

    From here on, I will keep accumulating Mongo/Python-related cases.

    I have a MongoDB dump, and I can restore it.

  • To restore a dump:
  • Go to the directory where the "dump" folder exists
  • Issue the command "mongorestore dump"
  • It should restore the database

    I have grade information in .json format. To pretty-print it:

    cat grades.json | python -m json.tool | more
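`python -m json.tool` expects a single JSON document. A file with one document per line (the default output of mongoexport) therefore fails with an "Extra data" error. Since Python 3.8, `python -m json.tool --json-lines` handles that case. Here is a small Python sketch that accepts either form; `pretty_json` is a hypothetical name.

```python
import json


def pretty_json(text):
    """Pretty-print JSON text roughly like `python -m json.tool`.

    If the text is not one JSON document, fall back to treating it as
    one document per line (the layout mongoexport writes by default).
    """
    try:
        docs = [json.loads(text)]
    except json.JSONDecodeError:
        docs = [json.loads(line) for line in text.splitlines() if line.strip()]
    return "\n".join(json.dumps(doc, indent=4) for doc in docs)
```

Usage: `print(pretty_json(open("grades.json").read()))`.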
    

    Graphical Database Schema

    To get approval, we need a graphical diagram of the database schema. Pierre suggested SchemaSpy. It turns out that a couple of dependencies are required:

    	
  • SchemaSpy itself, obviously
  • Graphviz, which it needs installed
  • A database driver; in our case, the JDBC MySQL driver. Luckily, I installed it on my Windows machine along with ColdFusion earlier. It is located at
  • C:\ColdFusion2016\cfusion\lib\mysql-connector-java-5.1.38-bin.jar
  • Migrate the MySQL DB from Linux (wine) to the Windows desktop. Before doing this, I needed to drop all the tables in the database. A scripting approach works on Linux; on Windows, it is easier to drop the database and re-create it.

    [li11@ehscmplp11/wine ~]$ mysqldump -u li11 -ppassword mirDB > ~/mirDB.sql
    C:\Users\li11>mysql -u root -ppassword ratemirs < x:\mirDB.sql
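The notes do not include the Linux script used to drop the tables. One possible approach is to take the table names from `SHOW TABLES` and generate a single drop script in Python, then pipe it into `mysql`. `drop_all_tables_sql` is a hypothetical helper. Disabling `FOREIGN_KEY_CHECKS` lets the tables be dropped in any order.

```python
def drop_all_tables_sql(tables):
    """Build MySQL statements that drop every listed table in one go.

    Foreign-key checks are switched off first so tables can be dropped
    in any order, then switched back on. Backticks in names are doubled,
    which is MySQL's escape for quoted identifiers.
    """
    if not tables:
        return ""
    quoted = ", ".join("`" + name.replace("`", "``") + "`" for name in tables)
    return ("SET FOREIGN_KEY_CHECKS = 0;\n"
            f"DROP TABLE IF EXISTS {quoted};\n"
            "SET FOREIGN_KEY_CHECKS = 1;\n")
```

For example, write the result to drop.sql and run `mysql -u li11 -p mirDB < drop.sql` before loading the dump.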
    

    My command to run SchemaSpy was:

    java -jar Downloads\schemaSpy_5.0.0.jar -dp C:\ColdFusion2016\cfusion\lib\mysql-connector-java-5.1.38-bin.jar -t mysql -db RATEmiRs -host localhost -u li11 -p nopassword -o X:\project2016\microRNADB\diagram\
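SchemaSpy needs Java, Graphviz and a JDBC driver, so a quick pre-flight check can save a failed run. This Python sketch assumes two things: `java` runs SchemaSpy, and Graphviz's `dot` program must be on the PATH. `missing_schemaspy_prereqs` is a hypothetical helper, and the driver path is whatever you pass to `-dp`.

```python
import os
import shutil


def missing_schemaspy_prereqs(driver_jar, tools=("java", "dot")):
    """List the SchemaSpy prerequisites that cannot be found.

    tools are looked up on the PATH (java runs SchemaSpy, dot is the
    Graphviz layout program); driver_jar is the JDBC driver given to -dp.
    """
    missing = [tool for tool in tools if shutil.which(tool) is None]
    if not os.path.isfile(driver_jar):
        missing.append(driver_jar)
    return missing
```

Usage: `missing_schemaspy_prereqs(r"C:\ColdFusion2016\cfusion\lib\mysql-connector-java-5.1.38-bin.jar")` returns an empty list when everything is found.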

    Interestingly enough, the command involves both a local Windows drive and a network drive.

    10/18/2016

    I do not know what went wrong, but when I retried the same command, it failed at first and then worked. Just something to watch out for.