Basic procedure for building the prediction model

The data we have will be a mixture of two (sometimes three) different populations.
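As a toy illustration of such mixed data (all numbers here are invented, not taken from the real samples), one could simulate measurements drawn from two populations with different means and proportions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical illustration: one sample's measurements drawn from a
# mixture of two populations (weights, means, and spreads are invented).
n = 1000
weights = np.array([0.8, 0.2])   # mixing proportions
means = np.array([1.0, 2.0])     # component means
sds = np.array([0.1, 0.3])       # component standard deviations

component = rng.choice(2, size=n, p=weights)
x = rng.normal(means[component], sds[component])

print(x.shape)
```

The overall mean of such data sits between the component means, weighted by the proportions, which is what makes separating the populations nontrivial.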

  • Our goal is to fit the mixture model
  • Get the critical measurement of any given sample: a vector of “density/frequency/proportion” along the possible x coordinates
  • Then use such a vector as the independent variable
  • Build a statistical model with clinical outcomes as the dependent variable
  • Fit the model on the training data and predict on the unknown data
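The "vector of density/frequency/proportion" step above can be sketched as a fixed-bin histogram turned into proportions; the bin range and count here are assumptions for illustration only:

```python
import numpy as np

# Hypothetical sketch: turn one sample's raw measurements into the
# "density/frequency/proportion" feature vector over fixed x bins.
def sample_to_features(values, bins):
    counts, _ = np.histogram(values, bins=bins)
    return counts / counts.sum()         # proportions sum to 1

bins = np.linspace(0.5, 4.0, 36)         # 35 bins along the x axis (assumed range)
rng = np.random.default_rng(1)
values = rng.normal(1.0, 0.2, size=500)  # fake sample
features = sample_to_features(values, bins)
print(features.shape)
```

Using the same bins for every sample gives every sample a feature vector of the same length, which is what a downstream statistical model needs.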
  • Challenges exist in this research project:

  • Of the three main clusters, information from the normal group masks the other two
  • Two clinical diagnoses, OLK vs. normal, can have similar group-three cell populations, i.e., a few cells with D.I. > 2.3, or no cells with D.I. > 2.3 at all. These samples were confirmed by further histopathology diagnosis
  • So it is challenging, but this can also become an opportunity
  • Can we show a higher correlation between OLK and OSCC than that between normal and OSCC?
  • Are the measurements we “extract” critical? Does this analysis violate any statistical assumptions?
  • Opportunities exist in this research project, and my hope is to:

  • Get a good fit on data from a given sample
  • Identify critical points (first or second derivative) if any, zero otherwise
  • Build SVM or other models based on these transformed data
  • Evaluate the model with cross-validation, 10-fold or leave-one-out
  • Create a ROC curve for model evaluation
  • Test on an independent dataset (hopefully coming soon)
  • It turns out that there is much I need to work out for this project
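The modeling-and-evaluation hopes above (SVM, cross-validation, ROC) could be sketched with scikit-learn; the feature matrix, sample size, and outcome below are all simulated, not the real clinical data:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 35))            # 60 samples x 35 histogram bins (fake)
y = (X[:, 0] + rng.normal(scale=0.5, size=60) > 0).astype(int)  # fake outcome

clf = SVC(kernel="rbf", probability=True)

# 10-fold cross-validated accuracy (LeaveOneOut() could replace cv=10)
acc = cross_val_score(clf, X, y, cv=10).mean()

# ROC AUC from probabilities on the training data (optimistic; illustration only)
probs = clf.fit(X, y).predict_proba(X)[:, 1]
auc = roc_auc_score(y, probs)
print(round(acc, 3), round(auc, 3))
```

For an honest ROC curve the probabilities would need to come from held-out folds or the independent test set, not from refitting on the same data as shown here.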

  • Other than the mixture model, which may not apply
  • I need to choose alternative approaches
  • Fit a predominant simple model and determine the excess, which is caused by other phenomena
  • From the data, we can easily get the CDF and compare the fit to the original data
  • Get the deciles, maybe?
  • Thanks go to David Umbach, but the reality is that this is easier said than done. I need to think about it and understand it more.
  • I found a very interesting PIA at UChicago, and next I need to find out what software they are using and whether it is free. It turns out that the webpage was out of date, and here is the new one. Talked to Lei-Ann and got great information.
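The CDF-and-deciles fallback above is straightforward to sketch; the sample below is simulated and the grid is arbitrary:

```python
import numpy as np

# Sketch of the fallback idea: get the empirical CDF of the data
# (to compare against a fitted simple model) and summarize with deciles.
def empirical_cdf(values, grid):
    values = np.sort(values)
    return np.searchsorted(values, grid, side="right") / len(values)

rng = np.random.default_rng(3)
data = rng.gamma(shape=5.0, scale=0.2, size=1000)  # fake sample

grid = np.linspace(0, 3, 7)
cdf = empirical_cdf(data, grid)
deciles = np.quantile(data, np.linspace(0.1, 0.9, 9))
print(len(deciles))
```

Comparing this empirical CDF against the CDF of the fitted simple model (e.g., with a Kolmogorov-Smirnov-style maximum gap) would quantify the "excess" that the simple model fails to explain.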

    Mixture model by a Canadian

    I think I have found the original post from a Canadian group.

    Besides Peter Macdonald and his contribution to the MIX software, I found a Master’s thesis by Juan Du. And there is the mixdist R package.

    I am amazed to find that mixture models have been intensively studied by so many researchers for so many years (David Dowe, for example). It is often a paradox how deep an individual can go into a single topic. I guess that is how a person becomes an expert.

    Now, with the most superficial approach, I need to clear some basic roadblocks:

  • Detailed properties of the normal and gamma distributions
  • How a gamma distribution approaches a normal distribution
  • The chi-square goodness-of-fit test and its degrees of freedom
  • The general MIX program fits a set of data given “mixparameters” and proposed “kernels”, then comes back with a fit to the “histogram” and a chi-square test of the fit. It should report the parameters of the distributions that make up the mixture. It sounds like a good approach:

  • Start with a set of data coming from a mixture of distributions
  • Fit with MIX or mixdist and assess the fit with a chi-square test
  • Pick whichever model wins and extract the parameters of its component distributions
  • Then restore the mixture distribution with the known parameters (and/or the proportions??)
  • In the end, take the (second) derivatives and finish the data transformation
  • The next topic will be SVM or other clustering procedures for building the prediction model.
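The fit-restore-differentiate steps above can be sketched end to end. MIX/mixdist live in R; here a minimal two-component Gaussian EM in Python stands in for them, on simulated data with invented component parameters:

```python
import numpy as np
from scipy.stats import norm

# Minimal EM for a two-component Gaussian mixture (stand-in for MIX/mixdist).
def em_two_gaussians(x, iters=200):
    w = np.array([0.5, 0.5])
    mu = np.array([x.min(), x.max()])
    sd = np.array([x.std(), x.std()])
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        dens = w * norm.pdf(x[:, None], mu, sd)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: update proportions, means, and standard deviations
        nk = r.sum(axis=0)
        w = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        sd = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return w, mu, sd

rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(1.0, 0.1, 800), rng.normal(2.0, 0.3, 200)])
w, mu, sd = em_two_gaussians(x)

# Restore the mixture density on a grid, then take the second derivative
grid = np.linspace(0, 3, 301)
density = (w * norm.pdf(grid[:, None], mu, sd)).sum(axis=1)
second_deriv = np.gradient(np.gradient(density, grid), grid)
print(np.round(np.sort(mu), 2))
```

The recovered proportions and component parameters restore the fitted density, and the second derivative of that density is the transformed representation that would feed the SVM step.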