The data we have will be a mixture of two (sometimes three) different populations.
Our goal is to:
- Fit the mixture model
- Get a critical measurement of any given sample: a vector of “density/frequency/proportion” along the possible x coordinates, then use that vector as the independent variable
- Build a statistical model with clinical outcomes as the dependent variable
- Fit the model on the training data and predict on the unknown data
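To make the "vector along the x coordinates" idea concrete, here is a minimal sketch of how one sample's measurements could be turned into a fixed-length density vector. The function name `density_vector`, the grid limits (0.5 to 5.0) and the number of bins are my own placeholders, not settled choices, and the input is assumed to be a 1-D array of per-cell D.I. values.

```python
import numpy as np

def density_vector(di_values, x_min=0.5, x_max=5.0, n_bins=45):
    """Return bin centers and a density estimate on a fixed grid.

    The grid (x_min, x_max, n_bins) is an illustrative assumption; using the
    same grid for every sample makes the vectors comparable across samples.
    """
    edges = np.linspace(x_min, x_max, n_bins + 1)
    # Values outside the grid are pushed into the first/last bin rather than dropped.
    clipped = np.clip(np.asarray(di_values, dtype=float), x_min, x_max)
    # density=True scales the histogram so that sum(density * bin_width) == 1.
    density, _ = np.histogram(clipped, bins=edges, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, density
```

Each sample then contributes one row of `density` values to the feature matrix, with the clinical outcome as the label.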
Challenges exist in this research project:
- Out of the three main clusters, information from the normal group masks the other two
- Two clinical diagnoses, OLK vs. normal, can have a similar group-three cell population, i.e. a few cells with D.I. > 2.3, or no cell with D.I. > 2.3 at all
- These samples were diagnosed with further histopathology
- So it is challenging, and this can also become an opportunity
- Can we show a higher correlation between OLK and OSCC than between normal and OSCC?
- Are the measurements we “extract” critical?
- Is there any statistical violation (with this analysis)?
Opportunities exist in this research project, and my hopes are:
- Get a good fit on data from a given sample
- Identify critical points (first or second derivative) if any, zero otherwise
- Build an SVM or other models based on these transformed data
- Evaluate the model with cross-validation, 10-fold or leave-one-out
- Create an ROC curve for model evaluation
- Test on an independent dataset (hopefully coming soon)
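The modelling and evaluation steps could look roughly like the sketch below, using scikit-learn. It assumes a feature matrix `X` (one row per sample) and binary labels `y`. The linear kernel, the scaling step and the helper name `cross_validated_roc` are my assumptions, not decisions made for this project.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def cross_validated_roc(X, y, n_splits=10, seed=0):
    """Out-of-fold SVM decision scores, pooled into a single ROC curve."""
    # Scaling lives inside the pipeline so it is refit on each training fold only.
    model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # Every sample is scored by a model that never saw it during fitting.
    scores = cross_val_predict(model, X, y, cv=cv, method="decision_function")
    fpr, tpr, _ = roc_curve(y, scores)
    return fpr, tpr, roc_auc_score(y, scores)
```

For leave-one-out, the `cv` object could be swapped for `sklearn.model_selection.LeaveOneOut()`. Pooling scores across folds is only one way to summarize the ROC, so the AUC from this sketch should be read as an estimate.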
It turns out that there is much I need to work out for this project.
Since the mixture model cannot be applied, I need to choose alternative approaches:
- Fit a predominant simple model and determine the excess (caused by other phenomena)
- From the data, we can easily get the CDF and compare the fit to the original data
- Get the deciles, maybe?
Thanks go to David Umbach, but the reality is that this is easier said than implemented hands-on. I need to think about it and understand it more.
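As a first attempt at the "predominant model plus excess" idea, here is a rough sketch. It assumes the dominant population is roughly normal and estimates it robustly with the median and MAD, so a small high-D.I. group does not distort the fit. The normality assumption and all function names are mine. The only value taken from the notes above is the D.I. > 2.3 cutoff, passed in as `threshold`.

```python
import math
import numpy as np

def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution, using only the standard library's erf."""
    z = (np.asarray(x, dtype=float) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + np.vectorize(math.erf)(z))

def fit_dominant_normal(di):
    """Robust location/scale: the median, and MAD scaled to match a normal sigma."""
    di = np.asarray(di, dtype=float)
    mu = float(np.median(di))
    sigma = 1.4826 * float(np.median(np.abs(di - mu)))
    return mu, sigma

def excess_above(di, threshold, mu, sigma):
    """Observed fraction above the threshold minus what the fitted normal expects."""
    observed = float(np.mean(np.asarray(di) > threshold))
    expected = 1.0 - float(normal_cdf(threshold, mu, sigma))
    return observed - expected

def decile_gaps(di, mu, sigma):
    """Fitted CDF evaluated at the empirical deciles, minus the nominal 0.1..0.9."""
    probs = np.arange(1, 10) / 10.0
    deciles = np.quantile(np.asarray(di, dtype=float), probs)
    return deciles, normal_cdf(deciles, mu, sigma) - probs
```

If the simple model were a perfect description, the decile gaps would all be near zero and the excess above 2.3 would be negligible. Large positive excess would point to the extra population.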
I found a very interesting PIA at UChicago, and next I need to find out what software they are using and whether it is free. It turns out that the webpage was out of date, and here is the new one. I talked to Lei-Ann and got great information.