Basic procedure for building the prediction model

The data we have will be a mixture of two (sometimes three) different populations.

  • Our goal is to fit the mixture model
  • Get the critical measurement for any given sample: a vector of “density/frequency/proportion” values along the possible x coordinates
  • Then use this vector as the independent variable
  • Build a statistical model with the clinical outcome as the dependent variable
  • Train the model on the training data and predict on the unknown data
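
To make the "vector of proportions along the x coordinates" concrete, here is a minimal sketch using only the Python standard library. It assumes each sample is a list of per-cell D.I. values and that all samples share one fixed grid of equal-width bins. The function name `proportion_vector`, the grid limits, the bin count, and the D.I. values are all hypothetical and chosen only for illustration.

```python
def proportion_vector(values, lo, hi, n_bins):
    """Return the fraction of values in each of n_bins equal-width bins on [lo, hi]."""
    if not values:
        raise ValueError("sample has no measurements")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        idx = int((v - lo) / width)
        # Values outside [lo, hi) are clamped into the first or last bin.
        idx = min(max(idx, 0), n_bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


# Made-up D.I. values for one sample (not real data).
sample = [0.95, 1.0, 1.02, 1.05, 1.9, 2.0, 2.4]
features = proportion_vector(sample, lo=0.5, hi=3.0, n_bins=5)
```

Because every sample is mapped onto the same grid, the resulting vectors have equal length and can be stacked as rows of the independent-variable matrix.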

Challenges in this research project:

  • Of the three main clusters, information from the normal group masks the other two
  • Two clinical diagnoses, OLK and normal, can have similar group-three cell populations, i.e. a few cells with D.I. > 2.3, or no cells with D.I. > 2.3 at all. These samples were diagnosed by further histopathology
  • So this is challenging, but it can also become an opportunity
  • Can we show a higher correlation between OLK and OSCC than between normal and OSCC?
  • Are the measurements we “extract” critical? Does this analysis violate any statistical assumptions?

Opportunities in this research project, and my hopes:

  • Get a good fit on data from a given sample
  • Identify critical points (first or second derivative) if any, zero otherwise
  • Build an SVM or other model based on these transformed data
  • Evaluate the model with cross-validation, either 10-fold or leave-one-out
  • Create ROC curve for model evaluation
  • Test on independent dataset (hopefully coming soon)
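
The evaluation steps above (cross-validation followed by an ROC curve) can be sketched with the standard library alone. This is not the SVM itself: a nearest-centroid scorer stands in for it so the example stays self-contained, and any classifier that outputs a continuous score could be swapped in. The feature vectors, labels, and all function names are hypothetical.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists; k == n gives leave-one-out."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for test in folds:
        held = set(test)
        yield [i for i in range(n) if i not in held], test


def centroid(rows):
    """Column-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def score(x, pos_c, neg_c):
    """Higher score means x is closer to the positive centroid."""
    d_pos = sum((a - b) ** 2 for a, b in zip(x, pos_c))
    d_neg = sum((a - b) ** 2 for a, b in zip(x, neg_c))
    return d_neg - d_pos


def cross_val_scores(X, y, k):
    """Score every sample using a model trained without that sample's fold."""
    scores = [0.0] * len(X)
    for train, test in k_fold_indices(len(X), k):
        pos_c = centroid([X[i] for i in train if y[i] == 1])
        neg_c = centroid([X[i] for i in train if y[i] == 0])
        for i in test:
            scores[i] = score(X[i], pos_c, neg_c)
    return scores


def roc_points(y, scores):
    """Return (FPR, TPR) points, sweeping the threshold from high to low."""
    pairs = sorted(zip(scores, y), reverse=True)
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        # Samples with tied scores move the curve in a single step.
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y1 + y0) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up proportion vectors and labels (0 = normal, 1 = abnormal).
X = [[0.9, 0.1, 0.0], [0.85, 0.15, 0.0], [0.8, 0.2, 0.0],
     [0.5, 0.3, 0.2], [0.4, 0.4, 0.2], [0.45, 0.35, 0.2]]
y = [0, 0, 0, 1, 1, 1]

loo_scores = cross_val_scores(X, y, k=len(X))  # leave-one-out
curve = roc_points(y, loo_scores)
area = auc(curve)
```

Setting `k=10` gives the 10-fold variant once there are enough samples. Each training fold must contain both classes, otherwise one of the centroids is undefined.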

It turns out that there is much I need to work out for this project:

  • The mixture model cannot be applied, so I need to choose an alternative approach
  • Fit a predominant simple model and determine the excess, which is caused by other phenomena
  • From the data, we can easily get the CDF and compare the fit to the original data
  • Get the deciles, maybe?
  • Thanks go to David Umbach, but in reality this is easier said than implemented hands-on. I need to think about it and understand it more.
  • I found a very interesting PIA at UChicago, and next I need to find what software they are using and whether it is free. It turns out that the webpage was out-of-date, and here is the new one. Talked to Lei-Ann and got great information.
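
One possible reading of the "fit a predominant simple model and determine the excess" idea, sketched with the Python standard library (3.8+). The assumptions are mine, not the suggested method: the dominant population is modeled as a single normal distribution, fitted robustly from the median and the scaled median absolute deviation so that the smaller populations barely influence it, and the excess is the observed fraction of cells above a threshold minus the fraction the fitted model expects. The D.I. values and function names are hypothetical; the 2.3 threshold is the one mentioned in the challenges above.

```python
from statistics import NormalDist, median, quantiles


def fit_dominant_normal(values):
    """Robust normal fit: median for location, scaled MAD for spread."""
    m = median(values)
    mad = median([abs(v - m) for v in values])
    # 1.4826 rescales the MAD to a standard deviation under normality.
    return NormalDist(mu=m, sigma=1.4826 * mad)


def excess_above(values, model, threshold):
    """Observed fraction above threshold minus the fraction the model expects."""
    observed = sum(v > threshold for v in values) / len(values)
    expected = 1.0 - model.cdf(threshold)
    return observed - expected


def max_cdf_gap(values, model):
    """Largest gap between the empirical CDF and the fitted CDF."""
    xs = sorted(values)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        f = model.cdf(x)
        # Check the empirical CDF just before and just after its step at x.
        gap = max(gap, abs((i + 1) / n - f), abs(i / n - f))
    return gap


# Made-up D.I. values: a dominant peak near 1, a few cells near 2, one above 2.3.
di_values = [0.94, 0.96, 0.97, 0.98, 0.99, 1.0, 1.0, 1.01, 1.02, 1.03,
             1.04, 1.06, 1.95, 2.0, 2.05, 2.5]

model = fit_dominant_normal(di_values)
excess = excess_above(di_values, model, threshold=2.3)
gap = max_cdf_gap(di_values, model)
deciles = quantiles(di_values, n=10)  # nine cut points
```

The deciles, the excess, and the CDF gap are all fixed-length summaries, so they could serve as alternative feature vectors if the mixture fit stays out of reach.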
