My research focuses on developing computational and statistical methods for extracting meaningful information from high-throughput sequencing data in cancer and immunology. I am interested in understanding the complex structure and biological background of these data, and in developing new methods that exploit both to increase statistical power. My research sits at the intersection of statistics, machine learning, bioinformatics, genetics and genomics, cancer, and immunology. I enjoy collaborating with both biomedical and quantitative investigators, and involving students in those projects.

Beyond high-throughput sequencing data, I have also collaborated actively on analyzing electronic health record (EHR) data and on developing longitudinal models to compare health utilities and treatment outcomes.

Multi-Scale Inference for Data with Hierarchical Structures

Many datasets have an underlying hierarchical structure. For example, microbiome data have an evolutionary structure that can be modeled by phylogenetic trees, and single-cell data can be organized by cell-type dendrograms. We have been working on large-scale multiple testing methods that incorporate this hierarchical structure into the analysis in order to substantially increase statistical power.

Elucidating High-Dimensional Gene Co-Expression Networks

One area of my research is developing large-scale multiple testing methods for elucidating high-dimensional gene co-expression networks when additional information or structure is available. For example, we have developed methods for data that combine gene expression with genetic markers (such as eQTL data), and for gene expression data collected across multiple groups. We have also developed a robust semi-parametric method for large-scale dependence testing that allows adjustment for low-dimensional or high-dimensional covariates. The method performs well when data are contaminated, heavy-tailed, drawn from unknown distributions, or affected by unknown confounders.
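As a generic illustration of the kind of problem involved, and not the methods described above, the sketch below tests every gene pair for correlation and controls the false discovery rate with the standard Benjamini-Hochberg procedure. It assumes plain Pearson correlation with a Fisher z approximation (so it is not robust to heavy tails or confounders), and the function names `benjamini_hochberg` and `coexpression_edges` are hypothetical names chosen for this example.

```python
import math
import numpy as np


def benjamini_hochberg(pvals, alpha=0.05):
    """Return a boolean mask of hypotheses rejected at FDR level alpha."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Step-up thresholds alpha * k / m for the k-th smallest p-value.
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        # Reject everything up to the largest rank that meets its threshold.
        k = int(np.max(np.nonzero(below)[0]))
        reject[order[: k + 1]] = True
    return reject


def coexpression_edges(expr, alpha=0.05):
    """Declare co-expressed gene pairs from a (samples x genes) matrix.

    Assumption for this sketch: Pearson correlation, with two-sided
    p-values from the Fisher z normal approximation.
    """
    n, g = expr.shape
    corr = np.corrcoef(expr, rowvar=False)
    iu, ju = np.triu_indices(g, k=1)          # each unordered pair once
    r = np.clip(corr[iu, ju], -0.999999, 0.999999)
    z = np.arctanh(r) * math.sqrt(n - 3)      # approx. N(0, 1) under independence
    pvals = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])
    keep = benjamini_hochberg(pvals, alpha)
    return [(int(i), int(j)) for i, j in zip(iu[keep], ju[keep])]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    expr = rng.normal(size=(200, 6))                       # simulated data only
    expr[:, 1] = expr[:, 0] + 0.1 * rng.normal(size=200)   # genes 0 and 1 co-vary
    print(coexpression_edges(expr))
```

With g genes there are g(g-1)/2 pairwise tests, which is why multiplicity control, and any extra power gained from auxiliary structure, matters at genomic scale.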

Machine Learning for Genetic and Genomic Data in Cancer Immunology

I work actively with members of the Duke Cancer Institute and the Duke Center of Human Immunology. We are currently developing machine learning methods for large-scale integrative genomic analysis. The goal is to better understand the personalized effects of cancer treatments, survival under treatment, and health disparities.