Integrated Analysis of RCTs and Population-based Studies

To accommodate five speakers, the ending time of this session is extended to 12:45pm ET.

Organizer: Shu Yang (NCSU)
Chair: Robert Barrier (PAREXEL)
Vice Chair: Shu Yang (NCSU)

Sivakumar Gowrisankar (PAREXEL)
Douglas Faries (Eli Lilly)
Shu Yang (NCSU)
Peisong Han (UMich)
Abhishek Chakrabortty (Texas A&M)


Title: Polygenic risk scores to predict treatment response and disease outcomes in randomized control trials and non-interventional studies
Speaker: Sivakumar Gowrisankar (PAREXEL)

Complex multigenic diseases often have several associated genetic risk variants, each explaining only a small proportion of the variance in the condition. Similarly, identifying variant signatures that modulate treatment response is of high importance in clinical trials, especially with respect to patient stratification, dosing, and setting inclusion/exclusion criteria. However, it is difficult to reliably identify and quantify variants with small genetic effects through traditional GWAS analysis in the absence of large datasets.

Polygenic risk scores (PRS) combine multiple putative risk alleles into a single score that is more predictive of disease or treatment outcomes. When calculating a PRS, it is often desirable to relax the p-value threshold to include all variants that contribute meaningfully to variation in the disease or treatment outcome, even if the variants are not individually statistically significant. To obtain reliable effect-size estimates for these variants, we rely on large, population-based studies. Finally, we calculate the best PRS estimate using the subset of these variants that most robustly predicts the outcome of interest.
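The thresholded weighted-sum construction described above can be sketched as follows. This is a minimal illustration with simulated data; the function name and inputs are hypothetical, and a real pipeline would also handle issues such as LD clumping and ancestry adjustment.

```python
import numpy as np

def polygenic_risk_score(genotypes, betas, pvals, p_threshold):
    # Relax the p-value threshold so variants that are not individually
    # genome-wide significant can still contribute to the score.
    keep = pvals < p_threshold
    # PRS per individual: effect-size-weighted sum of risk-allele counts.
    return genotypes[:, keep] @ betas[keep]

rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(5, 100))   # allele counts (0/1/2)
betas = rng.normal(0.0, 0.05, size=100)         # effect sizes from a large GWAS
pvals = rng.uniform(size=100)                   # per-variant p-values
scores = polygenic_risk_score(genotypes, betas, pvals, p_threshold=0.5)
```

In practice the threshold itself is tuned, e.g. by comparing the predictive performance of scores built at several candidate cutoffs in a validation cohort.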

In this presentation, we will discuss the details of the above approach and outline how we have systematically implemented a PRS pipeline that uses a) large population-based studies or datasets to understand the genetic burden of disease and b) randomized controlled trial (RCT) or non-interventional study data to predict treatment outcomes. We will provide a few case examples to showcase how this method is applied.

Title: Full and Hybrid Real-World Control Arms: Evaluating the Use of Real-World Data as Controls in RCTs
Speaker: Douglas Faries (Eli Lilly)

Real-world data has been a source for creating external control arms to evaluate results from randomized controlled trials (RCTs) in rare diseases and in scenarios where randomization to a control group is unethical or infeasible. In rare cases, real-world data has even been used as a control arm for single-arm trials to support regulatory decisions regarding efficacy. However, the validity of any decision making based on such comparative results depends heavily on the appropriateness and quality of the control arm data. In fact, FDA guidance lists multiple concerns with the use of real-world controls, including selection bias, unmeasured confounding, time bias, data quality, and outcome validity.

In this presentation we discuss the hybrid control design, in which an RCT is enrolled with a full treatment group and a small, underpowered control group, and the RCT controls are supplemented with external controls from real-world data. The potential benefits are 1) the ability to establish the validity of the supplemental real-world control data and 2) reduced time and costs relative to a fully powered RCT. Different analytic methods for hybrid control designs will be discussed, including test-then-pool, multi-step matching and regression adjustment, and Bayesian power prior and meta-analytic approaches. We present a simulation study to evaluate the operating characteristics of full and hybrid real-world control methods across scenarios based on the five types of data validity concerns mentioned in the FDA guidance. Results show that certain hybrid methods can adjust for potential biases from the use of real-world controls, but this may come at the price of reduced efficiency through larger standard errors. Implications for the use of such methods and suggestions for additional work will be discussed. This is joint work with Mingyang Shan, Andy Dang, and Zhanglin Lin Cui.

Title: Statistical methods for improving randomized clinical trial analysis with integrated information from real world evidence studies
Speaker: Shu Yang (NCSU)

In this talk, we leverage the complementary features of randomized clinical trials (RCTs) and real-world evidence (RWE) to estimate the average treatment effect in the target population. First, we propose a calibration weighting estimator that uses only covariate information from the RWE study. Because this estimator enforces covariate balance between the RCT and the RWE study, the generalizability of the trial-based estimator is improved. We further propose a doubly robust augmented calibration weighting estimator that can be applied when treatment and outcome information is also available from the RWE study. This estimator achieves the semiparametric efficiency bound derived under the identification and outcome mean function transportability assumptions when the nuisance models are correctly specified. A data-adaptive nonparametric sieve method is provided as an alternative to the parametric approach and guarantees good approximation of the nuisance models. We establish asymptotic results under mild regularity conditions and confirm the finite-sample performance of the proposed estimators through simulation experiments. We apply the proposed methods to estimate the effect of adjuvant chemotherapy in early-stage resected non–small-cell lung cancer, integrating data from an RCT and a sample from the National Cancer Database.
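The core idea of calibration weighting can be sketched with exponential-tilting weights solved by a simple gradient iteration. This is a minimal toy version under assumed inputs (a single covariate, hypothetical function and variable names), not the estimator from the talk itself, which also covers augmentation and sieve estimation of nuisance models.

```python
import numpy as np

def calibration_weights(x_rct, target_mean, n_iter=200, lr=0.5):
    # Exponential-tilting weights w_i proportional to exp(x_i' lam),
    # with lam chosen so the weighted RCT covariate mean matches the
    # covariate mean observed in the RWE sample.
    lam = np.zeros(x_rct.shape[1])
    for _ in range(n_iter):
        w = np.exp(x_rct @ lam)
        w /= w.sum()
        grad = w @ x_rct - target_mean  # remaining covariate imbalance
        lam -= lr * grad                # gradient step toward balance
    return w

x_rct = np.array([[0.0], [1.0], [2.0]])  # covariate values in the RCT
target = np.array([1.5])                 # covariate mean in the RWE sample
w = calibration_weights(x_rct, target)
```

After calibration, a weighted difference of outcome means between the RCT arms estimates the treatment effect for the population the RWE sample represents.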

Title: Integrating Information from Existing Risk Prediction Models with No Model Details
Speaker: Peisong Han (UMich)

Consider the setting where (i) individual-level data are collected to build a regression model for the association between an event of interest and certain covariates, and (ii) some risk calculators predicting the risk of the event using less detailed covariates are available, possibly as black boxes with little information about how they were built. We propose a general empirical-likelihood-based framework to integrate the rich auxiliary information contained in the calculators into the fitting of the regression model, improving the efficiency of the regression parameter estimates. As an application, we study the dependence of the risk of high-grade prostate cancer on both conventional risk factors and newly identified biomarkers by integrating information from the Prostate Biopsy Collaborative Group (PBCG) risk calculator, which was built using conventional risk factors alone. This is joint work with Jeremy Taylor and Bhramar Mukherjee.

Title: High Dimensional Semi-Supervised Regression: Robust Sparsity-Free Inference
Speaker: Abhishek Chakrabortty (TAMU)

Semi-supervised (SS) inference has received considerable attention in recent times. Here, apart from a small or moderate sized supervised dataset S, one also has available a much larger unsupervised dataset U, and it is natural to ask when and how the information in U can be exploited. Such settings arise naturally when the covariates are easily available but the response is expensive and/or difficult to obtain, a scenario frequently encountered in modern studies involving large databases.

We consider high dimensional (HD) inference for linear regression in SS settings, where the covariate dimension can diverge (possibly faster) with the size of S but remains smaller than the size of U (available in plenty). Our framework requires no assumption of a true linear model, nor any direct sparsity conditions on the HD parameter. We investigate the possibility of ‘improved’ SS inference that can alleviate the stringent sparsity conditions known to be unavoidable for root-n-rate inference on low dimensional components of the HD parameter in supervised settings. We demonstrate a seamless imputation-based construction of a family of SS estimators that naturally possess asymptotic linear expansions (ALEs) facilitating inference via Gaussian approximation, and yet do not directly require any sparsity assumptions on the parameter or the precision matrix, all of which are typical requirements in supervised HD inference. In particular, we show that it is possible to achieve a root-n-rate ALE even when the HD parameter is fully dense, and also to match the ALE of the supervised Debiased Lasso under much weaker sparsity assumptions. We also obtain the semi-parametric optimal ALE. Overall, we show the remarkable benefits of SS settings in facilitating HD inference under minimal conditions, breaking known sparsity barriers.
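The imputation-based construction can be caricatured in a low dimensional toy: fit on the labeled data, impute pseudo-responses on the unlabeled data, and refit on the pooled sample. This sketch is a drastic simplification under assumed inputs (plain least squares, hypothetical names); the estimators in the talk use far more careful imputation to obtain valid ALEs in the HD regime.

```python
import numpy as np

def ss_ols(x_lab, y_lab, x_unlab):
    # Step 1: supervised least-squares fit on the labeled data S.
    beta_hat = np.linalg.lstsq(x_lab, y_lab, rcond=None)[0]
    # Step 2: impute pseudo-responses on the unlabeled data U.
    y_imp = x_unlab @ beta_hat
    # Step 3: refit on the pooled sample; the large U stabilizes the
    # design (second-moment) matrix used in the final fit.
    x_all = np.vstack([x_lab, x_unlab])
    y_all = np.concatenate([y_lab, y_imp])
    return np.linalg.lstsq(x_all, y_all, rcond=None)[0]

x_lab = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y_lab = x_lab @ np.array([2.0, 3.0])   # noiseless toy responses
x_unlab = np.array([[2.0, 1.0], [0.0, 3.0], [1.0, 2.0], [4.0, 0.0]])
beta_ss = ss_ols(x_lab, y_lab, x_unlab)
```

The point of the toy is only the mechanics: U contributes covariate information through the pooled design matrix even though its responses are never observed.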