PNAS, Dec 23, 2105
Siddharth Krishna Kumara (a, 1), Marcus W. Feldman (a), David H. Rehkopf (b), and Shripad Tuljapurkara (a)
(a) Department of Biology, Stanford University, Stanford, CA 94305-5020; and (b) School of Medicine, Stanford University, Stanford, CA 94305-5020
Genome-wide association studies (GWASs) seek to understand the relationship between complex phenotype(s) (e.g., height) and up to millions of single-nucleotide polymorphisms (SNPs). Early analyses of GWASs are commonly believed to have “missed” much of the additive genetic variance estimated from correlations between relatives. A more recent method, genome-wide complex trait analysis (GCTA), obtains much higher estimates of heritability using a model of random SNP effects correlated between genotypically similar individuals. GCTA has now been applied to many phenotypes from schizophrenia to scholastic achievement. However, recent studies question GCTA’s estimates of heritability. Here, we show that GCTA applied to current SNP data cannot produce reliable or stable estimates of heritability. We show first that GCTA depends sensitively on all singular values of a high-dimensional genetic relatedness matrix (GRM). When the assumptions in GCTA are satisfied exactly, we show that the heritability estimates produced by GCTA will be biased and the standard errors will likely be inaccurate. When the population is stratified, we find that GRMs typically have highly skewed singular values, and we prove that the many small singular values cannot be estimated reliably. Hence, GWAS data are necessarily overfit by GCTA which, as a result, produces high estimates of heritability. We also show that GCTA’s heritability estimates are sensitive to the chosen sample and to measurement errors in the phenotype. We illustrate our results using the Framingham dataset. Our analysis suggests that results obtained using GCTA, and the results’ qualitative interpretations, should be interpreted with great caution.