The Polygenic Index (PGI) Repository
The Polygenic Index (PGI) Repository is an initiative that makes PGIs for a wide range of traits available for a number of datasets that may be useful to social scientists. By constructing the PGIs ourselves and making them available as variables downloadable from the data providers, our resource eliminates a number of roadblocks for researchers who would like to use PGIs in their research:
1. Constructing PGIs from individual genotype data can be a time-consuming process, even for researchers trained to work with large datasets.
2. Since the prediction accuracy of a PGI increases with the sample size of the underlying GWAS, it is generally desirable to generate PGI weights from GWAS summary statistics based on the largest available samples. However, privacy and IRB restrictions often create administrative hurdles that limit access to summary statistics, forcing researchers to weigh the benefit of summary statistics from a larger sample against the cost of overcoming those hurdles.
3. Publicly available GWAS summary statistics are sometimes based on a discovery sample that includes the target cohort (or close relatives of cohort members) in which the researcher wishes to produce the PGI. Such sample overlap causes overfitting, which can lead to highly misleading results.
4. Because different researchers construct PGIs from GWAS summary statistics using different methodologies, it is hard to compare and interpret results from different studies.
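For readers new to PGIs, the construction step mentioned in item 1 is, at its core, a weighted sum of allele counts across SNPs, usually standardized within the sample. The following is a minimal sketch under stated assumptions, not the Repository pipeline: the function name `polygenic_index`, the toy weights, and the toy genotypes are all hypothetical, and the quality control and LD adjustment that real pipelines require are omitted.

```python
from statistics import mean, pstdev


def polygenic_index(genotypes, weights):
    """Return one standardized PGI value per individual.

    genotypes: one list of allele counts (0, 1 or 2) per individual
    weights:   one weight per SNP (hypothetical values in this sketch)
    """
    for person in genotypes:
        if len(person) != len(weights):
            raise ValueError("each individual needs one allele count per SNP")
    # Raw index: weighted sum of allele counts for each individual.
    raw = [sum(w * g for w, g in zip(weights, person)) for person in genotypes]
    # Standardize to mean 0 and standard deviation 1 within the sample.
    mu, sd = mean(raw), pstdev(raw)
    return [(r - mu) / sd for r in raw]


# Toy data: 4 individuals, 3 SNPs (illustrative values only).
weights = [0.10, -0.05, 0.20]
genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 2]]
pgi = polygenic_index(genotypes, weights)
```

Standardizing within the sample is what makes regression coefficients on a PGI interpretable as effects per standard deviation of that particular index.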
The Repository currently contains single- and/or multi-trait (MTAG) PGIs for 47 phenotypes in 11 datasets. To maximize the prediction accuracy of the PGIs, we meta-analyzed summary statistics from multiple sources, including several novel large-scale GWASs conducted in UK Biobank and the personal genomics company 23andMe. As a result, almost all PGIs in our initial release perform at least as well as currently available PGIs in terms of prediction accuracy. Please see Becker et al. (2021) and the User Guide for a detailed description of the pipeline.
The Repository will be updated regularly with additional PGIs and datasets. If you are interested in participating in the Repository, please reach out to firstname.lastname@example.org.
Frequently Asked Questions (FAQs)
For a less technical description of the paper and of how PGIs should—and should not—be interpreted and used, see these frequently asked questions.
PGI Access Procedures
PGIs in the participating datasets can be accessed via the procedures described here.
Summary Statistics and PGI Weights
For each phenotype in the Repository, we report GWAS and MTAG summary statistics and PGI (LDpred) weights for all SNPs from the largest discovery sample for that analysis, unless the sample includes 23andMe. SNP-level summary statistics from analyses based entirely or in part on 23andMe data can only be reported for up to 10,000 SNPs. Therefore, if the largest GWAS or MTAG analysis for a phenotype includes 23andMe, we report summary statistics for only the genome-wide significant SNPs from that analysis. In addition, we report summary statistics for all SNPs from the largest GWAS or MTAG analysis excluding 23andMe. These data can be found here.
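The reporting rule above can be sketched as a small filter. This is an illustration only: the function name `reportable_snps` and the row layout (dicts with a `"p"` key) are hypothetical, the threshold 5e-8 is the conventional genome-wide significance level and is assumed here rather than taken from the text, and keeping the smallest p-values when more than 10,000 SNPs are significant is likewise an assumption.

```python
GENOME_WIDE_P = 5e-8       # conventional threshold (assumed, not stated above)
MAX_23ANDME_SNPS = 10_000  # cap on SNP-level results that include 23andMe data


def reportable_snps(summary_stats, includes_23andme):
    """Return the rows of summary statistics that may be reported.

    summary_stats:    list of dicts, each with at least a "p" (p-value) key
    includes_23andme: True if the analysis used any 23andMe data
    """
    if not includes_23andme:
        # No restriction: report every SNP.
        return list(summary_stats)
    # Keep only genome-wide significant SNPs, strongest associations first.
    significant = sorted(
        (row for row in summary_stats if row["p"] < GENOME_WIDE_P),
        key=lambda row: row["p"],
    )
    # Assumed tie-break: if the cap binds, keep the smallest p-values.
    return significant[:MAX_23ANDME_SNPS]


# Toy rows with hypothetical SNP identifiers.
rows = [
    {"snp": "rs_a", "p": 1e-12},
    {"snp": "rs_b", "p": 0.03},
    {"snp": "rs_c", "p": 4e-9},
]
restricted = reportable_snps(rows, includes_23andme=True)
unrestricted = reportable_snps(rows, includes_23andme=False)
```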
In Becker et al. (2021), we also propose an approach that improves the interpretability and comparability of research results based on PGIs: we derive an estimator, to be used in place of ordinary least squares (OLS) regression, that corrects for the errors-in-variables bias. The estimator produces coefficients in units of the standardized additive SNP factor, which has a more meaningful interpretation than units of some particular PGI. The Python command-line tool implementing the estimator can be found here.
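To give a sense of the bias being corrected, the simulation below illustrates the textbook univariate case: when a standardized PGI is a noisy proxy for the standardized additive SNP factor, the OLS slope shrinks by the correlation between the two, and dividing by that correlation undoes the shrinkage. This is a minimal sketch, not the estimator from Becker et al. (2021), which also handles covariates and standard errors; the function names, the parameter values, and the assumption that the correlation `rho` is known are all hypothetical.

```python
import random
from statistics import mean, pstdev


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n=100_000, beta=0.3, rho=0.6, seed=1):
    """Return (naive, corrected) slopes for a noisy, standardized PGI.

    beta: true effect per SD of the additive SNP factor (assumed value)
    rho:  correlation between the PGI and the factor (assumed known)
    """
    rng = random.Random(seed)
    # Noise SD chosen so that corr(factor, factor + noise) equals rho.
    noise_sd = (1 / rho**2 - 1) ** 0.5
    factor = [rng.gauss(0, 1) for _ in range(n)]  # unobserved in practice
    raw = [g + rng.gauss(0, noise_sd) for g in factor]
    mu, sd = mean(raw), pstdev(raw)
    pgi = [(r - mu) / sd for r in raw]            # standardized PGI
    y = [beta * g + rng.gauss(0, 1) for g in factor]
    naive = ols_slope(pgi, y)   # attenuated: roughly beta * rho
    corrected = naive / rho     # rescaled to units of the factor
    return naive, corrected


naive, corrected = simulate()
```

In this setup the naive slope depends on how noisy the particular PGI is, whereas the corrected slope does not, which is the sense in which coefficients in units of the additive SNP factor are comparable across studies.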
The code for constructing PGIs and principal components can be found here.
The code for the illustrative application of the measurement-error estimator can be found here.
The code for analyzing the data in Figure 1 (Type of study in presentations at Behavior Genetics Association Annual Meetings) can be found here.
Please include the following citation in any publication based on the Repository PGIs (along with the citations for the GWASs that were inputs to the single-trait or multi-trait analysis underlying the PGI) or on the measurement-error-corrected estimator:
Becker, J., Burik, C.A.P., Goldman, G., Wang, N., Jayashankar, H., Bennett, M., Belsky, D.W., Karlsson Linnér, R., Ahlskog, R., Kleinman, A., Hinds, D.A., 23andMe Research Group, Caspi, A., Corcoran, D.L., Moffitt, T.E., Poulton, R., Sugden, K., Williams, B.S., Harris, K.M., Steptoe, A., Ajnakina, O., Milani, L., Esko, T., Iacono, W.G., McGue, T., Magnusson, P.K.E., Mallard, T.T., Harden, K.P., Tucker-Drob, E.M., Herd, P., Freese, J., Young, A., Beauchamp, J.P., Koellinger, P.D., Oskarsson, S., Johannesson, M., Visscher, P.M., Meyer, M.N., Laibson, D., Cesarini, D., Benjamin, D.J., Turley, P., and Okbay, A. (2021). Resource Profile and User Guide of the Polygenic Index Repository. Nature Human Behaviour. Published online June 17. doi:10.1038/s41562-021-01119-3.