SGDP data


The Simons Genome Diversity Project (SGDP) dataset is made possible by the Simons Foundation. The dataset contains complete genome sequences from more than one hundred diverse human populations. It is the largest dataset of diverse, high-quality human genome sequences ever reported. To represent as much anthropological, linguistic, and cultural diversity as possible, the dataset includes many deeply divergent human populations that are not well represented in other datasets.

Distribution of the data

The SGDP public project contains Open Access whole genome sequencing data for 279 samples.


By geographical region, the SGDP dataset comprises 44 Africans, 22 Native Americans, 27 Central Asians or Siberians, 47 East Asians, 25 Oceanians, 39 South Asians, and 75 West Eurasians.
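
As a quick sanity check, the regional counts listed above can be tallied to confirm that they add up to the 279 samples in the public project. The snippet below is only an illustrative sketch; the variable names are chosen for this example and are not part of any SGDP or CGC tooling.

```python
# Regional sample counts, copied from the breakdown above.
region_counts = {
    "Africans": 44,
    "Native Americans": 22,
    "Central Asians or Siberians": 27,
    "East Asians": 47,
    "Oceanians": 25,
    "South Asians": 39,
    "West Eurasians": 75,
}

# Sum across regions; this should match the 279 Open Access samples.
total = sum(region_counts.values())
print(total)  # 279

# Show each region's share of the dataset.
for region, count in region_counts.items():
    print(f"{region}: {count} ({count / total:.1%})")
```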

SGDP metadata

Learn more about SGDP metadata:

  1. Access the Nature article about SGDP.
  2. Look under Excel files.
  3. Select Supplementary Table 1. Note that this will start a download for a local copy of the spreadsheet.
  4. Open your local copy of the spreadsheet and filter for X in Column G. This displays all the Open Access data in the SGDP that CGC has made available in its Simons Genome Diversity Project (SGDP) public project.
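
The filtering step above can also be done programmatically. The following is a minimal sketch, assuming you have exported the downloaded spreadsheet to a CSV file; it uses only the Python standard library. The function names, the file name in the comment, and the sample rows are hypothetical and serve only to illustrate the idea. Column G is the seventh column, so its zero-based index is 6.

```python
import csv
import io


def filter_open_access(rows, column_index=6, marker="X"):
    """Return the rows whose cell in the given column equals the marker.

    column_index=6 corresponds to spreadsheet Column G (A=0, B=1, ..., G=6).
    Rows too short to have that column are skipped.
    """
    selected = []
    for row in rows:
        if len(row) > column_index and row[column_index].strip() == marker:
            selected.append(row)
    return selected


def load_rows(csv_path):
    """Read all rows from a CSV export of the spreadsheet (path is an assumption)."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle))


# Hypothetical in-memory stand-in for the exported table; the real
# spreadsheet has different columns and values.
sample_csv = io.StringIO(
    "a,b,c,d,e,f,flag\n"
    "s1,,,,,,X\n"
    "s2,,,,,,\n"
    "s3,,,,,, X \n"
)
sample_rows = list(csv.reader(sample_csv))
open_access = filter_open_access(sample_rows)
print([row[0] for row in open_access])
```

With a real export, you would call `load_rows("supplementary_table_1.csv")` (a placeholder file name) and pass the result to `filter_open_access`. The header row is dropped automatically unless its Column G cell happens to equal the marker.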

Access SGDP data

Access a repository of SGDP files via the SGDP public project.

Note that you cannot currently query the SGDP dataset via the Data Browser.