The Ashkenazi Genome Consortium

The Ashkenazi Genome Consortium (TAGC)

Introduction

The TAGC consortium was founded in 2011 by 11 labs from the NY area and Israel, with the goal of generating a comprehensive catalog of genetic variation in the Ashkenazi Jewish population. To achieve this goal, we have generated high-coverage, whole-genome sequences for 128 healthy individuals of Ashkenazi Jewish descent. Sequencing was completed in 2013, and the analysis is described in our paper in Nature Communications (2014). The consortium has since expanded (to include our lab and others), and we are now sequencing a few hundred additional genomes.

See below for a semi-popular description.

See the consortium’s official website, including media coverage.

Utility

Our dataset is the only publicly available, extensive sequencing dataset for the Ashkenazi Jewish population. As such, we expect it to be valuable across the entire spectrum of genetic studies in this population. Specific applications include, for example, refinement of carrier screening panels, interpretation of clinical genomes, imputation of missing variants in sparsely genotyped samples, identity-by-descent (IBD) mapping, demographic inference, evaluation of the mutation burden, and detection of selection. Our sequencing dataset is also expected to be useful for method development or comparative analysis in statistical and population genetics studies in other populations, in particular isolated ones.

Data

All individuals were controls in previous association studies:

  1. Longevity (Gil Atzmon’s lab; Albert Einstein College of Medicine; study description), n=74.
  2. Parkinson’s disease (Lorraine Clark’s lab; Columbia University Medical Center; study description), n=54.

Whole-genome, high-coverage (>50x) sequencing was performed by Complete Genomics. Ashkenazi Jewish ancestry was validated by PCA. The genomes were called using CG pipelines 2.0.2.26 and 2.0.4.14. Both pipelines mapped variants to reference genome version hg19. masterVar files were extracted from the relevant ASM directories, and CGATools’s mkvcf command was used to merge the genomes into a VCF file. Cleaned genotypes in Plink format were generated using standard filters (see our paper).

The data can be accessed through the European Genome-phenome Archive.


More on the project and the analysis results (semi-popular)

Introduction

Ashkenazi Jews first appeared as recently as a thousand years ago, their origins unknown. From small communities in Western Europe, they expanded throughout Eastern Europe and reached millions, leaving their mark on European life. These days, Ashkenazi Jews number about 10 million, living mostly in the United States and Israel. For many decades, the Ashkenazi population has attracted geneticists, due to the high prevalence of a number of genetic disorders that are extremely rare elsewhere (for example, Tay-Sachs disease and Gaucher’s disease) and the pioneering adoption of carrier screening. Geneticists have also been captivated by the potential of genetic studies to elucidate Ashkenazi origins, which have been heatedly debated.

Initial genetic surveys in the Ashkenazi population, which used a handful of DNA markers, demonstrated combinations of mutations that were typical of the Jewish population. Improved technology increased the number of available markers to about a million, and a few hundred Ashkenazi genomes were analyzed. The results showed, unequivocally, that Ashkenazi Jews are, genetically, much closer to other Jewish populations than to Europeans or Middle Easterners. The analysis also revealed that the Ashkenazi population size was extremely small (a so-called “founder event”) in the late Middle Ages, consistent with the high prevalence of unique genetic disorders.

Motivation

Findings of previous studies represented a major advance in our understanding of Ashkenazi genetics. Importantly, they also suggested that in addition to mutations responsible for rare genetic disorders, other mutations, which increase the risk for common diseases, had also risen to considerable frequencies in the Ashkenazi population. Such conditions are a boon for geneticists, who can detect the effect of those mutations with relatively minor effort. This has proven useful over and over recently in other isolated populations, with new risk mutations detected in cohorts from Crete, Iceland, Finland, and elsewhere. However, to carry out such studies, one must possess the complete DNA sequence for at least a representative sample of the population. In recent years, revolutionary technologies have reduced the cost of DNA sequencing to merely a few thousand dollars. However, large-scale, international sequencing projects did not include the Ashkenazi population (or in fact, any Middle Eastern population), and therefore, potential genetic studies have so far relied on a limited number of DNA markers that are mostly common in Europeans.

The absence of reference DNA sequence data for the Ashkenazi population is also slowing down the application of personal genomics technologies in clinical settings. This is because, without a reference, unusual mutations discovered in a given patient cannot be properly interpreted. Finally, complete DNA sequence data has the potential to address important questions in Jewish history, such as the relative contribution of Europeans to the Ashkenazi gene pool.

Our project

With these goals in mind, The Ashkenazi Genome Consortium (TAGC) was founded by a number of geneticists from the NY area, whose research interests span, for example, schizophrenia, Crohn’s disease, Parkinson’s disease, and cancer. The consortium has completed the sequencing, to high quality, of 128 Ashkenazi genomes, the largest resource to date of genetic variation in the Ashkenazi population. The results of the data analysis appeared in Nature Communications (2014).

Utility in medical genetics

Our analysis confirmed the potential utility of our Ashkenazi DNA sequence resource in medical genetics. Our results showed that the number of Ashkenazi mutations that are missing from available catalogs is almost 50% higher than in a typical European, non-Jewish genome. These and other results established that our sequencing effort has made interpreting rare mutations in an Ashkenazi genome much more efficient. Moreover, we have also improved the ability to predict missing mutations in an Ashkenazi genome. This is important because the cost of whole-genome sequencing is still prohibitive for very large samples, and therefore, geneticists frequently resort to data from a limited number of markers when studying the genomes of affected individuals. Being able to predict full Ashkenazi sequences using a small number of markers and our reference data will significantly improve the cost-effectiveness and success rate of medical studies, as well as our understanding of disease etiology in the Ashkenazi and other populations.

Genetic similarity between Ashkenazi Jews: a recent founder event

Our data also revealed, in agreement with previous studies, close similarity between unrelated Ashkenazi individuals. Contrary to popular belief, this similarity does not persist throughout the entire DNA sequence, as would be expected in populations that practice consanguinity. In fact, on a genome-wide level, the Ashkenazi genome is more heterogeneous than a typical European genome. However, about 5 or 10 times per genome, a pair of Ashkenazi individuals share almost completely identical genetic material spanning an exceptionally long sequence (say, over three million letters). In populations of European origin, such an amount of genetic similarity would correspond to fifth-generation cousins. In the Ashkenazi population, these stretches of identical sequence are due to an extremely small population size in recent history, which is known in genetics as a founder event. Thus, two Ashkenazi individuals with no known common ancestors in the past few generations may nevertheless descend from the same founder. Mathematical modeling has shown that the founder population was indeed small, at only ≈300-400 individuals (this is a so-called “effective population size”, which may be smaller than the actual number of founders), and lived ≈25-30 generations ago (≈700 years ago).
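
The figures above can be related to each other with a back-of-the-envelope calculation. The sketch below is only an illustration and is not the modeling used in our paper. It assumes a textbook simplification: two people whose common ancestor lived g generations ago are separated by 2g meioses, so the length of a segment they share is roughly exponentially distributed with mean 100/(2g) centimorgans. It also assumes that one centimorgan corresponds to about one million letters, a rough human average. All function names are made up for this example.

```python
import math

# Assumption (not from our paper): 1 centimorgan (cM) is roughly
# 1 million DNA letters, a coarse genome-wide average in humans.
LETTERS_PER_CM = 1_000_000


def years_per_generation(years_ago, generations_ago):
    """Generation time implied by a date given in both years and generations."""
    return years_ago / generations_ago


def mean_shared_segment_cm(generations_ago):
    """Mean length (cM) of a segment inherited from one common ancestor.

    Two present-day individuals are separated by 2*g meioses, and under
    the simple exponential model the mean length is 100 / (2*g) cM.
    """
    return 100.0 / (2 * generations_ago)


def prob_segment_longer_than(length_cm, generations_ago):
    """Probability that such a shared segment exceeds length_cm."""
    return math.exp(-2 * generations_ago * length_cm / 100.0)


if __name__ == "__main__":
    threshold_cm = 3_000_000 / LETTERS_PER_CM  # "over three million letters"
    for g in (25, 30):
        print(
            f"g={g}: {years_per_generation(700, g):.1f} years/generation, "
            f"mean segment {mean_shared_segment_cm(g):.2f} cM, "
            f"P(longer than {threshold_cm:.0f} cM) = "
            f"{prob_segment_longer_than(threshold_cm, g):.2f}"
        )
```

Under these assumptions, a founder event 25-30 generations ago leaves shared segments averaging about two million letters, so segments longer than three million letters are uncommon per ancestral line but not rare, which is in line with a handful of long shared stretches per pair.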

Mutation frequencies: Ashkenazi Jews have ≈50% of European ancestry

Analysis of mutation frequencies in the Ashkenazi and European genomes confirmed that the Ashkenazi population is a mix of a (likely) Middle Eastern and a European population. Interestingly, the mixture was found to be even, with ≈50% of the Ashkenazi ancestry being traced to each source. This is an important result, since the amount of European ancestry in Ashkenazi Jews has been controversial, with previous estimates ranging from 20% to 80% and with wide-ranging political and cultural implications.
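
To give a feel for how mutation frequencies can reveal mixture proportions, here is a minimal sketch. It is not the method used in our paper. It assumes the simplest possible model, in which the frequency of each mutation in the mixed population is a weighted average of its frequencies in the two source populations, and it ignores the founder event and random drift. The frequencies below are made up for illustration, and the function name is hypothetical.

```python
def estimate_mixture_fraction(p_mixed, p_source_a, p_source_b):
    """Least-squares estimate of the ancestry fraction from source A.

    Assumes the linear model p_mixed = a * p_A + (1 - a) * p_B at every
    marker and solves for the single value of a that fits best.
    """
    numerator = 0.0
    denominator = 0.0
    for pm, pa, pb in zip(p_mixed, p_source_a, p_source_b):
        diff = pa - pb
        numerator += (pm - pb) * diff
        denominator += diff * diff
    if denominator == 0.0:
        raise ValueError("source populations have identical frequencies")
    return numerator / denominator


# Toy, made-up mutation frequencies at four markers (not real data).
p_european = [0.10, 0.40, 0.75, 0.20]
p_middle_eastern = [0.30, 0.20, 0.55, 0.60]

# Build a toy mixed population that is exactly half of each source.
p_mixed_toy = [0.5 * e + 0.5 * m for e, m in zip(p_european, p_middle_eastern)]

alpha = estimate_mixture_fraction(p_mixed_toy, p_european, p_middle_eastern)
print(f"Estimated European fraction: {alpha:.2f}")
```

Because the toy mixed population was constructed as an even blend, the estimate recovers a fraction of one half. Real data require many more markers and a model that accounts for drift since the mixture.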

Hypotheses on Ashkenazi origins

Our results raise a number of questions and hypotheses. Who were the Ashkenazi founders? Who were the European populations that mixed with them, and when did it happen? Was it a single event or a millennium-spanning process? A naïve reading of the results would suggest that the founders were early Ashkenazi settlers in Eastern Europe, but the resolution of our analysis may not be sufficient, at present, for a confident interpretation. The equal amounts of European and Middle Eastern ancestry suggest, interestingly, a sex-biased process, where, say, Middle Eastern Jewish men married European non-Jewish women. We are currently investigating these and other questions using the DNA sequence data that our project generated.

Future directions

Now that our sequencing dataset is published, different TAGC researchers are turning to hunting for genes associated with their specific disease of interest. As a consortium, we are working with additional collaborators, including the New York Genome Center, to sequence approximately 500 more Ashkenazi genomes. The ultimate goal is to make the catalog of mutations so comprehensive that it will cover all the genetic variation that was present at the time of the foundation of the population, and by implication, almost all variation present today. Such a complete catalog will make it possible to fully harness the unique demographics of the Ashkenazi population for the benefit of science and medicine.