Our paper on the DNA of medieval Ashkenazi Jews is now published!
Congrats to all co-authors and thanks reviewers/editors!
We analyzed 33 genomes from a 14th-century Jewish cemetery in Erfurt, Germany.
See thread for key additions since the preprint.
@sarahzhang
@TheAtlantic
Excellent article. Note a 2019 paper that detected "extreme inbreeding" in the UK biobank, finding that parents are first degree relatives in ~50/450k people (1/9000). It was possible to distinguish father-daughter from full sibling parents.
@LoicYengo
This is a brilliant preprint. It describes an experimental method for separately sequencing the paternal and maternal genomes. This goes beyond the standard "experimental phasing" - the method can also tell which chr is maternal and which is paternal.
Glad to share a new preprint!
We report genome-wide data for 33 Ashkenazi Jews from 14th-century Erfurt, Germany, along with insights on the Ashkenazi founder event and medieval population structure.
Let's dive into the story of the Erfurt Jews.
I have now posted all teaching materials I have used in the past ~four years (~2900 slides).
Included:
Basic statistics parts I and II (Hebrew)
Advanced topics in statistics (Eng)
Genomics, genetic genealogy, and population genetics (Eng)
This is a very interesting new paper, asking: do IVF babies carry more de-novo mutations?
The authors sequenced 1137 whole genomes of IVF and non-IVF trios.
Short answer: IVF babies have ~4.5 more de-novo mutations, mostly of paternal source.
1/6
This is a new perspective paper by
@mathiesoniain
, suggesting that the low transferability of polygenic scores across populations is an inevitable consequence of the "omnigenic" model for the genetic architecture of traits and diseases.
1/
Unbelievable flood of superb population genetics methods over the past few days.
ARG-based f4-statistics and ancestry estimation:
Extremely fast ARG inference:
PBWT-based fast chromosome painting:
Just posted our manuscript "Screening human embryos for polygenic traits has limited utility"
We provide an empirical foundation to the ethical debate regarding the generation of “designer babies” by screening IVF embryos. 1/11
@ehudkar
@ToddLencz
I want to highlight two beautiful papers published/posted last week, which creatively used genetic data to estimate historical human generation times (generation interval, or parental age at conception).
1/10
Our study of British Pakistanis now published in
@NatureComms
. With >2000 genomes, we found substructure driven by social stratification, low recent effective population sizes, widespread consanguinity (more than self-reported), and recessive disease risk.
Project idea: gnomAD for ancient DNA.
Select a variant, a population (or a group of populations), and get the allele frequency trajectory over time.
Anyone interested?
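A rough sketch of what the core query could look like, assuming each ancient sample comes with a date (years before present) and an allele count at the chosen variant. The function name, the input layout, and the toy data are all made up for illustration; real ancient DNA would also need handling of pseudo-haploid calls, damage, and uneven sampling.

```python
from collections import defaultdict

def allele_frequency_trajectory(samples, bin_size=1000):
    """Return [(bin_start_years_bp, allele_frequency, n_chromosomes)], oldest bin first.

    samples: iterable of (age_years_bp, alt_allele_count, called_chromosomes).
    This input layout is an assumption for the sketch, not an existing format.
    """
    alt = defaultdict(int)
    total = defaultdict(int)
    for age, alt_count, n_called in samples:
        if n_called == 0:
            continue  # no genotype call at this variant
        start = (age // bin_size) * bin_size  # lower edge of the time bin
        alt[start] += alt_count
        total[start] += n_called
    return [(start, alt[start] / total[start], total[start])
            for start in sorted(total, reverse=True)]

# Made-up toy data: (age in years BP, alt alleles, called chromosomes).
toy_samples = [(7200, 0, 2), (7900, 0, 1), (4500, 1, 2),
               (4100, 0, 2), (2300, 1, 1), (2800, 2, 2)]
trajectory = allele_frequency_trajectory(toy_samples, bin_size=1000)
for start, freq, n in trajectory:
    print(f"{start}-{start + 999} BP: freq={freq:.2f} (n={n} chromosomes)")
```

Reporting the number of chromosomes per bin matters, since many time bins would have very few samples.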
Very interesting, a new platform for whole-genome sequencing. The claim:
"longer read length (~300bp) and fast runs times (<20hrs) with high base accuracy (Q30 > 85%), at a low cost of $1/Gb".
So will this be $50 per whole human genome?
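A back-of-the-envelope check, assuming a human genome size of ~3.1 Gb (my assumption, not from the announcement) and taking the quoted $1/Gb at face value, ignoring library prep and other costs.

```python
GENOME_SIZE_GB = 3.1  # assumed haploid human genome size, in gigabases
COST_PER_GB = 1.0     # USD per gigabase, the figure quoted in the claim

def cost_for_coverage(coverage):
    """Sequencing cost (USD) for a given average depth of coverage."""
    return coverage * GENOME_SIZE_GB * COST_PER_GB

def coverage_for_budget(budget_usd):
    """Average depth of coverage that a given budget (USD) buys."""
    return budget_usd / (GENOME_SIZE_GB * COST_PER_GB)

print(f"30x genome: about ${cost_for_coverage(30):.0f}")
print(f"$50 buys about {coverage_for_budget(50):.0f}x coverage")
```

Under these assumptions, a standard 30x genome would be closer to $93, and $50 corresponds to roughly 16x.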
"Stability of polygenic scores across discovery genome-wide association studies"
Sobering results regarding the reproducibility of the PRS an individual will receive when the PRS is based on different discovery GWASs.
1/6
OK, here is a G drive folder with all papers (I'm aware of) on x-population polygenic scores. This covers both population differences in PRS distribution as well as differences in prediction accuracy + methods. I assume the empirical papers are a subset.
Our paper on screening IVF embryos for complex disease risk is now published in
@eLife
!
We study the risk reduction when parents select an embryo for transfer based on its polygenic risk score for a single disease.
Co-led by
@ToddLencz
.
1/12
Fantastic investigative work by
@mathiesoniain
et al, providing strong evidence against claims
that specific immune gene variants were under very strong natural selection during the Black Death.
I was quiet recently and several people have checked in. Fortunately, my family and I are safe. But Israel is going through an extremely difficult time as the magnitude of the disaster is unfolding.
The photos and videos are everywhere so I'll be more personal.
->
New method for identifying positively selected alleles and estimating selection coefficients based on peaks in IBD (haplotype sharing) along the genome. By Sharon Browning et al.
Applied to 38,387 TOPMed individuals.
This work is brilliant. In targeted (panel) sequencing data of ~26k tumors, the authors used the off-target reads (coverage ~0.15x) to accurately impute: genome-wide variants, HLA alleles, polygenic scores, and ancestry.
By Sasha Gusev et al.
Limitations of principal components in quantitative genetic association models for human studies
"We find that linear mixed models without PCs performs best in all cases"
Interesting. In the abstract, they explain the observation as:
No words can describe how evil Netanyahu and his government are. The pain they have inflicted on millions of Israelis and Palestinians in less than 1.5 years is unimaginable. And now they started a war with Iran. Pray for us.
A DNA methylation atlas of normal human cell types
Very proud of my colleagues at the Hebrew University faculty of medicine for an amazing publication in Nature. This is supposed to be a major leap forward for liquid biopsy.
There was some criticism of the scientific value of identifying Ashkenazi individuals in the All of Us data.
While I explained it briefly in the original post, let's expand.
🧵
There's one point that I think was overlooked in the debate over the UMAP plot:
Where is the Ashkenazi Jewish cluster?
Given 125k people of "white" ancestry, there should have been at least 2000 Ashkenazi Jews.
Do they form the circled blob?
ICYMI, this is a mind-blowing paper.
The authors show that human DNA could be easily sequenced in large quantities and variants could be detected from environmental sampling of water, air, and sand.
A new preprint with a method for transferring polygenic scores across populations. Yet another method, but looks quite interesting.
It uses summary stats from a large EUR GWAS and a small GWAS in the target population. It works roughly as follows.
1/5
Release of pre-computed polygenic scores for UK biobank participants.
Scores for 36 diseases/traits are available for everyone. Scores for an additional 17 traits are available for ~100k individuals.
More details ->
This paper has amazing new ancient DNA science: over 300 new genomes, fine-clustering using IBD, local ancestry inference, new selection scans for monogenic/polygenic traits.
But a 700-page paper (or a book?) is a huge disservice to the community.
New paper in
@CellCellPress
!
We provide an empirical and theoretical evaluation of the expected outcomes of selecting human IVF embryos for traits such as height or IQ. 1/10
With
@ehudkar
, Or Zuk,
@ToddLencz
, and others.
This preprint discovered repeat variants (VNTRs) with very large phenotypic effects. Others have reviewed these interesting findings.
But it's also worth looking at the Supplement, as the imputation algorithm is brilliant.
1/14
A Bayesian method to estimate absolute risk based on a polygenic risk score.
Requires only GWAS summary stats, allele frequencies/LD from a reference panel, and an estimate of the disease prevalence.
In this preprint, the authors argue that for many diseases, the accuracy of GWAS-based PRSs is already very close to the theoretical maximum.
This is important regarding the future of PRS - suggesting that increasing GWAS sample sizes may no longer help.
A few new PRS preprints came out over the weekend.
1) "The Construction of Multi-ethnic Polygenic Risk Score using Transfer Learning"
Nice method for transferring PRSs into non-European populations.
Please distribute:
I have an open position for a PhD student or (preferably) a post-doc in my group in the field of population genetics.
The project will focus on very exciting upcoming data of 3000 deeply sequenced whole-genomes from the diverse populations of Israel.
Interesting to see a series of preprints from China, using non-invasive prenatal sequencing data from 20-35k pregnancies to run GWASs of various pregnancy-related traits.
The first manuscript studies 104 phenotypes:
This is an eye-opening preprint on estimating direct genetic effects using siblings.
@jasonmfletcher
@Q_StatGen
et al.
The indirect effect (genetic nurture) of a genotype is the environmental influence of the parent who carries the same genotype.
1/7
Recently, the
@eshgsociety
has issued a statement titled "The use of polygenic risk scores in pre-implantation genetic testing: an unproven, unethical practice". See attached for a few thoughts.
The ESHG statement:
An interesting new preprint on how to combine PRS and family history to improve disease risk prediction. By Alkes Price and colleagues.
Bottom line: adding disease status of family members as indicators to logistic regression works well.
Thread ->
1/14
What can we learn from sequencing (100% genetically identical?) monozygotic twin pairs?
Turns out, a lot, particularly on early embryo development. Here, deCODE deeply sequenced ~400 twin pairs, along with their children/parents when available.
1/10
A genetic study of a deeply-phenotyped cohort of ~8700 Israelis.
Genetic data by 0.6x low-pass sequencing (Gencove).
Almost 5000 traits tested, most notably sleep measures and traits inferred from continuous glucose monitoring.
An interesting preprint from the Estonian biobank.
Across 180k genotyped Estonians, the authors looked at how genetic structure and polygenic scores for education were shaped by migration to Estonia's big cities.
🧵
A new (very short) preprint.
Suppose an individual has taken a polygenic risk score (PRS) test, placing him/her in the top 1% of PRS risk. What is the probability that a sibling of that individual also has a PRS in the top 1%? Or is affected?
1/8
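The first question can be sketched numerically under simple assumptions that are mine, not necessarily the preprint's: the PRS is standard normal, and the PRSs of two siblings are bivariate normal with correlation 0.5 (no assortative mating). The function name is hypothetical. The "affected" question needs a disease model and is not covered here.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def sibling_top_tail_probability(top_fraction=0.01, sib_corr=0.5,
                                 steps=20000, upper=10.0):
    """P(sibling PRS in top tail | proband PRS in top tail), by midpoint integration."""
    z = STD_NORMAL.inv_cdf(1.0 - top_fraction)  # PRS threshold for the top tail
    cond_sd = math.sqrt(1.0 - sib_corr ** 2)    # SD of sibling PRS given proband PRS
    dx = (upper - z) / steps
    joint = 0.0
    for i in range(steps):
        x = z + (i + 0.5) * dx                  # proband PRS at the midpoint
        # Given proband PRS x, the sibling PRS is N(sib_corr * x, cond_sd^2).
        p_sib = 1.0 - STD_NORMAL.cdf((z - sib_corr * x) / cond_sd)
        joint += STD_NORMAL.pdf(x) * p_sib * dx
    return joint / top_fraction

p_sib_top = sibling_top_tail_probability()
print(f"P(sibling in top 1% | proband in top 1%) = {p_sib_top:.3f}")
```

Setting `sib_corr=0.0` recovers the population baseline of 1%, which is a useful sanity check.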
"The ABO blood group locus and a chromosome 3 gene cluster associate with SARS-CoV-2 respiratory failure in an Italian-Spanish genome-wide association analysis"
Finally a covid-19 GWAS with some power.
I would appreciate it if you could spread the word: I am looking for a PhD student or post-doc for my research group at the Hebrew University (Ein Kerem campus) to work in the field of population genetics. The research will focus on amazing genomic data that is expected to reach us in the coming year: deep sequencing of about 3000 whole genomes from ~60 populations in Israel.
In case you missed it, Orchid Health, a company for genetic testing of embryos, posted this preprint recently.
They compared whole-genome sequencing of an embryo biopsy to DNA from cord blood of the born baby.
New preprint by the Brownings:
A method for inferring male- and female-specific population size histories.
It is based on comparing IBD sharing statistics between X and the autosomes.
Thread->
Very nice work by
@HaraldRingbauer
@jnovembre
@matt_stoneback
: A survey of runs of homozygosity (ROH) throughout history (1785 samples), using their new method for detecting ROH in ancient DNA.
1/3
Amazing work by
@rajivmccoy
@saracarioscia
et al, inferring transmitted parental haplotypes in 40k sperm cells using just 0.01x coverage! Can't wait to dive into the methods!
They found no evidence for deviations from 50/50 probability of transmission.
A really interesting discovery: both mt and Y chr of Neanderthals were replaced by those of modern humans around 200-300kya.
Very clear commentary by
@mikkelschierup
summarizing this news.
A survey of IVF patients (and their partners) in Boston finds that ~80% accept screening embryos for polygenic disease risk.
(n=469 in 2018, n=172 in 2021; ~75% white)
Similar acceptance rates also for genome editing.
A preprint by Genomic Prediction. Using 22k sibling pairs in the UK biobank, they estimated the risk reduction when selecting the sib with the higher "combined polygenic index" for 20 diseases. The results are promising for embryo screening. ->
I was awarded today the Hebrew University presidential award for young researchers, in memory of Prof. Yoram Ben-Porath. What an honor!
In the photo with the university rector (Prof. Barak Medina, right) and president (Prof. Asher Cohen).
@HebrewU
@Hujimed
Short piece by
@Graham_Coop
and
@DocEdge85
on Donnelly 1983: "The probability that related individuals share some section of genome identical by descent". They explain why that paper is so fundamental and a must read for genetic genealogy.
An impressive theoretical/computational advance in demographic inference using "SMC" models.
Thread below👇
SMC describes how the TMRCA (coalescence time) of two or more chrs varies along the genome. The key assumption is that at crossovers,
1/11
The attack on Lazaridis et al in this Perspective is inappropriate:
"the emphasis on patrilineal descent and the absence of discussions ... of matrilines (or of XX [sic]) creates a strong sense that the events of history are carried forward by “great men”"
Very nice preprint. The authors predict "personalized" accuracy (unexplained variance) of polygenic scores, using regression on age, sex, ancestry, and socioeconomic factors. The predicted variance is used to construct calibrated confidence intervals.
Demonstrating against Netanyahu's plan to dismantle Israel's democracy and high courts. If he and his accomplices succeed, Israel becomes a corrupted dictatorship. An existential risk to my country, the greatest in my lifetime.
An interesting short paper on using family history for risk prediction along with polygenic scores. By
@cristenw
et al.
As in previous work, they show that FH is predictive and ~independent of PRS (Fig). But there are also other interesting insights.
1/6
A cool new preprint by
@amythewilliams
and
@SiddharthAvadh1
.
They developed a method to infer the ancestry of the parents of a target individual (without the parents!), plus the admixture time in the history of each parent.
How does that miracle work?
Glad to announce a new paper in
@CellCellPress
: the genetic history of the Bronze Age Southern Levant.
Thread:
The Canaanites were the inhabitants of the Southern Levant (roughly today’s Israel, PA, Jordan, and Lebanon) between 2000-1000 BCE. 1/15
On a recent gloomy weekend I read the classic perspective of
@mendel_random
on the gloomy prospect of personalized medicine.
A very important read, in particular in the era of polygenic risk scores.
The key messages (fig below) summarize it well.
Now comes the great trick. The nanopore sequencing also provides information on methylation. The authors used ~200 sites that are known to be (almost) always methylated only in the father or only in the mother (imprinting).
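To make the idea concrete, here is a toy sketch of how imprinted sites could label a haplotype as maternal or paternal. This is only an illustration under my own assumptions (a simple majority vote, a made-up input layout, and an arbitrary `min_diff` cutoff), not the authors' actual algorithm.

```python
def assign_parent_of_origin(sites, min_diff=0.5):
    """Guess the parental origin of haplotype A by voting over imprinted sites.

    sites: list of (methylated_parent, meth_hap_a, meth_hap_b), where
      methylated_parent is "maternal" or "paternal" (the copy known to be methylated),
      meth_hap_a / meth_hap_b are fractions of methylated reads per haplotype.
    Returns (label for haplotype A or None if tied, vote counts).
    """
    other = {"maternal": "paternal", "paternal": "maternal"}
    votes = {"maternal": 0, "paternal": 0}
    for methylated_parent, meth_a, meth_b in sites:
        if abs(meth_a - meth_b) < min_diff:
            continue  # haplotypes look alike here, so the site is uninformative
        if meth_a > meth_b:
            votes[methylated_parent] += 1         # A carries the methylated copy
        else:
            votes[other[methylated_parent]] += 1  # B carries it, so A is the other parent
    if votes["maternal"] == votes["paternal"]:
        return None, votes
    return max(votes, key=votes.get), votes

# Made-up toy data for one phased chromosome.
toy_sites = [("maternal", 0.9, 0.1), ("paternal", 0.05, 0.95),
             ("maternal", 0.8, 0.15), ("paternal", 0.5, 0.6),
             ("maternal", 0.2, 0.9)]
label_a, vote_counts = assign_parent_of_origin(toy_sites)
print(label_a, vote_counts)
```

With only ~200 such sites genome-wide, many chromosomes would have few informative sites, so a real method has to be more careful than a plain vote.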
One gem hidden in the new height GWAS is this set of formulas. Suppose we have a polygenic score for a trait + the average parental phenotype. The formulas provide the optimal weights of these two sources of information for predicting the child's phenotype.
Indeed a landmark paper.
A couple thoughts on height prediction.
1) Wondering why their PRS only used genome-wide significant SNPs and didn't even account for LD. I assume that using e.g. LDpred2 could have increased the var explained substantially beyond ~40%.
The flood continues...
Inferring the geographical locations of ancestors using ARG (w/ beautiful visualizations)
Fast inference of demographic history using the allele frequency spectrum:
->
The wildest day in the history of Israel for as long as I can remember. Hundreds of thousands in the streets as the country came to a halt. Proud to be near the parliament to protect the country from Netanyahu's corrupted dictatorship.
Very nice idea on improving predictive power of polygenic scores for admixed populations (Luca Pagani et al):
Run local ancestry inference, compute ancestry-specific score for each subset (based on ancestry-specific GWAS), then combine the scores.
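A minimal sketch of these three steps on toy data, assuming phased haplotypes with a local ancestry label per SNP and one set of GWAS effect sizes per ancestry. The function name and data layout are hypothetical, and combining by a plain sum is my simplification; the paper may weight the partial scores differently.

```python
def ancestry_specific_prs(haplotypes, local_ancestry, weights):
    """Compute one partial polygenic score per ancestry, then combine them.

    haplotypes: two lists of 0/1 effect-allele indicators, one per haplotype.
    local_ancestry: two lists of ancestry labels, aligned with haplotypes.
    weights: {ancestry: [per-SNP effect size from that ancestry's GWAS]}.
    Returns (partial scores by ancestry, combined score).
    """
    partial = {ancestry: 0.0 for ancestry in weights}
    for hap, ancestry_track in zip(haplotypes, local_ancestry):
        for snp, (allele, ancestry) in enumerate(zip(hap, ancestry_track)):
            # Each allele is scored with the effect size of its own local ancestry.
            partial[ancestry] += allele * weights[ancestry][snp]
    combined = sum(partial.values())  # assumption: unweighted sum
    return partial, combined

# Made-up toy individual with three SNPs.
toy_haplotypes = [[1, 0, 1], [1, 1, 0]]
toy_ancestry = [["EUR", "EUR", "AFR"], ["AFR", "AFR", "AFR"]]
toy_weights = {"EUR": [0.5, 0.25, -0.5], "AFR": [0.25, 0.5, 1.0]}
partial_scores, combined_score = ancestry_specific_prs(
    toy_haplotypes, toy_ancestry, toy_weights)
print(partial_scores, combined_score)
```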
That's an excellent summary of the current situation in Israel with which I fully identify, from a colleague at the Hebrew University. Very well written.
I have avoided using this platform for politics / current affairs and have tried to restrict myself to science, but with current events, this is no longer possible. Here are some thoughts about what is happening. It is long and rambling. Apologies. 1/24
Phasing/imputation of low-coverage sequencing data: new paper seems like a major breakthrough. Fast enough to use reference panels of tens of thousands, leading to very accurate results even at very low freq variants and with coverage just 0.5-1x.
1/2
A suggestion how to improve the power of GWAS with linear mixed models: When testing a SNP, include as a fixed effect the polygenic score previously constructed from all other chrs. Suggested to better account for background genetic variation.
On the other hand, the Hamas troops are pure, Nazi-like (or ISIS-like) devils. When they and their supporters are gone, the world will be a better place.
I wish we knew what we should do.
An interesting overview of the Taiwan Biobank and its design/phenotypes. Genetic data for 110k people (strangely, genotyped on two arrays with little overlap, and not using their own WGS for imputation). >9k pairs of first degree relatives.
Today I'm a PI at the Hebrew University for exactly 5 years.
@hujimed
This is an exciting milestone. It has been a long journey with ups and downs. Many things I'm proud of but many lessons learned the hard way.
How to get people harmed based on their DNA sequence?
I wrote it as a joke tweet a few months ago. This week, a group wrote about it in Nature Reviews Genetics.
Unfortunately, the article is misleading and fear mongering. Let's break it down.
Fun question:
If you know somebody's genome, could you come up with an easy (and covert) way to get this person killed or injured or sick?
Wondering if there's any real risk here that should be part of genomic privacy discussions.
Remember when the Broad Institute discovered polygenic scores? Now it seems as if they invented quantitative genetics.
See below for a thread. (Happy if someone could send me the full text.)
1/7
This preprint by
@segal_eran
and his group provides initial details on an impressive cohort they have set up, consisting of 10k people with very deep phenotyping. This will be de facto the Israel biobank.
"Fully exploiting SNP arrays: a systematic review on the tools to extract underlying genomic structure"
Covers ancestry, IBD sharing, heritability, polygenic scores, LD analysis, and structural variants (including mosaicism). Looks useful for teaching.
Following up, I think the discussion on embryo screening should move on to become more nuanced. I see the same arguments over and over, ignoring a growing body of literature.
Let me try to summarize where the field stands, in my subjective view.
A new opinion piece in Human Reproduction arguing against screening embryos with polygenic risk scores.
The article mentions a few important problems, although I think not much is new. And I disagree with some of their other arguments.
Not sure how I missed it, but NCBI has aggregated 100k genomes from dbGaP, and allele frequencies per population are now included in dbSNP entries.
Called project ALFA. They expect to eventually reach ~1 million subjects.
"Polygenic risk score as a possible tool for identifying familial monogenic causes of complex diseases"
Showing (as expected) that for individuals with family history (UKB), those with low PRS have higher chances of having a rare pathogenic variant.
This is an unusual, sad ancient DNA paper.
Sobibor was an extermination camp operated by the Nazis in 1942/3, where ~200k Jews were murdered in gas chambers and cremated.
The new research identified 10 skeletons and confirmed their Ashkenazi ancestry.
1/
New paper by a company called MyOme on whole-genome sequencing of IVF embryos. They report high concordance (r^2=0.95) of polygenic risk scores between embryo biopsies and born babies. Also detected rare (inherited) pathogenic variants (e.g., in BRCA1).
Study finds evidence of assortative mating on blood type using data from pre-pregnancy checkups of ~1M Chinese (no genetic data).
Several robustness tests suggested this is not population stratification, but I don't know, I'm still skeptical.
A proposal that the FDA should regulate direct-to-consumer delivery of polygenic scores.
In my opinion this is not convincing. What is the evidence that consumers are harmed? Why should this be a priority for the FDA and worthy of their efforts?
"Whole-genome sequencing of Bantu-speakers from Angola and Mozambique reveals complex dispersal patterns and interactions throughout sub-Saharan Africa"
350 genomes, 12x coverage. Thorough pop gen analysis. Improves imputation in Afr-Amr and in Brazil.
The latest phasing paper from the Brownings in AJHG.
They present a method to measure the switch error rate after accounting for the fact that some genotyping errors will generate apparent switch errors.
An interesting ancient DNA manuscript with 9 new genomes from the Bronze Age Northern Levant city Alalakh. The ancestry is homogeneous, composed of Anatolian/Southern Levant/Iran-Caucasus. One outlier (Central Asia ancestry). Isotopes show most were local.