North American DNA study: Clustering of 770,000 genomes reveals post-colonial population structure of North America
Posted: Thu Mar 19, 2020 8:32 pm
https://www.nature.com/articles/ncomms14238
Abstract
Despite strides in characterizing human history from genetic polymorphism data, progress in identifying genetic signatures of recent demography has been limited. Here we identify very recent fine-scale population structure in North America from a network of over 500 million genetic (identity-by-descent, IBD) connections among 770,000 genotyped individuals of US origin. We detect densely connected clusters within the network and annotate these clusters using a database of over 20 million genealogical records. Recent population patterns captured by IBD clustering include immigrants such as Scandinavians and French Canadians; groups with continental admixture such as Puerto Ricans; settlers such as the Amish and Appalachians who experienced geographic or cultural isolation; and broad historical trends, including reduced north-south gene flow. Our results yield a detailed historical portrait of North America after European settlement and support substantial genetic heterogeneity in the United States beyond that uncovered by previous studies.
Introduction
Following the arrival of Columbus and his contemporaries, population expansion in the Americas has proceeded at an exceptionally rapid pace, with factors such as war, slavery, disease and climate shaping human demography. Recent genetic studies of the United States and North America have drawn insights into ancient human migrations (refs 1, 2) and population diversity in relation to global population structure (refs 3–11). These insights have been primarily drawn from modelling variation in allele frequencies (for example, refs 11, 12, 13, 14, 15), which typically diverge slowly. This may in part explain why these studies have revealed little about population structure on the time-scale of post-European colonization (1500–2000 AD) that is not directly tied to pre-Columbian diversity within the Americas nor to ‘Old World’ populations outside the United States.
In this study, we analyse genome-wide genotype data from over 777,000 primarily US-born individuals. Among all pairs of individuals, we identify genetic connections defined by sharing a recent common ancestor; when these connections are aggregated into a network, our computational methods reveal densely connected clusters, in which the members of each cluster are subtly more related to each other. Using a unique collection of 20 million user-generated genealogical records, we annotate these densely connected clusters to identify the putative historical origins of such population substructure, and to infer temporal and geographic patterns of migration and settlement. With much greater granularity than previously possible, our analyses demonstrate the impact of subtle, complex demographic forces in shaping the patterns of genetic variation among contemporary North Americans.
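For anyone wondering what "aggregating connections into a network and finding densely connected clusters" might look like in practice, here is a toy sketch. It is not the authors' pipeline: the quoted text does not say which clustering algorithm they used, so this swaps in plain weighted label propagation as a stand-in. The sample IDs, the shared-centimorgan values and the 12 cM cutoff are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical IBD connections: (person_a, person_b, shared_cM).
# All IDs and values are invented for illustration.
ibd_pairs = [
    ("a1", "a2", 40.0), ("a1", "a3", 35.0), ("a2", "a3", 30.0),
    ("b1", "b2", 45.0), ("b1", "b3", 38.0), ("b2", "b3", 33.0),
    ("a3", "b1", 13.0),   # weak link between the two groups
    ("a1", "b2", 5.0),    # below the cutoff, dropped
]

def build_ibd_network(pairs, min_cm=12.0):
    """Undirected weighted graph; min_cm is an assumed cutoff, not from the paper."""
    graph = defaultdict(dict)
    for a, b, cm in pairs:
        if a != b and cm >= min_cm:
            graph[a][b] = cm
            graph[b][a] = cm
    return graph

def label_propagation(graph, max_rounds=100):
    """Each node repeatedly adopts the label carrying the most IBD weight
    among its neighbours, until no label changes."""
    labels = {node: node for node in graph}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):  # fixed order keeps the result deterministic
            weight_by_label = defaultdict(float)
            for neighbour, cm in graph[node].items():
                weight_by_label[labels[neighbour]] += cm
            top = max(weight_by_label.values())
            candidates = sorted(l for l, w in weight_by_label.items() if w == top)
            # On a tie, keep the current label if it is among the best.
            new_label = labels[node] if labels[node] in candidates else candidates[0]
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels

def clusters_from_labels(labels):
    """Group nodes by final label and return sorted member lists."""
    groups = defaultdict(list)
    for node, label in labels.items():
        groups[label].append(node)
    return sorted(sorted(members) for members in groups.values())

network = build_ibd_network(ibd_pairs)
clusters = clusters_from_labels(label_propagation(network))
print(clusters)
```

On this toy input the two tightly connected trios end up in separate clusters even though a weak 13 cM link joins them, which is the basic idea behind "members of each cluster are subtly more related to each other". At the scale of the study (hundreds of millions of connections) a real analysis would need a purpose-built graph library rather than plain dictionaries.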