the outbreeding project continues apace…

…in the united states! (just as you all suspected.) amongst white folks anyway (that’s who was included in the study below).

from a study published in 2009, Measures of Autozygosity in Decline: Globalization, Urbanization, and Its Implications for Medical Genetics:

“This research has definitively shown the existence of a trend for decreasing autozygosity with younger chronological age in the North American population of European ancestry. The ROHs we identified, larger than 1 Mb, are clearly representative of autozygosity due to distant consanguinity in our outbred populations, and not chromosomal abnormalities or common copy number variants. Using our predictive models of decreasing Fld, we show a quantifiable decrease in consanguinity over the twentieth century. Based on data provided in Carothers et al, this decrease in Fld found in our discovery population is on the order of individuals transitioning from a single inbreeding loop 4–5 generations prior, to no inbreeding loops within <6 generations. We postulate that the increased mobility, urbanization and outbreeding in North America in the last century has led to less consanguinity (and thus less homozygosity and homogeneity) in younger individuals.”

the researchers looked at two different sets of genomes — one from the ninds repository @the coriell institute, the other from the baltimore longitudinal study of aging (blsa). the blsa is, obviously, biased towards people on the east coast of the u.s. (in and around baltimore). glancing through the list of submitters to ninds, there’s also something of an east coast bias there, although many samples do come from other areas of the country (see the list of locations at the end of this post).

amongst the findings in this study are that 1) the number of runs of homozygosity (roh) has decreased in white americans over the last one hundred years or so, and 2) the lengths of the roh have shrunk as well. both of these are good indicators of outbreeding.

here are a couple of tables/charts from the paper (click on images for LARGER views):

measures of autozygosity in decline - table 02

measures of autozygosity in decline - percent of genome in roh

what’s interesting to contemplate, i think, is what this might mean wrt selection pressures on americans going forward? especially, what might it mean in light of european-americans encountering other, newer groups within american society that are not outbreeding so much (at least not at the moment) — newly arrived immigrants from many muslim countries, for example — or even, perhaps, latin americans (although i’m not 100% sure about how much they’ve been inbreeding over the past few hundred years or so — stay tuned!). how is that all going to play out? interesting times.

possibly related footnote — here is an abstract from the 2013 ashg conference:

“Reconstructing the Genetic Demography of the United States”

“The United States (U.S) is a complex, multiethnic society shaped by immigration and admixture, but the extent to which these forces influence the overall population genetic structure of the U.S is unknown. We utilized self-reported ancestry data collected from the decennial U.S Census 2010 and allele frequency data from over 2000 SNPs for over 40 of the most common ancestries in the U.S. that were available from the Pan Asian Single Nucleotide Polymorphism (PASNP), Population Reference Sample (POPRES), 1000 Genomes, and Human Genome Diversity Panel (HGDP) databases. We utilized the relative proportions of individuals of each ancestry within each county, state, region and nation and calculate the weighted average allele frequency in these areas. We reconstructed the genetic demography of the U.S by examining the geographic distribution of Wright’s Fst. Shannon’s diversity index, H was calculated to assess the apportionment of genetic diversity at the county, state, regional and national level. This analysis was repeated stratifying by race/ethnicity. We analyzed households with spouses, using the phi-coefficient as a measure of assortative mating for ancestry. This analysis was repeated stratifying by age of the spouses (older or younger than 50). Most of the genetic diversity is between ancestries within county, but this varies by race/ethnicity, and ranges from 95% for Whites to 43% for Hispanics illustrating that the White ancestries are relatively homogeneously scattered throughout the U.S whereas the Hispanic ancestries show significant clustering by geography. Analysis of the mating patterns show strong within ethnicity assortative mating for American Indian/Alaska Natives, Asians, Blacks, Hispanic, Native Hawaiians/Pacific Islanders, and Whites, with φ = 0.30, 0.864, 0.92, 0.863, 0.478 and 0.832 respectively (P<1×10⁻³²⁴ for each) and significantly less correlation in the younger cohort.
These results show demographic patterns of social homogamy which are slowly decreasing over time. One major implication is that data collected from different locations around the U.S are susceptible to both within- and between-location population genetic substructure, leading to potential biases in population-based association studies.”
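
for anyone wondering what the phi-coefficient in that abstract is: it’s the correlation between two yes/no variables, worked out from a 2x2 table of counts. here’s a minimal python sketch. note that the abstract doesn’t say how the authors laid out their tables, so the husband/wife layout, the function name, and the counts below are all my own assumptions, purely for illustration.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table of couple counts (layout is an assumption):

                         wife in group   wife not in group
    husband in group           a                 b
    husband not in group       c                 d
    """
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("a row or column total is zero; phi is undefined")
    return (a * d - b * c) / denom


# made-up counts, NOT census data:
print(phi_coefficient(40, 10, 10, 40))  # mostly within-group couples
print(phi_coefficient(25, 25, 25, 25))  # couples paired at random
```

phi runs from -1 to 1: a value near 1 means spouses almost always share the ancestry in question, and a value near 0 means pairing looks random with respect to it.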
_____

origin cities of the ninds samples (from a quick-ish glance):

Burlington, VT
Lebanon, NH
Boston x 10
New York x 7
Albany
Rochester, NY x 4
New Haven x 3

Bethesda x 7
Baltimore x 5
Philadelphia
Washington, D.C.

Winston-Salem, NC
Charleston, SC

Atlanta
Birmingham x 3
Augusta

Jacksonville x 4
Tampa
Gainesville

Cincinnati x 5
Cleveland
Lexington
Louisville
Memphis
Indianapolis, IN
Ann Arbor

Chicago x 3
Springfield, IL
Rochester, MN
Minneapolis
Englewood, CO
Kansas City

Houston x 4

Phoenix
Salt Lake City

Los Angeles
Irvine, CA
Fountain Valley, CA
San Diego x 2
San Francisco x 3
_____

previously: runs of homozygosity and inbreeding (and outbreeding) and runs of homozygosity in the irish population and western europeans, runs of homozygosity (roh), and outbreeding and russians, eastern europeans, runs of homozygosity (roh), and inbreeding

(note: comments do not require an email. funky penguin!)

inbreeding and cognitive ability among whites in the u.k.

via dienekes via jayman:

Genome-wide estimates of inbreeding in unrelated individuals and their association with cognitive ability

“INTRODUCTION

“Research on consanguineous marriages, and other forms of inbreeding, has long shown a reduction in cognitive abilities in the offspring of such unions. The presumed mechanism is that detrimental recessive mutations are more likely to be identical by descent in the offspring of such unions and so have a greater chance of being expressed. To date, research on the relationship between inbreeding and cognitive ability has largely been restricted to recent inbreeding events as determined by pedigree…. It has been suggested that intellectual disability is under negative selection, and that recent deleterious mutations have an important role in the underlying aetiology. The wealth of molecular genetic data currently available allows estimates of inbreeding on a genome-wide level and to examine the effects of long-term ancestral levels of inbreeding. Such an association with inbreeding, as measured by runs of homozygous polymorphisms (ROH), has previously been identified with several behavioural traits, such as schizophrenia, Parkinson’s disease and personality measures, as well as non-behavioural traits such as height.

“The relationship between inbreeding on a population level and cognitive ability is particularly interesting due to assortative mating, non-random mating, which is greater for cognitive ability than for other behavioural traits, as well as physical traits such as height and weight. Positive assortative mating has been reported for cognitive ability, particularly for verbal traits, with spousal correlations generally around 0.5. Assortative mating should lead to greater genetic similarity between mates at causal loci for cognitive ability and to a lesser extent across the genome, which in turn reduces heterozygosity at these loci. In other words, in contrast to the genome-wide reduction of heterozygosity caused by inbreeding, the reduction of heterozygosity due to assortative mating for a trait is limited to loci associated with the trait…. Another difference between inbreeding and assortative mating is that the effects of inbreeding are expected to be negative, lowering cognitive ability, whereas the effects of assortative mating affect the high, as well as the low end of the ability distribution, thus increasing genetic variance, that is, when high-ability parents mate assortatively, their children are more likely to be homozygous for variants for high ability, just as offspring of low-ability parents are more likely to be homozygous for variants for low ability….

“MATERIALS AND METHODS

“Participants

“The Twins Early Development Study (TEDS) recruited over 11 000 families of twins born within England and Wales between 1994 and 1996…. In this analysis, individuals were excluded if they reported severe current medical problems, as well as children who had suffered severe problems at birth or whose mothers had suffered severe problems during pregnancy. Twins whose zygosity was unknown or uncertain or whose first language was not English were also excluded. Finally, analysis was restricted to twins whose parents reported their ethnicity as ‘white’….

“Cognitive measures

“Verbal and non-verbal tests were administered using web-based testing. The verbal tests consisted of the Similarities subtest and the Vocabulary subtests from the Wechsler Intelligence Scale for children (WISC-III-UK). The non-verbal tests were the Picture Completion subtest from the WISC-III-UK and Conceptual Grouping from the McCarthy Scales of Children’s Abilities. A general score was derived from the test battery as the standardized sum of the standardized subtest scores, which correlates 0.99 with a score derived as the first principal component of the test battery score.
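
(an aside from me: the “standardized sum of the standardized subtest scores” is easy to picture in code. a minimal sketch, assuming population standard deviations and complete data for every child; the paper excerpt doesn’t say which sd was used, and the function names and toy scores are mine, not the authors’.)

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a list of scores to mean 0, sd 1 (population sd assumed)."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        raise ValueError("all scores are identical; cannot standardize")
    return [(v - m) / s for v in values]


def general_score(subtests):
    """subtests: dict of subtest name -> list of raw scores, one per child,
    with children in the same order in every list."""
    standardized = [zscores(scores) for scores in subtests.values()]
    # add up each child's standardized subtest scores...
    sums = [sum(per_child) for per_child in zip(*standardized)]
    # ...then standardize the sums themselves
    return zscores(sums)


# toy scores for three children, made up for illustration only
toy = {"similarities": [1, 2, 3], "vocabulary": [2, 4, 9]}
print(general_score(toy))
```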

“Runs of homozygosity

“FROH was defined as the percentage of an individual’s genome consisted of runs of homozygosity (ROH)…. [O]nly ROH with a minimum of 65 consecutive SNPs covering 2.3Mb were used when calculating the total proportion of the genome covered by ROH. In addition, the required minimum density in a ROH was set at 200kb per SNP, and the maximum gap between two consecutive homozygous SNPs was set at 500kb….
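
(another aside: here’s roughly how those thresholds turn a list of roh segments into an FROH percentage. this is a sketch under my own assumptions, not the authors’ pipeline. the `RohSegment` class, the function names, and the `genome_length_kb` value in the example are hypothetical; only the 65 snp, 2.3Mb, and 200kb-per-snp cutoffs come from the excerpt above. the 500kb maximum-gap rule applies when the runs are first called from genotypes, which isn’t modelled here.)

```python
from dataclasses import dataclass


@dataclass
class RohSegment:
    start_bp: int
    end_bp: int
    n_snps: int

    @property
    def length_kb(self):
        return (self.end_bp - self.start_bp) / 1000


def passes_filters(seg, min_snps=65, min_length_kb=2300, max_kb_per_snp=200):
    """Keep a run only if it has enough SNPs, is long enough, and is dense enough."""
    if seg.n_snps < min_snps or seg.length_kb < min_length_kb:
        return False
    return seg.length_kb / seg.n_snps <= max_kb_per_snp


def froh_percent(segments, genome_length_kb):
    """Percent of the genome covered by runs that pass the filters."""
    kept_kb = sum(s.length_kb for s in segments if passes_filters(s))
    return 100 * kept_kb / genome_length_kb


# made-up segments for one individual; the 3,000,000 kb genome length is a
# round placeholder, not the figure used in the study
segments = [
    RohSegment(0, 3_000_000, 100),   # 3 Mb, 100 SNPs: kept
    RohSegment(0, 1_000_000, 80),    # only 1 Mb: dropped
    RohSegment(0, 5_000_000, 20),    # too few SNPs: dropped
    RohSegment(0, 20_000_000, 70),   # too sparse (over 200 kb per SNP): dropped
]
print(froh_percent(segments, genome_length_kb=3_000_000))
```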

“RESULTS

“Table 1 includes descriptive statistics for FROH and the three measures of cognitive ability (general, verbal, and non-verbal). FROH is slightly positively skewed, as it represents the total percentage of the genome that includes runs of homozygosity (ROH). The average percentage of genome covered by ROH was 0.7% (95% CI 0.65-0.72%). Verbal and non-verbal abilities correlate 0.49; because general cognitive ability is the sum of the standardized verbal and non-verbal subtests, they correlate much more highly with general ability (0.87 and 0.86, respectively).

inbreeding and iq - table 01

“Table 2 presents the results of the linear regression analyses. No significant regression was found between FROH and the cognitive measures after correction for multiple testing, although the association with non-verbal cognitive ability was nominally significant (P=0.03). Although this association was not statistically significant, it is noteworthy that every regression in Table 2 is *positive*, indicating that increased homozygosity tends to be associated with *higher* cognitive scores across different measures of cognitive ability (general, verbal and non-verbal).

inbreeding and iq - table 02

“Our analysis identified 87 loci where ROH overlapped in 10 or more individuals. For these overlapping regions we tested for association with each of the cognitive measures and again showed no significant associations after correction for multiple testing (P-values of less than 5.7 × 10⁻⁴). A sign test of the direction of effect across all ROH showed a disproportionately large number of *positive* associations, indicating that ROH are associated with higher cognitive ability (P=0.002). The sign test was non-significant for verbal ability but highly significant for non-verbal ability (P<10⁻⁶). The sign test for non-verbal ability alone remained significant after correcting for an individual’s genome-wide FROH score (P<10⁻⁶).
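
(a quick aside on the sign test, since it carries a lot of weight here: you count how many associations point in the positive direction and how many in the negative, and ask how surprising that split would be if each direction were a coin flip. below is a minimal exact two-sided version. the excerpt doesn’t say which variant of the test the authors ran, so treat this as my assumption, and the counts in the example are made up, not the paper’s.)

```python
from math import comb


def sign_test_p(n_positive, n_negative):
    """Two-sided exact binomial sign test, H0: positive and negative equally likely."""
    n = n_positive + n_negative
    if n == 0:
        raise ValueError("need at least one non-zero effect")
    k = max(n_positive, n_negative)
    # probability of a split at least this lopsided in one direction
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # double it for the two-sided test, capped at 1
    return min(1.0, 2 * upper_tail)


# hypothetical example: 10 associations, all positive
print(sign_test_p(10, 0))  # 0.001953125
```

one caveat worth keeping in mind: the test treats every locus as an independent coin flip, which may not hold for neighbouring regions of the genome.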

“As explained earlier, positive assortative mating can also lead to genome-wide homozygosity for trait-specific loci, and, unlike inbreeding, assortative mating can affect the high as well as the low end of the ability distribution. One possible explanation for the trend suggesting a positive correlation between homozygosity and cognitive scores in our data is that positive assortative mating on intelligence might be greater for high cognitive ability individuals….

“DISCUSSION

“Our results show that within a representative UK population sample there was a weak nominally significant association between burden of autosomal runs of homozygosity and higher non-verbal cognitive ability. This nominal association with *increased* cognitive ability is counterintuitive when compared with the results from more extreme inbreeding based on pedigree information. A potential explanation for this direction of effect is that individuals with higher cognitive ability might show greater positive assortative mating, which would lead to increased homozygosity at loci for higher cognitive ability in their offspring. However, in a separate sample we showed that greater positive assortative mating was not associated with higher cognitive ability. While these findings seem to provide clear evidence against this hypothesis, it is possible that the genome-wide genetic findings reflect historical mating habits that no longer exist today. It should also be noted that there was a reduction in the standard deviations for spousal correlations in the increased cognitive ability groups by an average of 6% compared with the decreased cognitive ability group (see Table 3), which could reflect less genetic variability in the high ability couples or a ceiling effect on the cognitive tests. This lesser phenotypic variability at the high ability end would have a small effect in reducing the spouse correlations and potentially confound our analysis….

“Overall, these results highlight the importance of understanding mating habits, such as inbreeding and assortative mating, when investigating the genetic architecture of complex traits such as cognitive ability. The results certainly suggest that there is no large effect of FROH on reduced cognitive ability, the expected direction of effect. The nominally significant associations found in this study may even suggest that in the case of non-verbal cognitive ability, beneficial associations with homozygosity at specific loci might outweigh the negative effects of genome-wide inbreeding and that the relationship between inbreeding and cognitive ability may be more complicated than previously thought.”
_____

so, although obviously Further Research is Required™, these researchers have concluded that both the absence of reduced cognitive ability and the slight increase in cognitive ability which they found in individuals who had runs of homozygosity (roh) in their genomes (evidence of matings between genetically similar individuals) were probably NOT due to assortative mating (i.e. smart people mating with smart people).

furthermore, they suggest that the inbreeding-causes-reduced-cognitive-ability meme is incorrect — or at least that the situation is more complicated than the idea that it’s the accumulation of recent deleterious mutations which haven’t been selected away that is the (whole) problem. in fact, a little inbreeding seems to have a positive effect on some cognitive abilities!

i’ve suggested a couple of times one way in which inbreeding might result in a low average iq in a population, and that is if the inbreeding leads to clannish, altruistic behaviors between extended family members which then result in the deleterious mutations NOT being weeded out.

one real world example i’ve offered is how life works in egyptian villages and how the more successful and affluent (and, presumably, more intelligent) members of a clan are obliged to help out their less successful and poorer (and, presumably, less intelligent) clan members. so, apart from mentally retarded individuals not reproducing, where is the negative selection for deleterious mutations here? there is none. or it’s a lot weaker than in more individualistic societies (like gregory clark’s medieval england) where it’s more every man for himself — in clannish societies, deleterious mutations might be able to hang around for a long time, riding on the coattails of those with fewer deleterious mutations.

(note: comments do not require an email. i’m my own grandpa! [no, I’M not! it’s just the song.])

runs of homozygosity in the irish population

so, after all my rambling about the historic mating patterns amongst the native irish, how inbred are the irish really?

from Population structure and genome-wide patterns of variation in Ireland and Britain:

“[O]ur results suggest that the Irish population has the largest proportion of the genome in ROH (as measured by FROH1), relative to the British and HapMap CEU populations examined here (Figure 3).”

the members of the ceu population are mormons in utah. here is figure 3 — click on images for LARGER view:

ireland - roh01 - o'dushlaine et al

“Figure 3 – FROH1 patterning in Irish, British and Swedish populations. Box plots represent (a) the number and (b) the summed size of segments of the autosomal genome that exists in ROH of 1 Mb or greater in length (ie, FROH1). The bars represent mean and confidence intervals, as per a standard box plot (box indicating the 25th–75th percentile of the FROH1 distribution, line within box representing the median and ends of the whiskers representing the 5th–95th percentiles). Outliers are represented by diamonds.”

so the irish: more AND longer roh or runs of homozygosity (1 Mb in length or greater) than the english, the utah mormons, scots in aberdeen, or the swedes — in that order (if i’m not mistaken). so the english here are the most outbred (what have i been saying?), while the irish are the most inbred.

more from the paper:

“Overall, the Irish and Swedish populations seem slightly different from the others in the context of ROH. Both the Irish and Swedish populations showed, on an average, a greater number of ROH, an increased maximum ROH length, as well as an increased proportion of the genome in homozygous runs, compared with that of the Scottish, southern English and Utah populations. Similarly, the mean level of individual autozygosity per population as measured by FROH [22] was highest for the Irish group (Figure 4). Together, these results suggest slightly increased autozygosity in the Irish cohort compared with the British and Swedish cohorts.”

here’s figure 4:

ireland - roh02 - o'dushlaine et al

Figure 4 – Mean FROH1 and FROH5 patterning in Irish, British and Swedish populations. See Figure 1 legend for population identifiers. Y-axis indicates the average proportion of the autosomal genome covered by FROH1 or FROH5 (see Materials and Methods for definition of FROH).

“Autozygosity is generated by increased levels of kinship, which in turn reflects the population history of Ireland. Although relatively undisturbed by secondary migrations, the population of Ireland has undergone expansions and contractions at numerous points in recent history (eg, two major famines since 1600, disease epidemics, expansion in the first half of the 19th century). Aside from these features, the increased autozygosity may also reflect legacies of Gaelic family structures and comparatively low levels of migration that are in part due to a lack of industrial revolution in Ireland.

“To test a hypothesis of increased autozygosity due to features of relatively recent population history, we examined the patterning of homozygosity looking for signals of parental relatedness over the last four or five generations. Previous work has illustrated that parental relatedness arising within four to six generations predominantly affects ROH over 5 Mb in length [22]. We therefore compared this statistic across populations. Results show that the Irish and Swedish populations have around 10 times as much of their genomes in ROH over 5 Mb in length than the southern English, and 1.5–3 times as much as Scotland and Utah (Figure 4)….

“Analysis of ROH is a powerful method to gauge the extent of ancient kinship and recent parental relationship within a population. This is because ROH arise from shared parental ancestry in an individual’s pedigree. The offspring of cousins have very long ROH, commonly over 10 Mb, whereas at the other end of the spectrum, almost all Europeans have ROH of ∼2 Mb in length, reflecting shared ancestry from hundreds to thousands of years ago. By focussing on ROH of different lengths, it is therefore possible to infer aspects of demographic history at different time depths in the past [22]. We used FROH measures to compare and contrast patterning across populations. These measures are genomic equivalents of the pedigree inbreeding coefficient, but do not suffer from problems of pedigree reconstruction. By varying the lengths of ROH that are counted, they may be tuned to assess parental kinship at different points in the past. We used two different measures, FROH1, which includes all ROH over 1 Mb and hence includes information on recent and background parental relatedness, and FROH5, which sums ROH over 5 Mb in length, more typical of a parental relationship in the last four to six generations [22]. Our FROH1 results indicate slightly elevated levels in the Irish and Swedish populations (compared with southern England, Scotland and HapMap CEU) of both the overall number of ROH and the proportion of genome in ROH (see Figure 3). This pattern was exaggerated when we restricted analysis to ROH greater than 5 Mb in length (ie, FROH5, see Figure 4), indicating increased levels of parental relatedness in the last six generations in the Irish and Swedish populations compared with other populations tested in this study. When we remove individuals with ROH over 5 Mb from the FROH1 analysis (Supplementary Figure S5), Ireland remains as the population with the most homozygous runs and the longest sum length of homozygosity. This provides further evidence that the elevated proportion of shorter ROH, and hence the number of ancient pedigree loops in Ireland, is indeed real and not driven by a limited number of offspring of cousins.”

recent cousin matings, they mean.

so, if you look at figure 4, both the irish and the swedes have way more roh of over 5 Mb in length than the english (who have a really minuscule amount), the scots in aberdeen, or the mormons in utah (ceu) — in that order. in this instance, the swedes appear to have the most roh over 5 Mb, but as the authors say, when they removed the over 5 Mb individuals from the samples (i.e. the individuals most likely to be the offspring of recent cousin marriages), the irish wind up having the most and the longest roh over 1 Mb in length, so they win the overall inbreeding prize for these groups.

what the authors overlook, i think, is the longer term mating patterns of these populations. i think that the english in this study (and, it should be noted, that these are described as individuals from the south and southeast of england) have minuscule amounts of roh in their genomes because, out of all these groups, they have been outbreeding the longest (see “mating patterns in europe series” ↓ below in left-hand column) — since the early part of the middle ages, in fact. the irish and the swedes, on the other hand, have more roh because they started outbreeding much later (and, probably, too, because, like other northern populations, they’re somewhat remote and small in size) — the swedes sometime after they converted to christianity in — when was it? — ca. 1000 a.d.? and the irish, as i’ve shown in the last few posts on irish mating patterns, not until sometime towards the late medieval period — as late as the 1500s possibly.

the implication of all this is, because the irish and the swedes (and other groups in europe) inbred for longer than the english (and some of the french and dutch and germans), their societies would’ve remained clan- or extended-family based for longer than those of the english et al., and so would’ve been under different sorts of selection pressures from their social environment.
_____

update: Supplementary Figure S5 – when the researchers removed the individuals with roh over 5Mb, i.e. those individuals who were most likely to be the offspring of cousins (see comments):

ireland - roh03 - o'dushlaine et al

previously: runs of homozygosity and inbreeding (and outbreeding) and western europeans, runs of homozygosity (roh), and outbreeding and russians, eastern europeans, runs of homozygosity (roh), and inbreeding and early and late medieval irish mating practices and clannish medieval ireland and inbreeding in europe’s periphery and early modern and modern clannish ireland and meanwhile, in ireland… and drinkin’ and fightin’ songs and mating patterns, family types, and clannishness in twentieth century ireland and inbreeding in ireland in modern times

(note: comments do not require an email. clan map of ireland.)

western europeans, runs of homozygosity (roh), and outbreeding

i know, i know — it’s easier to spot inbreeding (or outbreeding) from the presence (or absence) of a lot of long runs of homozygosity (roh) in the genomes of individuals in a population rather than short roh (see for example the central/south and west asians in this post, populations which everyone knows are regular inbreeders), but i haven’t got any data on long roh for separate, sub-populations (like italians vs. europeans), so we’re gonna have to make do with short roh (for now). and anyway, even the amount of short roh is reduced via outbreeding (and increased via inbreeding), so you can use it as a tool to try to work out a population’s mating history. it’s just not as easy/obvious as with longer roh.

so … the map below is taken from Genomic and geographic distribution of SNP-defined runs of homozygosity in Europeans.

the samples come from:

– the rotterdam study – the netherlands
– popgen – northern germany – specifically the schleswig-holstein region (in deutsch if you like)
– the monica augsburg surveys – southern germany – from the city of augsburg and two neighboring counties
– and popres, which, since this is a study of europeans, i presume must mean that the samples came from both the lolipop study in london and the colaus study, lausanne, switzerland — i discussed those two studies in this previous post (scroll down).

again, the problem with taking samples from people living in big cities is that, even if they may be natives of whatever country they happen to live in, they, or some of their recent ancestors, may have migrated to the city — so, who knows, for instance, if the samples from rotterdam tell us anything about rotterdam or even the region of the country in which rotterdam is located. probably tells us something about the dutch, but even then….

these researchers — nothnagel et al. — chose to look at roh that were 1Mb in length. that’s shorter than the 1.5Mb roh as delineated by the researchers who looked at the roh in russian populations. also, nothnagel et al. weighted the average roh in each population according to how much linkage disequilibrium was (estimated to be) present in each population. don’t ask! no, really — don’t ask, because i don’t really understand why they did this. here’s the wikipedia page for linkage disequilibrium. i know that you can have more ld in an inbreeding population and — you guessed it! — less in an outbreeding one. and, of course, other things like bottlenecks can affect how much ld is present in a population. nothnagel et al. found different amounts of ld in the populations in this study and compensated for that, but again i’m not exactly sure why.

anyway … here’s what they found. this map shows the subpopulation averages of the weighted number of roh per individual (the contour lines are guesstimates — educated guesstimates, but still guesstimates):

europe roh - average weighted ROH number per individual

if you look closely, you’ll see that there’s a sort-of central band of a relatively low average number of roh (between 37-39) that runs from southern england down through belgium/the netherlands (rotterdam) and northeast france, southern germany and switzerland. and, as the researchers observed, and as we saw in the previous post on russia, the numbers of roh increase going northwards and decrease going south. until you get to southern spain and southern italy, southern greece, and (probably) a central spot in the balkans there, all regions where the average number of roh increases again. the researchers suggest that, perhaps, migration from northern africa to the iberian peninsula (that’s the only region for which they offer a possible explanation for this anomaly) explains the longer roh there — presumably they’re thinking of a bottleneck. maybe. but perhaps it’s due to greater historic inbreeding in southern spain — and southern italy and greece and the balkans. some data showing longer roh would help us tell one way or the other.

the researchers, btw, acknowledge that the areas indicated as having very low amounts of roh — colored in the lightest shades of yellow — i.e. northwest spain and eastern europe — are probably artifacts of the interpolation method that they used. also, for all you scots out there (you know who you are! (^_^) ), while i do predict that the average numbers of roh in scotland ought to be higher there than in england, note that there was no data for scotland included in this study, so the shades of the contours up there are wild guesses as well.

i’m quite surprised by the very low levels of roh in romania, but remember that one has to read this map with the underlying north-south differences in numbers of roh in mind, so perhaps the roh in romania really indicates an inbreeding/outbreeding rate in romania that is more like that found in, say, france/germany. dunno. in any event, it’s very interesting.

now i want to compare the average number of roh in eastern europe with western europe. that’s going to be kinda hard to do since 1) the two studies used different roh lengths (1Mb vs. 1.5Mb), and 2) the numbers from this study have been weighted. still, i think we can get at something of a (very!) rough picture by taking the numbers from germany as our starting point and using them to calibrate the results from the two studies. we can do this, i think, since the samples from germany came from the same sources in both studies — the popgen study for northern germany and the monica study for southern germany.

in the russian study, the samples from northern and southern germany were combined, so we only have one number for germany — which was lower than all the results from eastern europe, typically much lower (see map from previous post). the number of roh in the polish sample, for instance, was more than twice that found for the germans. the average number of roh in russia (Rus_HGDP) was also twice that of the germans. czechs, latvians, estonians — all higher than the germans.

now if we work westwards from germany using the results from the study in this post — the english, the dutch (rotterdam), and the swiss are all in the same range as the southern germans, while the southern french have an even lower average number of roh — and the irish (in dublin) and the czechs are in the same range as the northern germans. so all of these populations — and even the spanish and italians — have fewer roh on average than eastern europeans. which is what i would’ve guessed given what we know about the historic mating patterns of europeans beginning in the early medieval period (see mating patterns in europe series below ↓ in left-hand column).

maybe there’s another explanation for this difference between western and eastern europe — and for the apparent differences between central and southern europe. like i said above, a study or two looking at longer roh would help to clear up the picture one way or the other.

previously: russians, eastern europeans, runs of homozygosity (roh), and inbreeding; ibd and historic mating patterns in europe; ibd rates for europe and the hajnal line; runs of homozygosity and inbreeding (and outbreeding); and runs of homozygosity again

(note: comments do not require an email. ruh roh!)

russians, eastern europeans, runs of homozygosity (roh), and inbreeding

greying wanderer (thanks, grey!) pointed out to me (via) a very interesting study of russian/eastern european genetics which includes some runs of homozygosity (roh) data (which can provide clues of inbreeding/close matings among other things): A Genome-Wide Analysis of Populations from European Russia Reveals a New Pole of Genetic Diversity in Northern Europe. (dienekes has a really good explanation of roh here.)

in this latest study, khrunin et al. took a look at a handful of different ethnic russian sub-populations (from different locations in russia) as well as some other eastern european groups. most of the samples from russia they collected themselves — the rest came from other studies. here’s a list of which groups were included and where they came from:

– russians (n=384) from the archangelsk (mezen district, n = 96), vladimir (murom district, n = 96), kursk (kursk and oktyabrsky districts, n = 96), and tver (andreapol district, n = 96) regions
– veps (n=81) from the babaevo district of the vologodsky region
– komi (n=150) from the izhemski (izhemski komi, n = 79) and priluzski (priluzski komi, n = 71) districts of the komi republic.

all of these samples were collected by the authors — except for those from tver — and the researchers ensured that the subjects AND their parents were originally from whatever region in which they happened to find them (i like that!).

the data from other studies which they used are described in this paper and include:

– finns – samples from helsinki (n = 100) and kuusamo (n = 84) – kuusamo is really remote
– estonians (n = 100) – samples collected across the entire country
– latvians (n = 95) – samples collected in riga – parents had to be latvians
– poles (n = 48) – from the west-pomeranian region, so just on the border with germany
– czechs (n = 94) – from prague, moravia, and silesia
– germans (n = 100) – from schleswig-holstein in the north and the augsburg region in the south
– italians (n = 88) – from tuscany (hapmap)
– russians (n = 25) from the human genome diversity panel (hgdp) – i believe from the vologda oblast.

the data collected by khrunin et al. are really good, imho, since 1) they went to all the trouble of collecting samples from different regions of russia, and 2) the researchers tried to control for ethnic/regional origin. the quality of the data from all the other studies is kinda mixed, for my interests anyway. for instance, taking in samples in large, capital cities — meh — not so great. the residents of those cities could’ve come from all over the country. the northern versus southern sampling in germany is better; unfortunately, those data sets were combined together in this study (they’re kept separate in another really cool study which i will post about soon!). the estonian data set is interesting because the samples came from across the country. otoh, the polish data set is also interesting because it’s from such a specific region (and right on the border with germany).

ok. one last thing before i show you the results (i made a map!). different researchers define roh differently (*sigh*) — while there do seem to be some standards, there’s also quite a bit of variation, and different researchers choose to look for roh of varying lengths. in this study, the researchers looked for roh that were 1.5Mb in length (i’ve seen other researchers look for 1Mb in length). 1.5Mb is pretty short as far as roh go. if you recall, when a population has a lot of longer roh (like 4-8Mb or more), that’s a pretty good indicator of inbreeding. 1.5Mb — not so much. lots of short roh are a better indicator of something like a population bottleneck in the distant-ish past. but, what’s a girl to do? gotta work with what’s available, and if it’s short roh, so be it.
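
to see what a length cutoff actually does, here’s a toy sketch in python. it is only an illustration, not the method used by khrunin et al. or anybody else: the input format (a position-sorted list of (position in bp, is homozygous) pairs), the function name find_roh and the example snps are all made up. real roh callers generally slide a window along the genome and tolerate the odd heterozygous or missing call; this one doesn’t.

```python
# toy roh finder: an illustration only, not the method used in any of the studies.
# input: snps sorted by position, each given as (position_in_bp, is_homozygous).

def find_roh(snps, min_length_bp=1_500_000):
    """return (start, end) pairs for homozygous runs at least min_length_bp long."""
    runs = []
    start = end = None
    for pos, is_hom in snps:
        if is_hom:
            if start is None:
                start = pos          # a new run begins here
            end = pos                # extend the current run
        else:
            # a heterozygous call ends the run; keep it if it is long enough
            if start is not None and end - start >= min_length_bp:
                runs.append((start, end))
            start = end = None
    # don't forget a run that reaches the end of the data
    if start is not None and end - start >= min_length_bp:
        runs.append((start, end))
    return runs


# hypothetical individual: one snp every 100 kb, heterozygous at three sites
snps = [(i * 100_000, i not in (5, 30, 42)) for i in range(60)]

roh_15 = find_roh(snps)                           # 1.5Mb cutoff
roh_10 = find_roh(snps, min_length_bp=1_000_000)  # 1Mb cutoff
print(len(roh_15), len(roh_10))
```

with these made-up snps the 1Mb cutoff picks up one more run than the 1.5Mb cutoff does, which is why counts from studies using different minimum lengths can’t be compared directly.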

here (finally!) is the map. i took the data from this table. the map shows the first column of data: the average number of roh (of 1.5Mb) found in individuals in the different populations (nROH):

[map: russia nroh]

the most obvious thing to note is that the small, endogamous groups (the veps and the komi) have more roh than any of the other populations, except for the finns up in kuusamo (and i think that that’s probably due to a bottleneck — ethnic finns really only migrated to, and began to settle in, the area seriously in the 1600s, and i imagine it wasn’t very many of them — and being so far away from anybody else!). the veps and the komi are small populations and, historically, they didn’t marry out much (that’s why we have veps and komi people today), so they are somewhat inbred. definitely more so than the surrounding population.

another curious thing is the pretty high number of rohs in the baltic populations: latvians=0.58, estonians=0.61, and finns in helsinki=1.13. wow! what happened there? that’s something like three to five times the number of roh we see in italians (from tuscany) or germans.

the most interesting point for me, though, is that there is an east-west divide. it’s kinda vague, maybe, but i think it’s there: italians (tuscans) and germans at ca. 0.20, and then the czechs and poles right next door at 0.35 and 0.51 respectively. and everyone to the east, except the russians in kursk, higher again than those two figures. i think these results hint at what i’ve found in the history books on medieval europe, i.e. that western europeans began outbreeding earlier than eastern europeans and as a result wound up being more outbred. (see, for example, here and here — and the “mating patterns in europe series” below ↓ in left-hand column.)

finally, the authors of the study point out how it appears that the average number of roh in individuals in a population increases with latitude — and they mention that this has also been shown elsewhere (i’ll be posting on that paper — very soon!). if you look at the various ethnic russian populations, for instance, the russians down in kursk (Rus_Ku=0.28) and murom (Rus_Mu=0.39) have fewer roh than the russians further to the north in tver (Rus_Tv=0.49) and way up in mezen (Rus_Me=1.63!). however, the hgdp russian samples, apparently from the vologda oblast which is pretty far north, have relatively low numbers of roh (Rus_HGDP=0.44), so that doesn’t seem to fit. still, it does look like a real pattern to me. the authors suggest that this is due to the general pattern of how europe was settled (from the south to the north), as well as the fact that the farther north you go, the fewer people there are to mate with (so the more inbred you wind up being).
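
the nroh figures quoted in the last few paragraphs are easier to compare when each is expressed as a multiple of the german value. a minimal python sketch, using only the numbers given in this post; note that the german and tuscan values are only quoted as “ca. 0.20”, so 0.20 is an approximation and the ratios are rough.

```python
# average number of roh (1.5Mb) per individual, as quoted in this post.
# germans and tuscans are only given as "ca. 0.20", so those two are approximate.
nroh = {
    "germans": 0.20,
    "italians (tuscany)": 0.20,
    "russians, kursk": 0.28,
    "czechs": 0.35,
    "russians, murom": 0.39,
    "russians, hgdp": 0.44,
    "russians, tver": 0.49,
    "poles": 0.51,
    "latvians": 0.58,
    "estonians": 0.61,
    "finns, helsinki": 1.13,
    "russians, mezen": 1.63,
}

baseline = nroh["germans"]
ratios = {pop: round(value / baseline, 2) for pop, value in nroh.items()}

# print from lowest to highest ratio
for pop, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{pop:20s} {ratio:5.2f}x the german figure")
```

the same division works for any other baseline population; germany is used here only because it is the western european reference point in this data set.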

as i’ll show in my next post, though, while there does seem to be a north-south pattern to roh frequency in europe with more roh in populations to the north than the south, curiously the numbers seem to increase in southern europe as well (as compared to places in central europe like germany and france) — and strangely in the balkan region as well. i can’t imagine why! (^_^)

previously: ibd and historic mating patterns in europe; ibd rates for europe and the hajnal line; runs of homozygosity and inbreeding (and outbreeding); and runs of homozygosity again


the hgdp samples again

i’ve written before (here, here and here) about the hgdp samples and the fact that there is very little to no provenance info connected to them. the problem with this, afaics, is that it’s difficult to know whether or not the hgdp samples are truly representative, in all ways, of the populations from which they came.

i was particularly concerned initially about the french (and the japanese) hgdp samples — and then i got over that — but now i’m concerned about them again. here’s why:

the hgdp samples from france are described thusly:

“France – French/various regions (relatives) – This sample from various regions of France is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). This sample consists of unrelated individuals and was collected with proper informed consent.”

great!

hang on — which regions?

auvergne? where, in some villages in the eighteenth century, groups of families regularly inbred with one another? lorraine? which, in some areas, had consanguinity rates of up to 50% between 1810 and 1910? burgundy or brittany, both of which had reportedly higher cousin marriage rates in the nineteenth and twentieth centuries than other regions of france? or were the hgdp samples collected in places like central france which, historically, had much lower rates (in the range of 1-3.5%) of close marriages?

the thing is: we don’t know.

what we do know is that the hgdp sampling seems kinda biased towards unique little groups like basques and orcadians, sardinians and the adygei. which is understandable ’cause these are all interesting, unusual groups and there’s legitimate concern that their unique genomes might sorta disappear in our modern, outbreeding world, and it would be a shame to miss out on the chance to at least keep a record of all that human biodiversity.

but then i have to wonder how representative of the majority of french people are the french hgdp samples? do they truly represent “the french,” or did the samples come from some of those crazy little villages way up in the mountains? i dunno. and neither does anybody else (afaik).

and the reason i wonder is: if teh scientists are gonna do really awesome genetic studies to check for the relatedness between the members of different human populations — like runs of homozygosity (roh) studies or identity by descent (ibd) studies — i think they need to know if the samples they’re looking at are representative or not. do the results for “the french” in studies like this or this or this truly represent the average french, or do they represent some special sub-groups of mountain dwelling french?

in the most recent roh study i posted about, the “french” don’t appear to be much more in- or out-bred than orcadians or the basques, something which strikes me as odd. perhaps — perhaps — that’s because the french hgdp samples are not truly representative of the broader french population. perhaps. i don’t know. nor do the researchers.

rinse and repeat above discussion for the other samples, too.

previously: hgdp samples and relatedness; more on the hgdp samples; why i care about the hgdp samples; meanwhile, in france…; runs of homozygosity and inbreeding (and outbreeding); ibd and historic mating patterns in europe; and runs of homozygosity again


runs of homozygosity again

**update below**

here’s an exciting new paper!: Genomic Patterns of Homozygosity in Worldwide Human Populations. i don’t have access to the paper itself, but there are lots o’ neat figures and tables in the supplemental data [opens pdf] that relate to runs of homozygosity (roh). roh are identical stretches of dna within an individual’s genome (i.e. identical on both copies of a chromosome, the one inherited paternally and the one inherited maternally). (roh shouldn’t be confused with blocks of identity by descent [ibd], which i did once! ibd blocks are identical stretches of dna as compared between different individuals, iiuc.)

recall that possessing lots of long roh indicates that one’s parents are/were quite similar genetically speaking. that can be as a result of a couple of different genetic scenarios like (as greying wanderer has brought up a lot recently) simply being from a small sized population (i.e. having a small effective population size) and/or from regular inbreeding (consanguineous/endogamous mating). so, a population having a lot of long roh is either small and/or inbreeds a lot. populations having LOTS of short roh have probably been through some sort of bottleneck (see previous post).

in the paper i looked at in that previous post, the researchers had looked at the different roh lengths for large, regional populations like “europeans” or “east asians.” amongst other things, they had found that some of my regular inbreeders — the fbd marriage folks — had some of the highest numbers of medium and long roh, a state of genetic affairs which likely reflects their long-term close mating patterns. interestingly, the researchers had found that east asians had roh lengths similar to those of europeans across the board, something which surprised me since, at least according to what i’ve been reading, east asians (i.e. the chinese) have been inbreeding for a much longer time than europeans. one drawback of that previous study, though, was that, apart from the french, most of the european populations they looked at were peripheral groups who have had a tendency to inbreed more than my “core” europeans (see mating patterns in europe series below ↓ in left-hand column).

the new paper suffers from some of the same problems since the data come from the same sources (hgdp-ceph and hapmap phase 3 populations), so northern europeans — apart from the french — aren’t included in this paper either. (what can you do? it’s early days yet. i look forward to when there’s lots more genetic data available out there for teh scientists to work with! (^_^) )

what the researchers in this paper have done, though, is to look at both the mean lengths of roh in each of the populations sampled AND the total numbers of roh within individuals for each population. this has, i think, drawn out some interesting differences between the populations.

first, here are two graphics from the supplemental data (linked to above).

i’ve highlighted a handful of populations i want to focus on ’cause i know a little something about their historic mating patterns: the bedouin (as a proxy for the arabs — note that the bedouin have probably inbred more than more settled arabs); italians (not sure if they’re northern or southern italians or a mix of both — however, there are tuscans in the samples with which these “italians” can be compared); pathan or pashtuns (more fbd marriage folks, like the bedouins/arabs); and han chinese (there are some northern han chinese with whom this group can be compared). ok. here are the charts:

as you can see, the researchers have split up the roh into three classes (note that the short and medium classes here are a lot shorter than those in the paper looked at previously):

– A: 0.25-0.40 Mb (short)
– B: 0.6-1.2 Mb (medium)
– C: 0-35 Mb (long)

the interesting thing in the first chart above (Fig. S3 – Mean ROH Length for Each of the Three Size Classes in Each Population), is that the han chinese have lower means of roh length in all of the size classes compared to the other populations i’ve highlighted. in the previous study, the researchers found that east asians had similar means to europeans for all roh lengths. i found this surprising since, from what i’ve read, the han chinese have been inbreeding for a longer period of time than europeans. what might be confounding the results though, once again, is the fact that nw europeans (the outbreeders extraordinaire) are not really included in either of these studies apart from a handful of french samples.

in this latest study, both the bedouin and the pashtun, for instance, have higher means — and wider spreads — of long (class C) roh than the italians, which is what i would’ve expected since those two groups (the bedouins and the pashtuns) are, being fbd marriage folks, serious inbreeders. perhaps the reason the han chinese long roh mean is comparatively low is partly due to the fact that they historically practiced mother’s brother’s daughter (mbd) marriage which doesn’t push towards such close inbreeding as fbd marriage. still, i would’ve expected to see greater means of roh for the chinese than the italians — or, at least, around the same. not so much lower. (unless the italians practiced fbd marriage, too — or fzd marriage — but i don’t think so.)

if you look at the second chart (Fig. S4 – Total Number of ROH in Individual Genomes), however, you’ll see that, overall, the han chinese have more short, medium and long roh in total in individual genomes than any of the other three populations i’ve highlighted. both the bedouins and the pashtuns have greater numbers/wider total spread of long roh than the italians, but the han chinese have a much greater total number of long roh than any of the other three groups — three or four times as many.

but they’re, on average, shorter long roh, don’t forget. (confusing, eh?!)

perhaps this is what you get when you have — as the chinese have had — a pretty good-sized effective population for such a long time. there have been a LOT of han chinese for — wow — millennia.

so, it looks like this (in this order of inbrededness — i think):

– bedouins: highest mean, and very wide spread, of long roh; high total numbers, and widest spread, of long roh.
– pashtun: low mean, but widest spread, of long roh; low total number, but very wide spread, of long roh.
– han chinese: very low mean, and very narrow spread, of long roh; highest total numbers, and wide spread, of long roh.
– italians: low mean, and rather wide spread, of long roh; very low total number, and very small spread, of long roh.

other interesting points are that:

– the tuscans/tsi (toscani) appear to have lower short, medium and long mean roh than the generic “italian” category. however, the tuscans have lower total numbers of long roh than the “italians” while the toscani (tsi), on the other hand, appear to have a greater total number of long roh than the “italians.” while the tuscan samples and the toscani/tsi samples are from different studies (hgdp vs. hapmap), they are all supposed to be from tuscany, so it’s surprising that they’re so different. perhaps the individuals in the toscani/tsi sample were more closely related somehow?

– the northern han samples have lower short, medium and long mean roh than the generic “han” category. this would fit my general impression that historically inbreeding has been greater in southern china than in the north. however, the total number of long roh is greater in the northern han sample than in the “han” sample. not sure what that means.

don’t forget that there can be all sorts of reasons for differences in roh: inbreeding vs. outbreeding, yes, but also effective population size, population movement (migration in or out), bottlenecks, etc. i just happen to be interested in trying to pick out the effects of inbreeding/outbreeding — if possible.
_____

**update** – here are a couple of excerpts from the article (thnx, b.b.!) [pgs. 277, 279-281]:

“Size Classification of ROH

“Separately in each population, we modeled the distribution of ROH lengths as a mixture of three Gaussian distributions that we interpreted as representing three ROH classes: (A) short ROH measuring tens of kb that probably reflect homozygosity for ancient haplotypes that contribute to local LD [linkage disequilibrium] patterns, (B) intermediate ROH measuring hundreds of kb to several Mb that probably result from background relatedness owing to limited population size, and (C) long ROH measuring multiple Mb that probably result from recent parental relatedness….

“In each population, the size distribution of ROH appears to contain multiple components (Figure 2A). Using a three-component Gaussian mixture model, we classified ROH in each population into three size classes (Figure 2B): short (class A), intermediate (class B), and long (class C). Size boundaries between different classes vary across populations (Table S1); however, considering all populations, all A-B boundaries are strictly smaller than all B-C boundaries (Figure 2C). The mean sizes of class A and B ROH are similar among populations from the same geographic region (Figure S3), with the exception that Africa and East Asia have greater variability. The class C mean is generally largest in the Middle East, Central/South Asia, and the Americas and smallest in East Asia (Figure S3), with the exception that the Tujia population has the largest values. In the admixed Mexican population (MXL), mean ROH sizes are similar to those in European populations. In the admixed African American population (ASW), however, mean ROH sizes are among the smallest in our data set, notably smaller than in most Africans and Europeans.

“Geographic Pattern of ROH

“Several patterns emerge from a comparison of the per-individual total lengths of ROH across populations (Figure 3). First, the total lengths of class A (Figure 3A) and class B (Figure 3B) ROH generally increase with distance from Africa, rising in a stepwise fashion in successive continental groups. This trend is similar to the observed reduction in haplotype diversity with increasing distance from Africa. Second, total lengths of class C ROH (Figure 3C) do not show the stepwise increase. Instead, they are higher and more variable in most populations from the Middle East, Central/South Asia, Oceania, and the Americas than in most populations from Africa, Europe, and East Asia. This pattern suggests that a larger fraction of individuals from the Middle East, Central/South Asia, Oceania, and the Americas tend to have higher levels of parental relatedness, in accordance with demographic estimates of high levels of consanguineous marriage particularly in populations from the Middle East and Central/South Asia, and it is similar to that observed for inbreeding-coefficient and identity-by-descent estimates. Third, in the admixed ASW and MXL individuals, total lengths of ROH in each size class are similar to those observed in populations from Africa and Europe, respectively (Figure 3).

“The total numbers of ROH per individual (Figure S4) show similar patterns to those observed for total lengths (Figure 3). However, in East Asian populations, total numbers of class B and class C ROH per individual are notably more variable across populations than are ROH total lengths.”
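
the size classification described in the excerpt (a three-component gaussian mixture over roh lengths) can be sketched in a few lines of python. this is only an illustration under loud assumptions: the roh lengths are simulated, the starting guesses and the names fit_gmm_1d and classify are hypothetical, and the paper’s actual fitting procedure may well differ in its details.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def fit_gmm_1d(xs, init, iters=100):
    """fit a 1-d gaussian mixture by plain em.

    init is a list of (mean, sd) starting guesses, one per component.
    returns a list of (weight, mean, sd) tuples, sorted by mean.
    """
    k, n = len(init), len(xs)
    means = [m for m, _ in init]
    sds = [s for _, s in init]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # e-step: how responsible is each component for each roh length?
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
            total = sum(p) or 1e-300  # guard against underflow to zero
            resp.append([pj / total for pj in p])
        # m-step: re-estimate weight, mean and sd of each component
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sds[j] = max(1e-3, math.sqrt(var))  # keep sd away from zero
    return sorted(zip(weights, means, sds), key=lambda c: c[1])


def classify(x, params):
    """return 'A', 'B' or 'C' for the most probable component of a roh of length x."""
    scores = [w * normal_pdf(x, m, s) for w, m, s in params]
    return "ABC"[scores.index(max(scores))]


# simulated roh lengths in Mb for one hypothetical population (made-up numbers)
random.seed(0)
lengths_mb = ([random.gauss(0.3, 0.05) for _ in range(600)]
              + [random.gauss(0.9, 0.15) for _ in range(300)]
              + [random.gauss(3.0, 0.8) for _ in range(100)])

# hypothetical starting guesses: (mean, sd) for classes A, B and C
params = fit_gmm_1d(lengths_mb, init=[(0.3, 0.1), (1.0, 0.3), (3.0, 1.0)])
for label, (w, m, s) in zip("ABC", params):
    print(f"class {label}: weight={w:.2f} mean={m:.2f}Mb sd={s:.2f}Mb")
```

because the mixture is fitted separately for each population, the boundary between, say, class B and class C lands in a different place each time, which is what the excerpt means by size boundaries varying across populations.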

previously: runs of homozygosity and inbreeding (and outbreeding); and ibd and historic mating patterns in europe


ibd and historic mating patterns in europe

**update 08/03: post fixed to remove references to roh which i got wrong (roh≠blocks of ibd!) — see comments below (thanks, citrus!)**

princenuadha points me to this awesome pdf which i guess was a presentation given at a society for molecular biology and evolution (smbe) conference last weekend (thanks, prince!).

here is an interesting graphic from the presentation (pg. 21):

what this map shows are the means of blocks of identity by descent (ibd) that are greater than 1cM for each of these european populations. the longer the ibd blocks, the greater the identity by descent, and vice versa. small circles=fewer long blocks of ibd; large circles=more long blocks of ibd.

if a population has lots of short blocks of ibd, then its genetics are all mixed up, possibly due to outbreeding or because of a fairly recent mixing with another population. if a population has lots of long blocks of ibd, then its genetics are not so mixed up and the individuals within it share a lot of identity by descent. this can be an indicator of having been squeezed through a bottleneck or close inbreeding over time.
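
to make the idea of a “block” shared between two individuals concrete, here’s a toy python sketch. it is not how the presentation’s authors detected ibd: real methods are statistical and allow for genotyping errors, and a stretch that merely looks identical isn’t guaranteed to be identical by descent (long identical stretches are just taken as good evidence of it). the function name shared_segments, the haplotype strings and the marker spacing are all made up for illustration.

```python
# toy shared-segment finder: an illustration only, real ibd detection is statistical.

def shared_segments(hap_a, hap_b, positions_cm, min_length_cm=1.0):
    """return (start_cm, end_cm) for stretches where two haplotypes carry identical alleles."""
    n = min(len(hap_a), len(hap_b), len(positions_cm))
    segments = []
    start = None
    for i in range(n):
        if hap_a[i] == hap_b[i]:
            if start is None:
                start = i                      # a matching stretch begins
        elif start is not None:
            # a mismatch closes the stretch at the previous marker
            segments.append((positions_cm[start], positions_cm[i - 1]))
            start = None
    if start is not None and n > 0:
        segments.append((positions_cm[start], positions_cm[n - 1]))
    # keep only stretches at least min_length_cm long
    return [(s, e) for s, e in segments if e - s >= min_length_cm]


# two hypothetical haplotypes, one marker every 0.1 cM, differing at three markers
positions_cm = [i / 10 for i in range(40)]
hap_a = ["A"] * 40
hap_b = ["G" if i in (3, 20, 25) else "A" for i in range(40)]

segs = shared_segments(hap_a, hap_b, positions_cm)
print(segs)
```

the 1cM cutoff here plays the same role as the “greater than 1cM” filter on the map: short matching stretches are thrown away and only the longer shared blocks are counted.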

here are the mean numbers of long blocks of ibd for some of the countries on the map:

as you can see, my “core europeans” (english, french, germans, dutch, prolly some others) all have low means of blocks of ibd. the smallest circles are found right in the center of nw europe: england, france, belgium, germany. also italy (more about that below). in the immediate periphery around core europe, the circles are a bit larger, i.e. there are more long blocks of ibd: scotland, ireland, spain, portugal, switzerland, greece, scandinavians. eastern europeans have even larger circles/even more long blocks of ibd: poles, russians. and populations in the balkans, like the albanians, have enormous circles, i.e. LOTS of long blocks of ibd.

all of that fits the pattern i’ve been talking about here on the ol’ blog (see the mating patterns series below in the left-hand column): that the core europeans have been outbreeding the most and for the longest, with peripheral europeans lagging behind that trend, and eastern europeans really lagging behind the trend. i haven’t actually discussed the balkan populations (yet), but i do know that cousin/endogamous marriage rates are pretty high in the balkans.

i wonder if the numbers for italy may be unrepresentatively low, but it’s difficult to know. the data used are from popres and, like so much genetic data out there, have no provenance info attached to them. so, are the italian data from northern italy (which has a long history of outbreeding) or southern italy (which has a lot of inbreeding) or a combination of both? dunno.

this is a very cool study! i like it a lot. (^_^)

polish gen also has an interesting post about the presentation, btw.
