runs of homozygosity again

**update below**

here’s an exciting new paper!: Genomic Patterns of Homozygosity in Worldwide Human Populations. i don’t have access to the paper itself, but there are lots o’ neat figures and tables in the supplemental data [opens pdf] that relate to runs of homozygosity (roh). roh are identical stretches of dna within an individual’s genome (i.e. identical on both copies of a chromosome, the one inherited paternally and the one inherited maternally). (roh shouldn’t be confused with blocks of identity by descent [ibd], which i did once! ibd blocks are identical stretches of dna as compared between different individuals, iiuc.)
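to make the definition concrete, here’s a toy sketch in python. it is not the method used in the paper, just an illustration: it assumes each marker’s genotype is given as a pair of alleles, and that a run only counts once it spans at least `min_snps` consecutive homozygous markers. the function name `find_roh`, the `min_snps` cutoff and the tiny data set are all made up for the example. real roh callers are more elaborate (they tolerate the odd genotyping error or missing call, for instance).

```python
def find_roh(positions, genotypes, min_snps=3):
    """Return (start_bp, end_bp, n_snps) for each run of consecutive homozygous markers.

    positions: marker coordinates in base pairs, sorted ascending
    genotypes: one (allele_a, allele_b) pair per marker
    min_snps:  hypothetical cutoff; shorter runs are ignored
    """
    runs = []
    start = None  # index of the first marker in the current run, if any
    for i, (a, b) in enumerate(genotypes):
        if a == b:
            # homozygous marker: open a run if none is open
            if start is None:
                start = i
        else:
            # heterozygous marker: close the current run, keep it if long enough
            if start is not None and i - start >= min_snps:
                runs.append((positions[start], positions[i - 1], i - start))
            start = None
    # close a run that reaches the last marker
    if start is not None and len(genotypes) - start >= min_snps:
        runs.append((positions[start], positions[-1], len(genotypes) - start))
    return runs


# made-up markers: one heterozygous site (C/T) splits two homozygous runs
positions = [100, 200, 300, 400, 500, 600, 700, 800]
genotypes = [("A", "A"), ("G", "G"), ("T", "T"), ("C", "T"),
             ("A", "A"), ("G", "G"), ("C", "C"), ("T", "T")]
roh = find_roh(positions, genotypes)
print(roh)  # → [(100, 300, 3), (500, 800, 4)]
```

an ibd comparison, by contrast, would line up the genotypes of two different individuals rather than the two copies within one person.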

recall that possessing lots of long roh indicates that one’s parents are/were quite similar genetically speaking. that can be the result of a couple of different genetic scenarios like (as greying wanderer has brought up a lot recently) simply being from a small population (i.e. having a small effective population size) and/or from regular inbreeding (consanguineous/endogamous mating). so, a population having a lot of long roh is small and/or inbreeds a lot. populations having LOTS of short roh have probably been through some sort of bottleneck (see previous post).

in the paper i looked at in that previous post, the researchers had looked at the different roh lengths for large, regional populations like “europeans” or “east asians.” amongst other things, they had found that some of my regular inbreeders — the fbd marriage folks — had some of the highest numbers of medium and long roh, a state of genetic affairs which likely reflects their long-term close mating patterns. interestingly, the researchers had found that east asians had roh lengths similar to those of europeans across the board, something which surprised me since, at least according to what i’ve been reading, east asians (i.e. the chinese) have been inbreeding for a much longer time than europeans. one drawback of that previous study, though, was that, apart from the french, most of the european populations they looked at were peripheral groups who have had a tendency to inbreed more than my “core” europeans (see mating patterns in europe series below ↓ in left-hand column).

the new paper suffers from some of the same problems since the data come from the same sources (hgdp-ceph and hapmap phase 3 populations), so northern europeans — apart from the french — aren’t included in this paper either. (what can you do? it’s early days yet. i look forward to when there’s lots more genetic data available out there for teh scientists to work with! (^_^) )

what the researchers in this paper have done, though, is to look at both the different mean lengths of roh in each of the different populations sampled AND they looked at total numbers of roh within individuals for each population. this has, i think, drawn out some interesting differences between the populations.

first, here are two graphics from the supplemental data (linked to above).

i’ve highlighted a handful of populations i want to focus on ’cause i know a little something about their historic mating patterns: the bedouin (as a proxy for the arabs; note that the bedouin have probably inbred more than more settled arabs); italians (not sure if they’re northern or southern italians or a mix of both; however, there are tuscans in the samples with which these “italians” can be compared); pathans or pashtuns (more fbd marriage folks, like the bedouins/arabs); and han chinese (there are some northern han chinese with whom this group can be compared). ok. here are the charts:

as you can see, the researchers have split up the roh into three classes (note that the short and medium classes here are a lot shorter than those in the paper looked at previously):

– A: 0.25-0.40 Mb (short)
– B: 0.6-1.2 Mb (medium)
– C: 0-35 Mb (long)

the interesting thing in the first chart above (Fig. S3 – Mean ROH Length for Each of the Three Size Classes in Each Population), is that the han chinese have lower means of roh length in all of the size classes compared to the other populations i’ve highlighted. in the previous study, the researchers found that east asians had similar means to europeans for all roh lengths. i found this surprising since, from what i’ve read, the han chinese have been inbreeding for a longer period of time than europeans. what might be confounding the results though, once again, is the fact that nw europeans (the outbreeders extraordinaire) are not really included in either of these studies apart from a handful of french samples.

in this latest study, both the bedouin and the pashtun, for instance, have higher means — and wider spreads — of long (class C) roh than the italians, which is what i would’ve expected since those two groups (the bedouins and the pashtuns) are, being fbd marriage folks, serious inbreeders. perhaps the reason the han chinese long roh mean is comparatively low is partly due to the fact that they historically practiced mother’s brother’s daughter (mbd) marriage which doesn’t push towards such close inbreeding as fbd marriage. still, i would’ve expected to see greater means of roh for the chinese than the italians — or, at least, around the same. not so much lower. (unless the italians practiced fbd marriage, too — or fzd marriage — but i don’t think so.)

if you look at the second chart (Fig. S4 – Total Number of ROH in Individual Genomes), however, you’ll see that, overall, the han chinese have more short, medium and long roh in total in individual genomes than any of the other three populations i’ve highlighted. both the bedouins and the pashtuns have greater numbers/wider total spread of long roh than the italians, but the han chinese have a much greater total number of long roh than any of the other three groups: three or four times as many.

but don’t forget that they’re, on average, shorter long roh. (confusing, eh?!)
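the count-versus-mean distinction is easy to see with a little arithmetic. here’s a small python sketch with completely made-up long (class C) roh lengths for two imaginary individuals; the numbers and the helper name `summarize` are assumptions for illustration only, not values from the paper.

```python
from statistics import mean


def summarize(roh_lengths_mb):
    """Return count, mean length and total length (Mb) of one individual's ROH."""
    return {
        "count": len(roh_lengths_mb),
        "mean_mb": mean(roh_lengths_mb),
        "total_mb": sum(roh_lengths_mb),
    }


# made-up class C ROH lengths in Mb (not data from the paper)
many_but_shorter = [2.0, 2.5, 1.5, 2.0, 3.0, 2.0, 2.5, 2.5]
few_but_longer = [8.0, 12.0]

a = summarize(many_but_shorter)
b = summarize(few_but_longer)
print(a)  # → {'count': 8, 'mean_mb': 2.25, 'total_mb': 18.0}
print(b)  # → {'count': 2, 'mean_mb': 10.0, 'total_mb': 20.0}
```

the first individual has four times as many long roh, yet a much lower mean length and a similar total, which is why the two charts can seem to point in different directions.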

perhaps this is what you get when you have — as the chinese have had — a pretty good-sized effective population size for such a long time. there have been a LOT of han chinese for — wow — millennia.

so, it looks like this (in this order of inbrededness — i think):

– bedouins: highest mean, and very wide spread, of long roh; high total numbers, and widest spread, of long roh.
– pashtun: low mean, but widest spread, of long roh; low total number, but very wide spread, of long roh.
– han chinese: very low mean, and very narrow spread, of long roh; highest total numbers, and wide spread, of long roh.
– italians: low mean, and rather wide spread, of long roh; very low total number, and very small spread, of long roh.

other interesting points are that:

– the tuscans/tsi (toscani) appear to have lower short, medium and long mean roh than the generic “italian” category. however, the tuscans have lower total numbers of long roh than the “italians” while the toscani (tsi), on the other hand, appear to have a greater total number of long roh than the “italians.” while the tuscan samples and the toscani/tsi samples are from different studies (hgdp vs. hapmap), they are all supposed to be from tuscany, so it’s surprising that they’re so different. perhaps the individuals in the toscani/tsi sample were more closely related somehow?

– the northern han samples have lower short, medium and long mean roh than the generic “han” category. this would fit my general impression that historically inbreeding has been greater in southern china than in the north. however, the total number of long roh are greater in the northern han sample than in the “han” sample. not sure what that means.

don’t forget that there can be all sorts of reasons for differences in roh: inbreeding vs. outbreeding, yes, but also effective population size, population movement (migration in or out), bottlenecks, etc. i just happen to be interested in trying to pick out the effects of inbreeding/outbreeding — if possible.
_____

**update** – here are a couple of excerpts from the article (thnx, b.b.!) [pgs. 277, 279-281]:

“Size Classification of ROH

“Separately in each population, we modeled the distribution of ROH lengths as a mixture of three Gaussian distributions that we interpreted as representing three ROH classes: (A) short ROH measuring tens of kb that probably reflect homozygosity for ancient haplotypes that contribute to local LD [linkage disequilibrium] patterns, (B) intermediate ROH measuring hundreds of kb to several Mb that probably result from background relatedness owing to limited population size, and (C) long ROH measuring multiple Mb that probably result from recent parental relatedness….

“In each population, the size distribution of ROH appears to contain multiple components (Figure 2A). Using a three-component Gaussian mixture model, we classified ROH in each population into three size classes (Figure 2B): short (class A), intermediate (class B), and long (class C). Size boundaries between different classes vary across populations (Table S1); however, considering all populations, all A-B boundaries are strictly smaller than all B-C boundaries (Figure 2C). The mean sizes of class A and B ROH are similar among populations from the same geographic region (Figure S3), with the exception that Africa and East Asia have greater variability. The class C mean is generally largest in the Middle East, Central/South Asia, and the Americas and smallest in East Asia (Figure S3), with the exception that the Tujia population has the largest values. In the admixed Mexican population (MXL), mean ROH sizes are similar to those in European populations. In the admixed African American population (ASW), however, mean ROH sizes are among the smallest in our data set, notably smaller than in most Africans and Europeans.

“Geographic Pattern of ROH

“Several patterns emerge from a comparison of the per-individual total lengths of ROH across populations (Figure 3). First, the total lengths of class A (Figure 3A) and class B (Figure 3B) ROH generally increase with distance from Africa, rising in a stepwise fashion in successive continental groups. This trend is similar to the observed reduction in haplotype diversity with increasing distance from Africa. Second, total lengths of class C ROH (Figure 3C) do not show the stepwise increase. Instead, they are higher and more variable in most populations from the Middle East, Central/South Asia, Oceania, and the Americas than in most populations from Africa, Europe, and East Asia. This pattern suggests that a larger fraction of individuals from the Middle East, Central/South Asia, Oceania, and the Americas tend to have higher levels of parental relatedness, in accordance with demographic estimates of high levels of consanguineous marriage particularly in populations from the Middle East and Central/South Asia, and it is similar to that observed for inbreeding-coefficient and identity-by-descent estimates. Third, in the admixed ASW and MXL individuals, total lengths of ROH in each size class are similar to those observed in populations from Africa and Europe, respectively (Figure 3).

“The total numbers of ROH per individual (Figure S4) show similar patterns to those observed for total lengths (Figure 3). However, in East Asian populations, total numbers of class B and class C ROH per individual are notably more variable across populations than are ROH total lengths.”
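for anyone wondering what a “three-component gaussian mixture model” looks like in practice, here’s a minimal python sketch. it is not the authors’ code: it fits the mixture with a plain em loop on log10 of the roh length (working on the log scale is an assumption made here to keep the toy numerically tidy), and the data are synthetic draws around 0.3, 0.9 and 5 Mb. the names `fit_gmm_1d` and `classify` are hypothetical.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def fit_gmm_1d(xs, k=3, n_iter=200):
    """Fit a k-component 1-D Gaussian mixture by EM; components come back sorted by mean."""
    n = len(xs)
    lo, hi = min(xs), max(xs)
    # start with means evenly spaced over the data range and a shared width
    means = [lo + (j + 0.5) * (hi - lo) / k for j in range(k)]
    sigmas = [(hi - lo) / (2 * k)] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean and width of each component
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sigmas[j] = max(math.sqrt(var), 1e-6)
    order = sorted(range(k), key=lambda j: means[j])
    return ([weights[j] for j in order],
            [means[j] for j in order],
            [sigmas[j] for j in order])


def classify(x, weights, means, sigmas):
    """Assign x to the most probable component: 'A' (short), 'B' or 'C' (long)."""
    p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas)]
    return "ABC"[p.index(max(p))]


# synthetic log10(length in Mb) values, NOT real data
random.seed(0)
log_lengths = (
    [random.gauss(math.log10(0.3), 0.08) for _ in range(300)]
    + [random.gauss(math.log10(0.9), 0.08) for _ in range(150)]
    + [random.gauss(math.log10(5.0), 0.08) for _ in range(50)]
)
weights, means, sigmas = fit_gmm_1d(log_lengths)
for length_mb in (0.3, 0.9, 5.0):
    print(length_mb, "Mb -> class", classify(math.log10(length_mb), weights, means, sigmas))
```

because the fit is done separately in each population, the point where class A hands over to class B (and B to C) can land in a different place for each population, which is what the excerpt means by size boundaries varying across populations.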

previously: runs of homozygosity and inbreeding (and outbreeding) and ibd and historic mating patterns in europe

(note: comments do not require an email. ribbit!)


20 Comments

  1. i don’t have access to the paper itself

    Here.

    [edit: just a note of caution for others – i got all sorts of warning messages that the site that b.b. linked to could damage my computer, etc., etc. click on the link at your own peril. seemed to work ok for me, tho. – h. chick (^_^) ]

    Reply

  2. If the Han Chinese recently expanded out from a bottleneck to conquer a large area of China which was previously occupied by “Chinese” (but genetically different) people, would this fit the large number of ROH in general?

    And if many of the Han clans overlap and merge/branch over the course of this history, would this have the effect of breaking expected l-ROH into smaller m-ROH and s-ROH, while still maintaining a large number of each, due to the bottleneck? IE, flexible clans, subject to genetic trading, but still fairly hermetic, most of the time.

    Some studies (http://newscenter.berkeley.edu/2010/07/01/tibetan_genome/) have recently indicated that the common ancestors of Tibetans and Han Chinese split into two populations about 2,750 years ago, with the larger group moving to the Tibetan plateau. The Tibetans eventually shrank, while the low-elevation Han population expanded dramatically, and that’s supposed to be what they did to many other ethnic groups as well (like the many “minorities” still existing in China).

    Other reports also show that the Han recently engaged in dramatic, replacement-type, and fairly insular (towards non-Han) growth (http://www.nature.com/nature/journal/v431/n7006/full/nature02878.html?free=2), of a type that cannot be rivaled by fx. the Italians.

    Mightn’t that work as a model for the Han ROH?

    Reply

  3. @redzengenoist – “If the Han Chinese recently expanded out from a bottleneck to conquer a large area of China which was previously occupied by ‘Chinese’ (but genetically different) people, would this fit the large number of ROH in general?”

    from what i understand (which is not all that much about all this, so — you know — feel free to ask someone who knows what they’re talking about!), lots of short roh in a population is a good indicator of a possible bottleneck. for example, kirin, et al., found this for populations in the americas (see previous post).

    this makes sense if you think about it: population goes through a bottleneck and so population’s genetic variation is small. thus, succeeding generations, even after the genetic decks have been shuffled quite a bit via sexual reproduction, still share lots of little segments of dna in common ’cause they never got any genetic input from elsewhere.

    so, yes — i think your scenario of a recently expanding han chinese population would fit with the high frequencies of very small roh found in the han population (although maybe there’s another possible scenario that, in my ignorance, i’m not aware of!).

    the high frequency of long roh in the han chinese, however, could indicate more recent inbreeding/close mating. edit: it could, of course, also fit with the bottleneck scenario, since native americans (bottleneck populations) also have really high numbers of long roh. problem is, they were also inbreeders, so it can be hard to tease apart what the h*ck is going on!

    thanks for those links! very interesting! (^_^)

    Reply

  4. hbdchick said:
    just a note of caution for others – i got all sorts of warning messages that the site that b.b. linked to could damage my computer, etc., etc.

    MediaFire is a well-known trusted file-hosting website. Presumably it gave that message as someone could upload a malicious file. But MediaFire already automatically scans all uploads for viruses. Though if you are hyper-paranoid, you can always double-check with VirScan.org if a file contains a virus before opening it.

    Reply

  5. @b.b. – “MediaFire is a well-known trusted file-hosting website.”

    cool! i downloaded the file (thanks again!) with no apparent problems. just thought i’d give others a heads-up so they wouldn’t freak out. (~_^)

    Reply

  6. “perhaps this is what you get when you have — as the chinese have had — a pretty good-sized effective population size for such a long time. there have been a LOT of han chinese for — wow — millennia.”

    That’s what i’m wondering.

    If you took population density and divided it into
    – very low (forager)
    – low (pastoralist)
    – medium (average pre-modern farmer)
    – high (high-density pre-modern farmer (Nile? Euphrates? Ganges? Yangtse?))

    and marriage form divided into
    – exogamous (foragers(?) and hajnal line)
    – default endogamous (most of the world)
    – close endogamous (fbd etc)

    and create a 4 x 3 grid of 12 combinations then i think – assuming a clear understanding of how the *proportions* of short, medium and long runs of ROH might be affected by each combination – you could fill each box of the grid with that expected proportion.

    For example the very low / exogamous, box (forager?) might be expected to have the proportions
    – short: 4
    – medium: 2
    – long: 1

    while the high / default endogamous, box (Han?) might be expected to have
    – short: 2
    – medium: 4
    – long: 2

    I’m just guessing at the actual values here but the point is i think there would be signature proportions for each combo.

    On top of that you’d maybe add time to distinguish between two populations where one of them underwent a dramatic transition at some point e.g. two adjacent regions in Asia might be
    – 3000 years in the medium or high / default endogamous, box
    – 2000 years of the same followed by 1000 in the medium or high / close endogamous (fbd conquest), box

    Northern Europe might be
    – 2000 years of low / exogamous
    – 1000 of medium (heavy plow) /exogamous

    Reply

  7. @g.w. – “‘perhaps this is what you get when you have — as the chinese have had — a pretty good-sized effective population size for such a long time. there have been a LOT of han chinese for — wow — millennia.’

    That’s what i’m wondering.”

    i just read the whole of the paper again — really carefully to let it sink in as much as poss (~_^) — and this is some really cool, but really complicated, stuff.

    all sorts of biological events (or whatever you call them) can affect roh: inbreeding/outbreeding, bottlenecks … but also selection.

    distance from africa (as in the out-of-africa theory) seems to be an important factor here. you can see in the second figure the bottom frame (“D”) that the frequencies of roh increase, as the authors say “step-wise,” the further away from africa you get. that’s sorta illustrating a bottleneck after bottleneck effect as modern humans moved from africa all the way to the new world.

    then the authors also discuss how there are roh “hotspots” and “coldspots” in the genome — coldspots (where there’s little recombination) probably arising because there is positive selection on some gene/s in those regions, so they’re not lost in recombination. that can obviously vary in different populations.

    so, it’s gonna be difficult to use the roh to tease out inbreeding/outbreeding patterns. i’m thinking now that it might work ok within regions (like europe or east asia), but it might be kinda hard to compare regions vs. regions. edit: we might be able to pick up a signature/s of inbreeding/outbreeding from the roh frequencies, like you suggest, but we might have to work within broad population regions (indo-european? han chinese?) to get around the bottleneck effect. (am i making any sense? (^_^) )

    Reply

  8. “we might be able to pick up a signature/s of inbreeding/outbreeding from the roh frequencies, like you suggest, but we might have to work within broad population regions (indo-european? han chinese?) to get around the bottleneck effect. (am i making any sense? (^_^) )”

    Yes. I was thinking at a lower scale i.e. an environment (with marriage-culture included as part of the environment) so countries and global regions are secondary and their average metrics would be derived from their proportion of the different environments.

    It’s a variation on the urban / rural argument where you could have two populations with identical urban and rural metrics but very different national averages because their proportions of urban / rural were very different.

    Thinking more at that sort of scale a country (or large region like Europe / East Asia) becomes a collection of sub-districts with a common environment so for example one country or large region might be (or until recently was):
    – 10% very low / exogamous (scattered forager district)
    – 20% low / close endogamous (mountain pastoralist districts)
    – 50% medium / medium endogamous (standard farmers)
    – 20% high / medium endogamous (big riverine valley agrarian district farmers)

    while another is or was
    – 10% low / close endogamous (remote mountain pastoralist districts)
    – 10% low / medium endogamous (less remote mountain pastoralist districts)
    – 70% medium / exogamous (standard farmers with the hajnal marriage culture)
    – 10% high / exogamous (high fertility district farmers with the hajnal marriage culture)

    so the average in itself would mostly hide the sort of variation i’m getting at.

    It might show something – if for example one of the countries / regions being compared had a distinctively high proportion of sub-districts of riverine agrarians or desert pastoralists or exogamous standard farmers – but it would be a function of the *proportions* of those sub-districts.

    A clearer example of what i mean might be if you compared adjacent
    – low population density / fbd district
    – medium density / fbd district
    – high density / fbd district
    would there be a consistent pattern to the proportions of ROH of different lengths?

    As you say the bottlenecks might mean the comparisons need to be across the same latitudes or same distance from Africa or something although within whatever range works i’d have thought there might be a consistent pattern going from
    – very low to low to medium to high population density
    – close endogamous to medium endogamous to exogamous marriage culture
    with a composite consistent pattern in a grid.

    I’m not sure if that makes sense or not?

    Reply

  9. @g.w. – “within whatever range works i’d have thought there might be a consistent pattern going from
    – very low to low to medium to high population density
    – close endogamous to medium endogamous to exogamous marriage culture
    with a composite consistent pattern in a grid.

    I’m not sure if that makes sense or not?”

    absotootly! (^_^) the trick here might be to work out the ranges.

    Reply

  10. “the trick here might be to work out the ranges”

    yes, and easier said than done i’d guess :)

    Reply

  11. i found this surprising since, from what i’ve read, the han chinese have been inbreeding for a longer period of time than europeans. what might be confounding the results though, once again, is the fact that nw europeans (the outbreeders extraordinaire) are not really included in either of these studies apart from a handful of french samples.

    Maybe I’m not interpreting the data correctly, but I don’t see much support in this data for the notion that the French are “outbreeders extraordinaire”. They seem pretty much the same as the other European groups examined.

    Your claim that the Chinese have been inbreeding for a long time is an example of the shortcomings in your definition of “inbreeding”. White Americans and Europeans are a very inbred group, if we utilize your use of the word. After all, their historical rates of intermarriage with non-whites are extremely low. So they are inbred, QED.

    Reply

  12. Off on a tangent again but relevant to other recent posts – from your update

    “The class C mean is generally largest in the Middle East, Central/South Asia, and the Americas and smallest in East Asia (Figure S3), with the exception that the Tujia population has the largest values.”

    http://en.wikipedia.org/wiki/Tujia_people

    “They live in Wuling Mountains…The Tujia tusi chieftains reached the zenith of their power under the Ming Dynasty (1368–1644), when they were accorded comparatively high status by the imperial court. They achieved this through their reputation as providers of fierce, highly-disciplined fighting men, who were employed by the emperor to suppress revolts by other minorities.”

    heh

    Reply

  13. @Frank

    “Your claim that the Chinese have been inbreeding for a long time is an example of the shortcomings in your definition of “inbreeding”.”

    Or it’s maybe an example where the effects of relatively high population density over a long period may have over-ridden the effects of a clan-based marriage pattern.

    “White Americans and Europeans are a very inbred group, if we utilize your use of the word. After all, their historical rates of intermarriage with non-whites are extremely low. So they are inbred, QED.”

    That makes no sense at all unless you’ve misunderstood the comment about the Chinese inbreeding as a function of marrying other Chinese instead of as a function of their traditional clan marriage systems?

    Reply

  14. @g.w. – “‘They achieved this through their reputation as providers of fierce, highly-disciplined fighting men, who were employed by the emperor to suppress revolts by other minorities.’

    “heh”

    ha! of course they did! (^_^)

    Reply

  15. @frank – “Maybe I’m not interpreting the data correctly, but I don’t see much support in this data for the notion that the French are ‘outbreeders extraordinaire’. They seem pretty much the same as the other European groups examined.”

    nw europeans are outbreeders extraordinaire compared to most other populations on the planet (poss./likely exceptions including bushmen and eskimos — prolly some others). again, see the references i’ve given you for more on this.

    there are problems with the hgdp data used in the study above — most crucially is that we don’t have any provenance info for any of the samples, so for all anyone knows, all of the french samples might be from one village (prolly not, but we can’t be sure). and which region/s of france were the samples collected from? we know that different regions have had different historic cousin marriage rates. and there are relatives included in the french samples, so that confuses the matter even more. and there are only 29 of them. so that’s ALL the samples we’ve got for nw europe (unless we include tuscany).

    i look forward to when there are more data for more of nw europe (and the rest of the world!).

    @frank – “Your claim that the Chinese have been inbreeding for a long time is an example of the shortcomings in your definition of ‘inbreeding’. White Americans and Europeans are a very inbred group, if we utilize your use of the word. After all, their historical rates of intermarriage with non-whites are extremely low. So they are inbred, QED.”

    huh? that doesn’t make any sense. my reference to chinese inbreeding relates to the well known fact that for a couple millennia at least the chinese regularly practiced mother’s brother’s daughter marriage. doesn’t have anything to do with marrying people of other races. -?-

    Reply

  16. nw europeans are outbreeders extraordinaire compared to most other populations on the planet

    Unless you disbelieve the data you yourself provided in this very post, I’m perplexed as to how you can say that with a straight face.

    Reply

  17. @frank – “Unless you disbelieve the data you yourself provided in this very post, I’m perplexed as to how you can say that with a straight face.”

    *sigh* this really is getting tiresome, frank. WHY won’t you look at the references i’ve given you several times now already?: goody, mitterauer, macfarlane, ausenda, greif, etc. THOSE sources are where i’ve gotten the outbreeding info from.

    Reply

  18. * this really is getting tiresome, frank. WHY won’t you look at the references i’ve given you several times now already?

    Why don’t you look at the genetic data you have provided in this very post? That data contradicts your preconceptions. So it seems that you look for reasons to dismiss it. “If only we had more data on Europeans it would prove my theory that nw europeans are outbreeders extraordinaire”.

    [edit: please, see latest post on this issue – the hgdp samples again. – hbd chick.]

    Reply
