different mutation rates in different human populations

well this seems important! via race/history/evolution notes, an abstract from the society for molecular biology and evolution 2014 conference (in puerto rico! – teh scientists are always good to themselves whenever they can be (~_^) ):

Evidence for different mutation rates across human populations
Ron Do, David Reich
Department of Genetics, Harvard Medical School, Boston, USA

Although mutation rates (per base pair) have clearly changed across primate evolution, many analyses continue to assume that all present-day human populations have the same mutation rates. Recently, William Amos analyzed 1000 Genomes Project and Complete Genomics sequences and found evidence of significantly higher divergence rates on African than on non-African lineages since separation (W. Amos, PLoS One 8, e63048). The detected pattern was strongest in genomic regions of high polymorphism rate, a pattern that the author hypothesized was due to ‘heterozygote instability’, whereby gene conversion events surrounding heterozygous sites increase the mutation rate. To further test this observation, we measured the relative accumulation of mutations in lineages drawn from two different populations, using 25 deep genome sequences generated according to the same experimental protocol using the Illumina technology. We carried out pairwise comparisons of five sub-Saharan African (Dinka, Mandenka, Mbuti, San, Yoruba) and eight Non-African populations (Australian, Dai, French, Han, Karitiana, Mixe, Papuan, Sardinian) on all divergent sites. We observed statistically significant differences in the relative accumulation of mutations for many pairs of African and Non-African populations. Among the strongest differences is significantly more lineage-specific mutations in Mbuti than in Han Chinese (R=1.044, standard error (SE) =0.0015). On average, we observed about 1% more mutations on African lineages compared to Non-African lineages. We also observed some significant differences across non-African populations, with the Han Chinese who have experienced extreme expansions in population size associated with agriculture having more mutations than the Karitiana, a hunter-gatherer population from Amazonia who did not experience such expansions (R=1.015, SE=0.0014).
The results are consistent across both European and African segments of the human reference sequence, so are not an artifact of reference sequence bias. Taken together, these results support the view that per-base pair mutation rates may be dynamically and substantially changing across humans.
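
a quick back-of-the-envelope sketch of how far those two ratios sit from "no difference" (R = 1). this assumes, and it's my assumption, not something the abstract says, that R is roughly normally distributed around 1 when both lineages pile up mutations at the same rate. the function name is made up for illustration:

```python
def z_score_vs_equal_rates(r, se):
    """Number of standard errors by which the ratio r departs from 1."""
    return (r - 1.0) / se


# R and SE values as quoted in the abstract above
mbuti_vs_han = z_score_vs_equal_rates(1.044, 0.0015)
han_vs_karitiana = z_score_vs_equal_rates(1.015, 0.0014)

print(f"mbuti vs. han: {mbuti_vs_han:.1f} standard errors from 1")
print(f"han vs. karitiana: {han_vs_karitiana:.1f} standard errors from 1")
```

that works out to roughly 29 and 11 standard errors, so under that assumption neither ratio looks like sampling noise, even though the differences themselves are only a few percent.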


wrt the greater number of mutations in african lineages: polygamy (and, therefore, older fathers)? life in the tropics?

(note: comments do not require an email. old san juan. (^_^) )

runs of homozygosity again

**update below**

here’s an exciting new paper!: Genomic Patterns of Homozygosity in Worldwide Human Populations. i don’t have access to the paper itself, but there are lots o’ neat figures and tables in the supplemental data [opens pdf] that relate to runs of homozygosity (roh). roh are identical stretches of dna within an individual’s genome (i.e. identical on both copies of a chromosome, the one inherited paternally and the one inherited maternally). (roh shouldn’t be confused with blocks of identity by descent [ibd], which i did once! ibd blocks are identical stretches of dna as compared between different individuals, iiuc.)
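
to make the idea of a "run" concrete, here's a toy sketch that scans one individual's genotypes along a chromosome and reports every unbroken stretch of homozygous markers. everything in it (the function name `find_roh`, the `min_snps` cutoff, the made-up genotypes and positions) is hypothetical. real roh callers are more forgiving than this, e.g. they usually tolerate the odd heterozygous call that's probably a genotyping error, and the paper's own method is not reproduced here:

```python
def find_roh(genotypes, positions, min_snps=3):
    """Return (start_bp, end_bp, n_snps) for each run of homozygous SNPs.

    genotypes: list of two-letter strings such as "AA" or "AG"
    positions: base-pair coordinate of each SNP, in ascending order
    """
    runs = []
    start = None  # index of the first SNP in the current run, if any
    for i, g in enumerate(genotypes):
        homozygous = g[0] == g[1]
        if homozygous and start is None:
            start = i
        elif not homozygous and start is not None:
            # a heterozygous site ends the run; keep it if long enough
            if i - start >= min_snps:
                runs.append((positions[start], positions[i - 1], i - start))
            start = None
    # close a run that reaches the end of the chromosome
    if start is not None and len(genotypes) - start >= min_snps:
        runs.append((positions[start], positions[-1], len(genotypes) - start))
    return runs


# made-up data: ten SNPs, two of them heterozygous
genotypes = ["AA", "GG", "CC", "TT", "AG", "CC", "CT", "GG", "AA", "TT"]
positions = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
print(find_roh(genotypes, positions))
```

the lone homozygous "CC" between the two heterozygous sites gets dropped because it's shorter than `min_snps`, which is the same reason real analyses set a minimum run length.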

recall that possessing lots of long roh indicates that one’s parents are/were quite similar genetically speaking. that can be as a result of a couple of different genetic scenarios like (as greying wanderer has brought up a lot recently) simply being from a small-sized population (i.e. having a small effective population size) and/or from regular inbreeding (consanguineous/endogamous mating). so, a population having a lot of long roh is either small and/or inbreeds a lot. populations having LOTS of short roh have probably been through some sort of bottleneck (see previous post).

in the paper i looked at in that previous post, the researchers had looked at the different roh lengths for large, regional populations like “europeans” or “east asians.” amongst other things, they had found that some of my regular inbreeders — the fbd marriage folks — had some of the highest numbers of medium and long roh, a state of genetic affairs which likely reflects their long-term close mating patterns. interestingly, the researchers had found that east asians had roh lengths similar to those of europeans across the board, something which surprised me since, at least according to what i’ve been reading, east asians (i.e. the chinese) have been inbreeding for a much longer time than europeans. one drawback of that previous study, though, was that, apart from the french, most of the european populations they looked at were peripheral groups who have had a tendency to inbreed more than my “core” europeans (see mating patterns in europe series below ↓ in left-hand column).

the new paper suffers from some of the same problems since the data come from the same sources (hgdp-ceph and hapmap phase 3 populations), so northern europeans — apart from the french — aren’t included in this paper either. (what can you do? it’s early days yet. i look forward to when there’s lots more genetic data available out there for teh scientists to work with! (^_^) )

what the researchers in this paper have done, though, is to look at both the mean lengths of roh in each of the populations sampled AND the total numbers of roh within individuals for each population. this has, i think, drawn out some interesting differences between the populations.

first, here are two graphics from the supplemental data (linked to above).

i’ve highlighted a handful of populations i want to focus on ’cause i know a little something about their historic mating patterns: the bedouin (as a proxy for the arabs — note that the bedouin have probably inbred more than more settled arabs); italians (not sure if they’re northern or southern italians or a mix of both — however, there are tuscans in the samples with which these “italians” can be compared); pathan or pashtuns (more fbd marriage folks, like the bedouins/arabs); and han chinese (there are some northern han chinese with whom this group can be compared). ok. here are the charts:

as you can see, the researchers have split up the roh into three classes (note that the short and medium classes here are a lot shorter than those in the paper looked at previously):

– A: 0.25-0.40 Mb (short)
– B: 0.6-1.2 Mb (medium)
– C: 0-35 Mb (long)

the interesting thing in the first chart above (Fig. S3 – Mean ROH Length for Each of the Three Size Classes in Each Population), is that the han chinese have lower means of roh length in all of the size classes compared to the other populations i’ve highlighted. in the previous study, the researchers found that east asians had similar means to europeans for all roh lengths. i found this surprising since, from what i’ve read, the han chinese have been inbreeding for a longer period of time than europeans. what might be confounding the results though, once again, is the fact that nw europeans (the outbreeders extraordinaire) are not really included in either of these studies apart from a handful of french samples.

in this latest study, both the bedouin and the pashtun, for instance, have higher means — and wider spreads — of long (class C) roh than the italians, which is what i would’ve expected since those two groups (the bedouins and the pashtuns) are, being fbd marriage folks, serious inbreeders. perhaps the reason the han chinese long roh mean is comparatively low is partly due to the fact that they historically practiced mother’s brother’s daughter (mbd) marriage which doesn’t push towards such close inbreeding as fbd marriage. still, i would’ve expected to see greater means of roh for the chinese than the italians — or, at least, around the same. not so much lower. (unless the italians practiced fbd marriage, too — or fzd marriage — but i don’t think so.)

if you look at the second chart (Fig. S4 – Total Number of ROH in Individual Genomes), however, you’ll see that, overall, the han chinese have more short, medium and long roh in total in individual genomes than any of the other three populations i’ve highlighted. both the bedouins and the pashtuns have greater numbers/wider total spread of long roh than the italians, but the han chinese have a much greater total number of long roh than any of the other three groups — three or four times as many.

but they’re, on average, shorter long roh, don’t forget. (confusing, eh?!)
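
since "more long roh, but shorter ones on average" is easy to get tangled up in, here's a toy sketch of the two statistics side by side for two imaginary individuals. the lengths are invented for illustration and are not taken from the paper's figures, and the function name `summarize` is hypothetical:

```python
from statistics import mean


def summarize(roh_lengths_mb):
    """Count, mean length and total length (in Mb) of one individual's roh."""
    return {
        "count": len(roh_lengths_mb),
        "mean_mb": mean(roh_lengths_mb),
        "total_mb": sum(roh_lengths_mb),
    }


# made-up long (class C) roh lengths in Mb for two imaginary individuals
few_but_long = [12.0, 20.0, 28.0]
many_but_short = [3.0, 4.0, 5.0, 3.0, 4.0, 5.0, 3.0, 4.0, 5.0, 4.0]

print("few but long:  ", summarize(few_but_long))
print("many but short:", summarize(many_but_short))
```

the second individual has more than three times as many long roh as the first, yet a much lower mean length, which is the kind of pattern the two charts seem to show for the han chinese next to the bedouin.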

perhaps this is what you get when you have — as the chinese have had — a pretty good-sized effective population size for such a long time. there have been a LOT of han chinese for — wow — millennia.

so, it looks like this (in this order of inbrededness — i think):

– bedouins: highest mean, and very wide spread, of long roh; high total numbers, and widest spread, of long roh.
– pashtun: low mean, but widest spread, of long roh; low total number, but very wide spread, of long roh.
– han chinese: very low mean, and very narrow spread, of long roh; highest total numbers, and wide spread, of long roh.
– italians: low mean, and rather wide spread, of long roh; very low total number, and very small spread, of long roh.

other interesting points are that:

– the tuscans/tsi (toscani) appear to have lower short, medium and long mean roh than the generic “italian” category. however, the tuscans have lower total numbers of long roh than the “italians” while the toscani (tsi), on the other hand, appear to have a greater total number of long roh than the “italians.” while the tuscan samples and the toscani/tsi samples are from different studies (hgdp vs. hapmap), they are all supposed to be from tuscany, so it’s surprising that they’re so different. perhaps the individuals in the toscani/tsi sample were more closely related somehow?

– the northern han samples have lower short, medium and long mean roh than the generic “han” category. this would fit my general impression that historically inbreeding has been greater in southern china than in the north. however, the total number of long roh is greater in the northern han sample than in the “han” sample. not sure what that means.

don’t forget that there can be all sorts of reasons for differences in roh: inbreeding vs. outbreeding, yes, but also effective population size, population movement (migration in or out), bottlenecks, etc. i just happen to be interested in trying to pick out the effects of inbreeding/outbreeding — if possible.

**update** – here are a couple of excerpts from the article (thnx, b.b.!) [pgs. 277, 279-281]:

“Size Classification of ROH

“Separately in each population, we modeled the distribution of ROH lengths as a mixture of three Gaussian distributions that we interpreted as representing three ROH classes: (A) short ROH measuring tens of kb that probably reflect homozygosity for ancient haplotypes that contribute to local LD [linkage disequilibrium] patterns, (B) intermediate ROH measuring hundreds of kb to several Mb that probably result from background relatedness owing to limited population size, and (C) long ROH measuring multiple Mb that probably result from recent parental relatedness….

“In each population, the size distribution of ROH appears to contain multiple components (Figure 2A). Using a three-component Gaussian mixture model, we classified ROH in each population into three size classes (Figure 2B): short (class A), intermediate (class B), and long (class C). Size boundaries between different classes vary across populations (Table S1); however, considering all populations, all A-B boundaries are strictly smaller than all B-C boundaries (Figure 2C). The mean sizes of class A and B ROH are similar among populations from the same geographic region (Figure S3), with the exception that Africa and East Asia have greater variability. The class C mean is generally largest in the Middle East, Central/South Asia, and the Americas and smallest in East Asia (Figure S3), with the exception that the Tujia population has the largest values. In the admixed Mexican population (MXL), mean ROH sizes are similar to those in European populations. In the admixed African American population (ASW), however, mean ROH sizes are among the smallest in our data set, notably smaller than in most Africans and Europeans.

“Geographic Pattern of ROH

“Several patterns emerge from a comparison of the per-individual total lengths of ROH across populations (Figure 3). First, the total lengths of class A (Figure 3A) and class B (Figure 3B) ROH generally increase with distance from Africa, rising in a stepwise fashion in successive continental groups. This trend is similar to the observed reduction in haplotype diversity with increasing distance from Africa. Second, total lengths of class C ROH (Figure 3C) do not show the stepwise increase. Instead, they are higher and more variable in most populations from the Middle East, Central/South Asia, Oceania, and the Americas than in most populations from Africa, Europe, and East Asia. This pattern suggests that a larger fraction of individuals from the Middle East, Central/South Asia, Oceania, and the Americas tend to have higher levels of parental relatedness, in accordance with demographic estimates of high levels of consanguineous marriage particularly in populations from the Middle East and Central/South Asia, and it is similar to that observed for inbreeding-coefficient and identity-by-descent estimates. Third, in the admixed ASW and MXL individuals, total lengths of ROH in each size class are similar to those observed in populations from Africa and Europe, respectively (Figure 3).

“The total numbers of ROH per individual (Figure S4) show similar patterns to those observed for total lengths (Figure 3). However, in East Asian populations, total numbers of class B and class C ROH per individual are notably more variable across populations than are ROH total lengths.”
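
for the curious, here's a minimal sketch of what "classify roh with a three-component gaussian mixture" can look like. it is NOT the authors' code: it's a plain textbook EM fit in one dimension, with a crude starting guess (sorted lengths cut into three equal chunks) that i picked for simplicity, and the toy lengths are randomly generated, not real data. all function names are hypothetical:

```python
import math
import random


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def fit_three_gaussians(lengths, n_iter=100):
    """EM fit of a 3-component 1-D mixture; returns (weight, mean, sd) sorted by mean."""
    xs = sorted(lengths)
    n = len(xs)
    # crude deterministic start: cut the sorted lengths into three equal chunks
    chunks = [xs[: n // 3], xs[n // 3 : 2 * n // 3], xs[2 * n // 3 :]]
    weights = [len(c) / n for c in chunks]
    means = [sum(c) / len(c) for c in chunks]
    sds = [max(math.sqrt(sum((x - m) ** 2 for x in c) / len(c)), 1e-3)
           for c, m in zip(chunks, means)]
    for _ in range(n_iter):
        # E step: how responsible is each component for each roh?
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
            total = sum(p)
            resp.append([pk / total if total > 0 else 1 / 3 for pk in p])
        # M step: re-estimate each component from its weighted points
        for k in range(3):
            rk = sum(r[k] for r in resp)
            if rk == 0:
                continue
            weights[k] = rk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / rk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / rk
            sds[k] = max(math.sqrt(var), 1e-3)  # floor keeps a component from collapsing
    return sorted(zip(weights, means, sds), key=lambda c: c[1])


def classify(length, components):
    """Assign a length to class A, B or C by the most likely component."""
    scores = [w * normal_pdf(length, m, s) for w, m, s in components]
    return "ABC"[scores.index(max(scores))]


# toy lengths in Mb drawn around three made-up class means
rng = random.Random(0)
toy = ([rng.gauss(0.3, 0.03) for _ in range(100)]
       + [rng.gauss(0.9, 0.10) for _ in range(100)]
       + [rng.gauss(5.0, 1.00) for _ in range(100)])

components = fit_three_gaussians(toy)
for label, (w, m, s) in zip("ABC", components):
    print(f"class {label}: weight={w:.2f} mean={m:.2f} Mb sd={s:.2f} Mb")
print(classify(0.3, components), classify(0.9, components), classify(5.0, components))
```

because the fit is done separately on each population's own lengths, the A-B and B-C boundaries come out different from population to population, which matches what the excerpt says about Table S1.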

previously: runs of homozygosity and inbreeding (and outbreeding) and ibd and historic mating patterns in europe

runs of homozygosity and inbreeding (and outbreeding)

here’s a really neat chart!

what does it mean? well…

some very clever researchers/geneticists took a look for “runs of homozygosity” (roh) in the genomes of the individuals in the human genome diversity project (hgdp) — that’s 1043 individuals from 51 different populations. “runs of homozygosity” are stretches in the genome where identical dna was inherited from each parent. if you inbreed, you’re gonna have a greater number of longer runs of homozygosity in your genome than if you don’t.

apart from being just plain fun, sex shuffles up genomes from one generation to the next (presumably for some good reason or another). if you were to clone yourself, your descendants would have (pretty much) the same exact genome as you. if you were to mate with your mother or your sister (i know — ewwww!), your descendants would have different genomes from you, but they’d have lots of roh in their genomes ’cause their dna came from you and someone with whom you share a lot of dna in common. the farther out you mate, the less homozygosity there’s likely to be.
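
to put rough numbers on "the farther out you mate, the less homozygosity": here's a minimal sketch using the standard path-counting formula for the inbreeding coefficient F, i.e. the expected share of a child's genome that is identical by descent on both copies. for each ancestor the two parents share, you add (1/2)^(n1+n2+1), where n1 and n2 are the generations from each parent up to that ancestor. it assumes the shared ancestors weren't inbred themselves, and the function name and the little table of matings are made up for illustration:

```python
from fractions import Fraction


def inbreeding_coefficient(paths):
    """paths: one (n1, n2) pair per shared ancestor of the two parents,
    giving the generations from each parent up to that ancestor."""
    return sum(Fraction(1, 2) ** (n1 + n2 + 1) for n1, n2 in paths)


matings = {
    "mother x son": [(1, 0)],                  # the mother is the shared ancestor
    "full siblings": [(1, 1), (1, 1)],         # two shared parents
    "first cousins": [(2, 2), (2, 2)],         # two shared grandparents
    "second cousins": [(3, 3), (3, 3)],        # two shared great-grandparents
}

for name, paths in matings.items():
    print(f"{name}: F = {inbreeding_coefficient(paths)}")
```

so a child of first cousins is expected to be autozygous over about 1/16 of its genome, a child of second cousins over about 1/64, and it keeps quartering with each further step out.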

you might also have lots of roh in your genome if you come from a population that has little genetic diversity — ’cause maybe your ancestors went through some sort of bottleneck or something.

inbreeding with close relatives — like marrying your first- or second-cousins (consanguineous matings) — leads to long roh since you share so much of your dna with your closest family members. endogamous mating — just mating within your population but not your close cousins — also leads to roh, but not ones as long as mating with your close relatives. you share dna with others in your population (say your clan or your ethnic group), but not so much of exactly the same dna or genes in certain stretches as with your closer relatives. a population with little genetic diversity, but that does not inbreed, will have lots of short roh — they share a lot of stretches of dna in common, but all of the outbreeding shuffles up the genomes within the population.

so that’s:

long roh = inbreeding, probably consanguineous (first-/second-cousin matings)
medium roh = endogamous mating within a population
short roh = little genetic diversity in the population probably from an event like a population bottleneck

i’m oversimplifying, but that’s the gist of it.
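
the same gist as a toy lookup, in case that's easier to keep straight. the cutoffs here (short up to 2 Mb, long over 16 Mb) are hypothetical, borrowed loosely from the shortest and longest bins in the chart discussed in this post, and the lengths in `example` are invented:

```python
from collections import Counter

# hypothetical cutoffs in Mb, for illustration only
SHORT_MAX_MB = 2.0
LONG_MIN_MB = 16.0

GIST = {
    "short": "little genetic diversity (e.g. a past bottleneck)",
    "medium": "endogamous mating within a population",
    "long": "close inbreeding, probably consanguineous",
}


def roh_class(length_mb):
    """Bin one roh by length using the rule-of-thumb cutoffs above."""
    if length_mb > LONG_MIN_MB:
        return "long"
    if length_mb <= SHORT_MAX_MB:
        return "short"
    return "medium"


def tally(roh_lengths_mb):
    """Count how many of an individual's roh fall into each bin."""
    return Counter(roh_class(x) for x in roh_lengths_mb)


example = [0.8, 1.5, 1.9, 3.2, 7.5, 18.0, 24.0]  # made-up lengths in Mb
counts = tally(example)
for name in ("short", "medium", "long"):
    print(f"{name}: {counts[name]} roh -> {GIST[name]}")
```

remember that the mapping from length to cause is only a rule of thumb: several different histories can leave similar-looking roh.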

so what did the researchers find when they looked at the 51 populations in the hgdp (see chart)?

– LOTS of short roh (1-2 Mb) within populations from oceania and central/south america, probably because those populations went through bottlenecks. the people from oceania have low amounts of long roh (>16 Mb), which means that they don’t inbreed closely much. however, the people from central/south america have the highest amount of long roh of all the groups, so that means they must inbreed closely a LOT.

– central/south asians, west asians, east asians, europeans and africans don’t have huge amounts of short roh — at least not compared to the folks in oceania and the americas. no big bottlenecks there. and africans, in fact, have the fewest short roh.

– central/south asians and west asians have pretty high amounts of roh in all of the middle ranges and the highest long roh after the native american populations. this indicates significant amounts of endogamy and close relative marriages (but we already knew that).

– the groups with the lowest amounts of long roh are the europeans, africans and east asians — in that order, with the east asians having the least of all. in other words, it appears as though, of these three groups, africans and europeans inbreed more closely (first- or second-cousin marriage, say) than east asians.

if you’ve been following along, you know that’s not what hbd chick expected. i thought that east asians would’ve had more long roh than europeans ’cause they have a fairly recent history of close marriages. hmmmm….

i checked to see which populations of europeans are included in the hgdp (you can find a list in the article’s supplemental material here [opens pdf]) and they are: adygeis, basques, french folks, italians, orcadians, russians, sardinians and tuscans. apart from the french and the tuscans, all of these groups have recent (or current) histories of consanguineous or endogamous mating practices (see Inbreeding in Europe series below in left-hand column for more details), so they are not a fully representative sample of europeans. unfortunately, “core” europe, which contains the most outbred populations in europe, is not included in the hgdp and, therefore, not in this study.


still — this is interesting stuff! genetics. cool! i’m going to post more about this ’cause, for one thing, it should be possible to drill down further into these populations to compare them more specifically (there are some data available in the supplemental materials). so, more anon…!

thanks to prof. harpending for pointing out this article! (^_^)

*update 08/20: see also runs of homozygosity again

genetic research game

via parapundit:

“Forget Farmville, here’s a game that drives genetic research”

“Playing online can mean more than killing time, thanks to a new game developed by a team of bioinformaticians at McGill University. Now, players can contribute in a fun way to genetic research. ‘There are some calculations that the human brain does more efficiently than any computer can, such as recognizing a face,’ explained lead researcher Dr. Jérôme Waldispuhl of the School of Computer Science. ‘Recognizing and sorting the patterns in the human genetic code falls in that category. Our new online game enables players to have fun while contributing to genetic research – players can even choose which genetic disease they want to help decode.’ The game is called Phylo….”


play it HERE.