the hgdp samples again

i’ve written before (here, here and here) about the hgdp samples and the fact that there is very little to no provenance info connected to them. the problem with this, afaics, is that it’s difficult to know whether or not the hgdp samples are truly representative, in all ways, of the populations from which they came.

i was particularly concerned initially about the french (and the japanese) hgdp samples — and then i got over that — but now i’m concerned about them again. here’s why:

the hgdp samples from france are described thusly:

“France – French/various regions (relatives) – This sample from various regions of France is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). This sample consists of unrelated individuals and was collected with proper informed consent.”

great!

hang on — which regions?

auvergne? where, in some villages in the eighteenth century, groups of families regularly inbred with one another? lorraine? which, in some areas, had consanguinity rates of up to 50% between 1810 and 1910? burgundy or brittany, both of which had reportedly higher cousin marriage rates in the nineteenth and twentieth centuries than other regions of france? or were the hgdp samples collected in places like central france which, historically, had much lower rates (in the range of 1-3.5%) of close marriages?

the thing is: we don’t know.

what we do know is that the hgdp sampling seems kinda biased towards unique little groups like basques and orcadians, sardinians and the adygei. which is understandable ’cause these are all interesting, unusual groups and there’s legitimate concern that their unique genomes might sorta disappear in our modern, outbreeding world, and it would be a shame to miss out on the chance to at least keep a record of all that human biodiversity.

but then i have to wonder how representative of the majority of french people are the french hgdp samples? do they truly represent “the french,” or did the samples come from some of those crazy little villages way up in the mountains? i dunno. and neither does anybody else (afaik).

and the reason i wonder is: if teh scientists are gonna do really awesome genetic studies to check for the relatedness between the members of different human populations — like runs of homozygosity (roh) studies or identity by descent (ibd) studies — i think they need to know if the samples they’re looking at are representative or not. do the results for “the french” in studies like this or this or this truly represent the average french, or do they represent some special sub-groups of mountain dwelling french?

in the most recent roh study i posted about, the “french” don’t appear to be much more in- or out-bred than orcadians or the basques, something which strikes me as odd. perhaps — perhaps — that’s because the french hgdp samples are not truly representative of the broader french population. perhaps. i don’t know. nor do the researchers.

rinse and repeat above discussion for the other samples, too.

previously: hgdp samples and relatedness and more on the hgdp samples and why i care about the hgdp samples and meanwhile, in france… and runs of homozygosity and inbreeding (and outbreeding) and ibd and historic mating patterns in europe and runs of homozygosity again

(note: comments do not require an email. not out on a limb, am i?)


why i care about the hgdp samples

if the hgdp samples are going to be used to look at the degrees of kinship within populations — which would be awesome! and which prof. harpending started to do recently — then care has to be taken to identify which sets of samples include lots of relatives.

if you’re gonna analyze a bunch of individuals’ genomes to ascertain the degree of kinship between them in order to determine the degree of kinship within their broader population, then you want to make sure you’ve got a random, representative sample from the population and not a bunch of relatives since, of course, a bunch of relatives will naturally have a high degree of kinship.

and if you found a high degree of kinship in a set of samples that included a bunch of relatives and didn’t know you were looking at a set of relatives, you might conclude that there must be a high degree of kinship across the broader population, too, but this might not be the case at all.

for example, take the hgdp samples from the pashtun and the kalash in pakistan. twenty-five genomes are available from each group, but according to rosenberg (see previous post), none of the individuals in the pashtun group were relatives whereas in the kalash group there likely are: one parent-offspring pair, one half-sibling pair (or an equivalent), and four pairs of cousins.

let’s say, then, that it was found that these two sets of samples — the pashtun and the kalash — had exactly the same degree of internal kinship between their members, genetically speaking. that would mean that the members of the broader pashtun population had the same kinship to one another as the several sets of relatives in the kalash population, since the pashtun genomes were random samples whereas the kalash ones were not. it might look like the two groups had the same, population-wide degree of kinship, but in reality that would not be what we had found.
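here’s a rough sketch of that arithmetic in python. everything in it is an assumption for illustration, not something taken from the hgdp data: i’m using the textbook expected kinship coefficients (parent-offspring 1/4, half-siblings 1/8, first cousins 1/16), treating rosenberg’s “cousins” as first cousins, and simply adding a flat “background” kinship on top for every pair (an approximation). `mean_pairwise_kinship` is a made-up helper name.

```python
from fractions import Fraction
from math import comb

# textbook expected kinship coefficients (assumed, not measured from hgdp data)
KINSHIP = {
    "parent_offspring": Fraction(1, 4),
    "half_siblings": Fraction(1, 8),
    "first_cousins": Fraction(1, 16),
}

def mean_pairwise_kinship(n_individuals, related_pairs, background=Fraction(0)):
    """average kinship over every pair of individuals in a sample.

    related_pairs maps a relationship name to how many such pairs there are.
    all pairs get the `background` level; related pairs get their pedigree
    kinship added on top (a simplification).
    """
    total_pairs = comb(n_individuals, 2)
    if sum(related_pairs.values()) > total_pairs:
        raise ValueError("more related pairs than pairs in the sample")
    family_part = sum(
        (KINSHIP[rel] * count for rel, count in related_pairs.items()),
        Fraction(0),
    )
    return background + family_part / total_pairs

# 25 individuals each; the relative counts are the ones quoted from rosenberg
kalash = mean_pairwise_kinship(
    25, {"parent_offspring": 1, "half_siblings": 1, "first_cousins": 4}
)
pashtun = mean_pairwise_kinship(25, {})

print(kalash)   # 1/480
print(pashtun)  # 0
```

so under these assumptions the known relatives alone push the kalash sample’s average up by 1/480. if the two samples measured the same, the pashtun background kinship would have to be higher than the kalash background by about that much. the effect on the average is small with only six related pairs out of 300, but it would matter a lot more for statistics like “kinship to nearest relative.”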

of the 52 population samples in the hgdp (the south african bantus are counted as one group … even though they come from different ethnic groups … hmmmm …), rosenberg found that exactly half (26) include relatives.

the other problem i have with the hgdp samples involves the ones that were collected from immigrant groups here in the u.s.:

– han chinese: “This is a sample of Han Chinese living in the San Francisco, California.”
– japanese: “Collected by L.L. Cavalli-Sforza from Japanese-born individuals living in the San Francisco Bay area, and by K.K. Kidd and J. R. Kidd from Japanese-born individuals living in Connecticut.”
– cambodians: “Collected by K. Dumars from individuals born in Cambodia who are now living in Santa Ana, California.”

are these immigrants really representative of their native populations? are they first- or second- or fourth-generation americans? some immigrant groups start to outbreed in a new land, but others do just the opposite. what’s the case with these groups? how old were the individuals sampled (since in many populations inbreeding rates have gone down in the last 50 years or so)? do they all come from the same region in their native country (like guangdong province), or from all over? to give you an idea of some of the possible problems involved with these sets of samples, have a look at what i said about the japanese samples in the previous post.

and i still have a bug about which regions of france the french samples came from (see previous post). (~_^) it probably doesn’t matter that much, but it would’ve been nice to know how widespread the sampling was, i.e. how random and representative of the entire population of france are these samples? historically, different regions of france have had different inbreeding rates as can be seen in this map of the inbreeding coefficients for france, 1926-1945 [pg. 620] (my guess is this pattern goes back a long way, too — probably to the early medieval period and the introduction of manorialism in continental europe):

so, which regions the samples were drawn from in france might actually make a difference, especially depending upon how deeply one drills down into the question of relatedness. i wonder the same thing about many of the other samples, too, for instance the ones from russia, but we know that all those samples came from the vologda administrative region so we are at least aware that they may not be representative of all ethnic russians.

despite all these potential difficulties, i look forward to more genetic research into kinship and relatedness within populations — from prof. harpending or whomever! very cool stuff! (^_^)

previously: hgdp samples and relatedness and more on the hgdp samples

(note: comments do not require an email. just skip the email!)

more on the hgdp samples

first, see my previous post on this if you want to follow along.

in that post, i expressed some concerns over the french human genome diversity project (hgdp) samples since the ceph folks describe them as: French (various regions) relatives. i wondered both of the following: 1) how many and which “various regions,” since different regions of france have historically had different rates of inbreeding (i haven’t managed to find out which “various regions”), and 2) how many and what sorts of relatives? the second one i did find out.

via some genetic wizardry, a noah rosenberg tried to work out if any of the individuals in any of the hgdp samples were, in fact, relatives [see here]. to cut a long story short, rosenberg found it likely that two individuals in the french sample were siblings [see pg. 7 here – opens pdf], thus the “relatives” indicator on the ceph website. so, the entire french sample is NOT full of family members like i wondered in my last post; only two of the individuals sampled are likely to have been relatives.

i still think it would be useful to know from which regions the samples were drawn, but i guess i just have to live with not knowing for the meantime. (~_^) but now i feel more secure about professor harpending’s conclusion — that regarding the french: “from the viewpoint of kinship, one person is not very different from another person.”

however, now i feel unsure about the japanese samples! the hgdp samples for the japanese are described on ALFRED as:

“Collected by L.L. Cavalli-Sforza from Japanese-born individuals living in the San Francisco Bay area, and by K.K. Kidd and J. R. Kidd from Japanese-born individuals living in Connecticut.”

ack! well, how representative of japanese people in japan are these people? where did they come from? urban areas? rural areas? different areas? mostly the same areas? how old were they?

i ask all these questions because, historically, urban japanese have had lower inbreeding rates than rural japanese … and the inbreeding rates overall for japan dropped pretty sharply after wwii [see pgs. 4-5 here – opens pdf]. so if the samples include mostly young, urban japanese who recently moved to the u.s., well i wouldn’t be surprised if they look quite outbred. but if the samples include mostly older, rural japanese, i would be surprised if they looked outbred.

now i don’t have any confidence in the japanese hgdp samples — not for looking at kinship within the japanese population anyway. btw, rosenberg didn’t find any likely relatives in the japanese samples.
_____

i went through the ceph table of the hgdp samples and ALFRED and compiled a list of all the hgdp samples, noting 1) whether they likely include any family members (“relatives” – based on rosenberg), and 2) where the samples were collected and from whom, if known. many of the samples don’t have any useful information on their provenance. for example, many of the ALFRED entries say that the samples were drawn from unrelated individuals, but rosenberg found that they, in fact, likely included relatives.

why do i care about any of this? i’ll explain that in another post. right now … coffee! (^_^)

**update: see why i care about the hgdp samples**
_____

the list:

– Central African Republic – Biaka Pygmy (relatives)
This sample is comprised of Biaka, living in the village of Bagandu, in the southwest corner of the Central African Republic (3.42N; 18E altitude approximately 500m). This group is probably an admixture of 3/4 “non-pygmy” African ancestry and 1/4 Mbuti ancestry. The transformed cell lines were established by Judith R. Kidd. The sources of this sample are L. Cavalli-Sforza (Stanford University) and K.K. Kidd, J.R. Kidd (Yale University).

– Democratic Rep of Congo – Mbuti Pygmy (relatives)
The sample is composed of Nilosaharan and Niger Kordofanian speaking Mbuti pygmies from the northeastern part of the Ituri Forest (northeastern Democratic Republic of the Congo). It was collected by L.L. Cavalli-Sforza in 1986.

– Senegal – Mandenka (relatives)
This sample from Senegal is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). This sample consists of unrelated individuals and was collected with proper informed consent.

– Nigeria – Yoruba (relatives)
Most of the Yoruba individuals in this sample are urban health care workers from Benin City, Nigeria, collected by Prof. Friday E. Okonofua and collaborators; cell lines established by Dr. J.R. Kidd.

– Namibia – San (relatives)
This sample from Namibia is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). This sample consists of unrelated individuals and was collected with proper informed consent.

– Kenya – Bantu NE (relatives)
This sample is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). This sample consists of unrelated individuals and was collected with proper informed consent.

– S. Africa – Bantu SE Pedi
– S. Africa – Bantu SE Sotho
– S. Africa – Bantu SE Tswana
– S. Africa – Bantu SE Zulu
– S. Africa – Bantu SW Herero
– S. Africa – Bantu SW Ovambo

These samples are part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). They include the following individuals: #993, 994, 1028, 1030, 1031, 1033, 1034, and 1035. These samples consist of unrelated Bantu speakers from southern Africa and were collected with proper informed consent.

– Algeria – Mozabite (relatives)
This sample from Algeria is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). This sample consists of unrelated individuals and was collected with proper informed consent.

– Israel (Negev) – Bedouin (relatives)
This sample from the Negev region of Israel is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). This sample consists of unrelated individuals and was collected with proper informed consent.

– Israel (Carmel) – Druze (relatives)
This sample is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). The Druze, a Moslem community from Northern Israel. Collected by B. Bonne-Tamir (Tel Aviv University) as part of the repository of samples of Israeli populations. This sample contains both related and unrelated individuals.

– Israel (Central) – Palestinian (relatives)
This sample from the central region of Israel is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). This sample consists of unrelated individuals and was collected with proper informed consent.

– Pakistan – Brahui
– Pakistan – Balochi (relatives)
– Pakistan – Hazara (relatives)
– Pakistan – Sindhi (relatives)
– Pakistan – Pathan
– Pakistan – Kalash (relatives)
– Pakistan – Burusho

These samples from Pakistan are part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). These samples consist of unrelated individuals and were collected with proper informed consent.

– Pakistan – Makrani
*no info found.*

– China – Han
This sample is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). This is a sample of Han Chinese living in the San Francisco, California. Collected by L. Cavalli-Sforza (Stanford University), K.K. Kidd, and J.R. Kidd.

– China – Tujia
– China – Yizu/Yi
– China – Miaozu/Miao
– China – Oroqen (relatives)
– China – Daur
– China – Mongola
– China – Hezhen
– China – Xibo
– China – Uygur
– China – Dai
– China – She
– China – Lahu (relatives)
– China – Naxi (relatives)
– China – Tu

These samples from China are part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). These samples consist of unrelated individuals and were collected with proper informed consent.

– Siberia – Yakut
This sample is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). Yakut-speaking individuals in the Yakut Autonomous Republic. Individuals sampled were living or were born along the river Lena in the area of Yakutsk and northward, roughly 129-130E, 62-64N. This sample was collected by E.L. Grigorenko.

– Japan – Japanese
This sample is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). Collected by L.L. Cavalli-Sforza from Japanese-born individuals living in the San Francisco Bay area, and by K.K. Kidd and J. R. Kidd from Japanese-born individuals living in Connecticut.

– Cambodia – Cambodian (relatives)
This sample is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). Collected by K. Dumars from individuals born in Cambodia who are now living in Santa Ana, California.

– France – French/various regions (relatives)
This sample from various regions of France is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). This sample consists of unrelated individuals and was collected with proper informed consent.

– France – Basque
This sample from France is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). This sample consists of unrelated individuals and was collected with proper informed consent.

– Italy – Sardinian
This sample from Italy is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). This sample consists of unrelated individuals and was collected with proper informed consent.

– Italy – from Bergamo
– Italy – Tuscany

*no info found.*

– Orkney Islands – Orcadian (relatives)
This sample from the Orkney Islands is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). This sample consists of unrelated individuals and was collected with proper informed consent.

– Russia Caucasus – Adygei
This sample is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). Adygei-speaking people near Krasnodar in the Russian republic of Adygei, which is in the southeastern section of the country (north of the Caucasus mountains). They are culturally and linguistically distinct from neighboring Russians. This sample was collected by E. Grigorenko (Yale University), V. Galkina, and M. Kadoshnikova (Bristol company, Russia).

– Russia – Russian
This sample is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). Sample collected by E. Grigorenko from rural communities of ethnic Russians living in the Vologda Administrative Region, about 400 km north of Moscow, roughly 59-61N, 39-41E.

– Mexico – Pima (relatives)
This sample is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). Collected from Pima living near the eastern border of the state of Sonora, Mexico. Collected by L.O. Shulz.

– Mexico – Maya (relatives)
This sample is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). This sample consists of Mayans who are Yucatec speakers from in the Xmaben village located in the Mexican state of Campeche in the central Yucatan peninsula. Blood and serum markers indicate European admixture to be about 10 % (K. Weiss, personal communication). Some evidence suggests that the area from which this sample was drawn served as a refuge for Maya people from across southern Mexico who fled to this more remote region during a series of revolts against the Spanish in the 19th and early 20th centuries. There are 53 transformed cell lines (106 chromosomes) established by Judith R. Kidd. The sources of this sample are K.K. Kidd and J.R. Kidd (Yale University).

– Colombia – Piapoco and Curripaco (relatives)
*no info found.*

– Brazil – Karitiana (relatives)
This sample is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). The sample was collected in the Karitiana village (Rondonia Province, Brazil) by F. Black. HLA haplotypes indicate that the Karitiana have no non-Amerindian admixture and are genetically distinct from other sampled populations in relative geographical proximity, such as the Surui.

– Brazil – Surui (relatives)
*no info found.*

previously: hgdp samples and relatedness

(note: comments do not require an email. remember: you’re better off just skipping it!)

hgdp samples and relatedness

**update 03/22: see follow up post — more on the hgdp samples — and just ignore what i said about the french samples below.**

**update 08/28: ignore what i said about ignoring what i said about the french samples. see here.**
_____

i had a post up back in january about some cool research that looked at what runs of homozygosity (roh) in samples from the human genome diversity project (hgdp) can tell us about the inbreeding or outbreeding of different human populations.

but i’ve been bothered by the thought of how the hgdp samples were gathered. as professor harpending said:

“No one knows, by the way, how sampling was carried out for this nor for any of the HGDP populations.”

ugh. the hgdp is really, really cool, but not having info on where the samples came from (like genealogical info) poses a problem if you want to use this data to look at recent inbreeding/outbreeding or, i think, even the sort of thought experiment that prof. harpending conducted a couple of weeks ago, however cool that was, too.

here’s an example of what i mean.

prof. harpending compared the relatedness or kinship of the individuals in a couple of sets of samples from the hgdp: the french, the japanese, and the druze. he found that the kinship of individuals in both the french and japanese populations to their nearest “relatives” (i presume two individuals who had the most similar genomes?) is very similar. as he said: “from the viewpoint of kinship, one person is not very different from another person.” the druze, otoh, are very dissimilar and the good professor thinks that this is a population in which “opportunities for discord and clannishness are high as individuals able to discriminate kin would ally against the ‘others.'”

i’m not going to argue with that! the druze, like the arabs, regularly practice father’s brother’s daughter (fbd) marriage, the most incestuous form of cousin marriage around, so i’m not surprised that their genomes reflect this fact. (fbd marriage probably originated in the levant, so it could be that the people who are today known as the druze are the product of one of the longest running close-inbreeding projects in humans around.) amongst the druze, each extended-family or clan must’ve become, over time, its own little semi-isolated sub-group. like the arabs, i’d expect a lot of clannishness and infighting.

however, wrt the french and japanese samples: the ceph folks do have some information on the hgdp samples, and one point of difference between the french and japanese samples is that the french samples are described as having been drawn from relatives whereas the japanese samples were not.

there are 29 french samples described as: French (various regions) relatives, and there are 31 japanese samples described as just Japanese, so i assume that means the japanese samples do not include relatives.

so what does French (various regions) relatives mean? i guess that the samples were drawn from different regions of france, but we don’t know which regions or how many. (which is too bad because different regions of france have, historically, had different inbreeding rates.) and how many relatives? who knows? i’m going to presume all 29 are not relatives from one family living scattered across the country, although i suppose that could’ve been the case. what seems more likely to me is that we’re looking at groups of samples from a number of different families, but how many? two, three, four … ten? again, who knows?

what difference would this make? well if the kinship in the french set of samples and the japanese set of samples look to be around the same, i.e. “one person is not very different from another,” BUT the french samples are from relatives and the japanese samples are not, then that would mean that the individuals in the broader french population must be even more like one another than the individuals in the broader japanese population since french family members have the same kinship to one another as japanese strangers do.

to put it more simply, comparing the french and japanese samples is like comparing apples and oranges because, if the ceph information is correct, the french samples include family members whereas the japanese ones do not.

the druze samples, too, are described as coming from relatives — again no info as to how many families/relatives — so the broader druze population should prove to be even more dissimilar to one another than these family members are.

i would love to see lots more studies done on inbreeding/outbreeding (and possible inclusive fitness-related behaviors) in human populations from a genetics p.o.v. — like what prof. harpending did in his recent post. but afaics, using the hgdp data is problematic. i look forward to when there are more whole genome sequences available out there WITH accompanying genealogical/pedigree information.

previously: runs of homozygosity and inbreeding (and outbreeding)

(note: comments do not require an email. in fact, you’re probably better off not using one!)

runs of homozygosity and inbreeding (and outbreeding)

here’s a really neat chart:

what does it mean? well…

some very clever researchers/geneticists took a look for “runs of homozygosity” (roh) in the genomes of the individuals in the human genome diversity project (hgdp) — that’s 1043 individuals from 51 different populations. “runs of homozygosity” are stretches in the genome where identical dna was inherited from each parent. if you inbreed, you’re gonna have a greater number of longer runs of homozygosity in your genome than if you don’t.

apart from being just plain fun, sex shuffles up genomes from one generation to the next (presumably for some good reason or another). if you were to clone yourself, your descendants would have (pretty much) the same exact genome as you. if you were to mate with your mother or your sister (i know, ewwww!), your descendants would have different genomes from you, but they’d have lots of roh in their genomes ’cause their dna came from you and someone with whom you share a lot of dna. the farther out you mate, the less homozygosity there’s likely to be.
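to put rough numbers on “the farther out you mate,” here’s a small python sketch using the standard path-counting formula for the expected inbreeding coefficient F of a child (roughly, the expected share of the genome that ends up identical by descent from both parents). it’s my own illustration, not anything from the study: the helper name and the example pedigrees are made up, and it assumes the shared ancestors weren’t inbred themselves.

```python
from fractions import Fraction

def inbreeding_coefficient(paths):
    """expected inbreeding coefficient F of a child, by path counting.

    `paths` holds one (n1, n2) pair per common ancestor of the two parents:
    generations from parent 1 up to that ancestor, and from parent 2 up to it.
    each common ancestor contributes (1/2) ** (n1 + n2 + 1).
    assumes the common ancestors are not inbred themselves.
    """
    return sum(
        (Fraction(1, 2) ** (n1 + n2 + 1) for n1, n2 in paths),
        Fraction(0),
    )

# hypothetical pedigrees, each listing the two parents' common ancestors
PEDIGREES = {
    "mother x son": [(0, 1)],               # the mother is the common ancestor
    "full siblings": [(1, 1), (1, 1)],      # two shared parents
    "first cousins": [(2, 2), (2, 2)],      # two shared grandparents
    "second cousins": [(3, 3), (3, 3)],     # two shared great-grandparents
}

for name, paths in PEDIGREES.items():
    print(name, inbreeding_coefficient(paths))
```

the expected values drop by a factor of four with each step out (1/4, 1/16, 1/64), which is why close-cousin marriage shows up so clearly as long roh while distant relatedness mostly doesn’t.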

you might also have lots of roh in your genome if you come from a population that has little genetic diversity — ’cause maybe your ancestors went through some sort of bottleneck or something.

inbreeding with close relatives, like marrying your first- or second-cousins (consanguineous matings), leads to long roh since you share so much of your dna with your closest family members. endogamous mating (just mating within your population but not with your close cousins) also leads to roh, but not ones as long as those from mating with your close relatives. you share dna with others in your population (say your clan or your ethnic group), but not so much of exactly the same dna or genes in certain stretches as with your closer relatives. a population with little genetic diversity, but that does not inbreed, will have lots of short roh: they share a lot of stretches of dna in common, but all of the outbreeding shuffles up the genomes within the population.

so that’s:

long roh = inbreeding, probably consanguineous (first-/second-cousin matings)
medium roh = endogamous mating within a population
short roh = little genetic diversity in the population probably from an event like a population bottleneck

i’m oversimplifying, but that’s the gist of it.
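for the code-minded, the three-way split above can be sketched like this in python. this is a toy, not the researchers’ method: the function names are mine, the input is just a list of marker positions with a homozygous/heterozygous flag for each, and the cutoffs (short up to 2 Mb, long over 16 Mb, medium for everything in between) are assumptions loosely based on the size classes mentioned in this post. real roh callers also allow for genotyping errors and check marker density, which this doesn’t.

```python
def find_roh(positions_bp, is_homozygous, min_length_mb=1.0):
    """return the lengths (in Mb) of runs of consecutive homozygous markers.

    positions_bp: marker positions along one chromosome, in base pairs, sorted.
    is_homozygous: one bool per marker.
    a single heterozygous marker ends a run (no error tolerance).
    """
    lengths = []
    start = prev = None
    for pos, hom in zip(positions_bp, is_homozygous):
        if hom:
            if start is None:
                start = pos
            prev = pos
        elif start is not None:
            lengths.append((prev - start) / 1e6)
            start = None
    if start is not None:  # run that reaches the last marker
        lengths.append((prev - start) / 1e6)
    return [length for length in lengths if length >= min_length_mb]

def classify_roh(length_mb, short_max_mb=2.0, long_min_mb=16.0):
    """bucket one roh by length; the cutoffs are assumed, see above."""
    if length_mb > long_min_mb:
        return "long"
    if length_mb <= short_max_mb:
        return "short"
    return "medium"

# toy chromosome: one marker every 0.5 Mb, with two heterozygous markers
positions = [i * 500_000 for i in range(60)]
flags = [True] * 4 + [False] + [True] * 11 + [False] + [True] * 43

runs = find_roh(positions, flags)
print(runs)                              # [1.5, 5.0, 21.0]
print([classify_roh(r) for r in runs])   # ['short', 'medium', 'long']
```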

so what did the researchers find when they looked at the 51 populations in the hgdp (see chart)?

– LOTS of short roh (1-2 Mb) within populations from oceania and central/south america, probably because those populations went through bottlenecks. the people from oceania have low amounts of long roh (>16 Mb), which means that they don’t inbreed closely much. however, the people from central/south america have the highest amount of long roh of all the groups, so that means they must inbreed closely a LOT.

– central/south asians, west asians, east asians, europeans and africans don’t have huge amounts of short roh — at least not compared to the folks in oceania and the americas. no big bottlenecks there. and africans, in fact, have the fewest short roh.

– central/south asians and west asians have pretty high amounts of roh in all of the middle ranges and the highest long roh after the native american populations. this indicates significant amounts of endogamy and close relative marriages (but we already knew that).

– the groups with the lowest amounts of long roh are the europeans, africans and east asians — in that order. in other words, it appears as though, of these three groups, africans and europeans inbreed more closely (first- or second-cousin marriage, say) than east asians.

if you’ve been following along, you know that’s not what hbd chick expected. i thought that east asians would’ve had more short roh than europeans ’cause they have a fairly recent history of close marriages. hmmmm….

i checked to see which populations of europeans are included in the hgdp (you can find a list in the article’s supplemental material here [opens pdf]) and they are: adygeis, basques, french folks, italians, orcadians, russians, sardinians and tuscans. apart from the french and the tuscans, all of these groups have recent (or current) histories of consanguineous or endogamous mating practices (see Inbreeding in Europe series below in left-hand column for more details), so they are not a fully representative sample of europeans. unfortunately, “core” europe, which contains the most outbred populations in europe, is not included in the hgdp and, therefore, not in this study.

*sigh*

still — this is interesting stuff! genetics. cool! i’m going to post more about this ’cause, for one thing, it should be possible to drill down further into these populations to compare them more specifically (there are some data available in the supplemental materials). so, more anon…!

thanks to prof. harpending for pointing out this article! (^_^)

**update 08/20: see also runs of homozygosity again**

(note: comments do not require an email. lesson one.)