i’ve written before (here, here and here) about the hgdp samples and the fact that there is very little to no provenance info connected to them. the problem with this, afaics, is that it’s difficult to know whether or not the hgdp samples are truly representative, in all ways, of the populations from which they came.

i was particularly concerned initially about the french (and the japanese) hgdp samples — and then i got over that — but now i’m concerned about them again. here’s why:

the hgdp samples from france are described thusly:

“France – French/various regions (relatives) – This sample from various regions of France is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). This sample consists of unrelated individuals and was collected with proper informed consent.”


hang on — which regions?

auvergne? where, in some villages in the eighteenth century, groups of families regularly inbred with one another? lorraine? which, in some areas, had consanguinity rates of up to 50% between 1810 and 1910? burgundy or brittany, both of which had reportedly higher cousin marriage rates in the nineteenth and twentieth centuries than other regions of france? or were the hgdp samples collected in places like central france which, historically, had much lower rates (in the range of 1-3.5%) of close marriages?
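to put some rough numbers on why the region matters, here's a back-of-the-envelope sketch in python. it makes a big assumption: every consanguineous marriage in those figures was between first cousins, whose kids have an inbreeding coefficient (F) of 1/16, and all other marriages give F = 0. the rates are the ones quoted above. the function names and the sampling mixes are made up purely for illustration, since nobody knows where the french hgdp samples actually came from.

```python
# back-of-the-envelope sketch, NOT real data about the hgdp samples.
# assumption: every consanguineous marriage is a first-cousin marriage, and the
# children of first cousins have an inbreeding coefficient F = 1/16; all other
# marriages are treated as F = 0.
FIRST_COUSIN_F = 1 / 16


def expected_f(consanguinity_rate):
    """hypothetical helper: expected mean F among children, given the share of first-cousin marriages."""
    return consanguinity_rate * FIRST_COUSIN_F


def sample_mean_f(regions):
    """hypothetical helper: expected mean F for a sample drawn from several regions.

    regions: list of (share_of_sample, consanguinity_rate) pairs whose shares sum to 1.
    """
    return sum(share * expected_f(rate) for share, rate in regions)


# rates quoted in the post: up to 50% in parts of lorraine, 1-3.5% in central france.
# the mixes below are hypothetical sampling scenarios, not what the hgdp did.
all_central = [(1.0, 0.035)]
half_and_half = [(0.5, 0.50), (0.5, 0.035)]
all_lorraine_villages = [(1.0, 0.50)]

for name, mix in [("all central france", all_central),
                  ("half lorraine villages, half central", half_and_half),
                  ("all lorraine villages", all_lorraine_villages)]:
    print(f"{name}: expected mean F = {sample_mean_f(mix):.4f}")
```

under those (simplified!) assumptions, a sample drawn entirely from the high-consanguinity lorraine villages would have an expected F more than ten times that of one drawn from central france, so where the samples came from is not a small detail.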

the thing is: we don’t know.

what we do know is that the hgdp sampling seems kinda biased towards unique little groups like basques and orcadians, sardinians and the adygei. which is understandable, ’cause these are all interesting, unusual groups, and there’s legitimate concern that their unique genomes might sorta disappear in our modern, outbreeding world. it would be a shame to miss the chance to at least keep a record of all that human biodiversity.

but then i have to wonder: how representative of the majority of french people are the french hgdp samples? do they truly represent “the french,” or did the samples come from some of those crazy little villages way up in the mountains? i dunno. and neither does anybody else (afaik).

and the reason i wonder is: if teh scientists are gonna do really awesome genetic studies to check relatedness within and between human populations, like runs of homozygosity (roh) studies or identity by descent (ibd) studies, i think they need to know whether the samples they’re looking at are representative. do the results for “the french” in studies like this or this or this truly represent the average french, or do they represent some special sub-group of mountain-dwelling french?
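for anyone wondering what a roh study actually looks for: a run of homozygosity is a long stretch of the genome where the two copies a person inherited (one from mom, one from dad) match at marker after marker, which is what you'd expect when the parents share recent ancestors. here's a toy python sketch of the basic idea. the genotype coding (0 and 2 for homozygous, 1 for heterozygous), the `find_roh` function and its `min_len` threshold are my own made-up simplifications; real tools such as plink's `--homozyg` work with physical distances and allow for a few heterozygous or missing calls inside a run.

```python
def find_roh(genotypes, min_len):
    """toy sketch: return runs of homozygous calls at least min_len markers long.

    genotypes: sequence of calls coded 0 or 2 (homozygous) or 1 (heterozygous),
    a simplified coding assumed for this example.
    returns a list of half-open (start, end) index pairs.
    """
    runs = []
    start = None  # index where the current homozygous run began
    for i, g in enumerate(genotypes):
        if g != 1:
            # homozygous call: open a run if one isn't already open
            if start is None:
                start = i
        else:
            # heterozygous call ends any open run; keep it if long enough
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    # a run may still be open at the end of the sequence
    if start is not None and len(genotypes) - start >= min_len:
        runs.append((start, len(genotypes)))
    return runs


print(find_roh([0, 2, 0, 0, 1, 2, 1, 0, 0, 0, 2, 2], min_len=4))
# prints [(0, 4), (7, 12)]
```

the point for this post: more (and longer) runs like these in a sample means more recent inbreeding, so if the sampled french came from a few high-consanguinity villages, their roh numbers would say more about those villages than about france.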

in the most recent roh study i posted about, the “french” don’t appear to be much more in- or out-bred than orcadians or the basques, something which strikes me as odd. perhaps — perhaps — that’s because the french hgdp samples are not truly representative of the broader french population. perhaps. i don’t know. nor do the researchers.

rinse and repeat the above discussion for the other samples, too.

