the hgdp samples again

i’ve written before (here, here and here) about the hgdp samples and the fact that there is very little to no provenance info connected to them. the problem with this, afaics, is that it’s difficult to know whether or not the hgdp samples are truly representative, in all ways, of the populations from which they came.

i was particularly concerned initially about the french (and the japanese) hgdp samples — and then i got over that — but now i’m concerned about them again. here’s why:

the hgdp samples from france are described thusly:

“France – French/various regions (relatives) – This sample from various regions of France is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). This sample consists of unrelated individuals and was collected with proper informed consent.”

great!

hang on — which regions?

auvergne? where, in some villages in the eighteenth century, groups of families regularly inbred with one another? lorraine? which, in some areas, had consanguinity rates of up to 50% between 1810 and 1910? burgundy or brittany, both of which had reportedly higher cousin marriage rates in the nineteenth and twentieth centuries than other regions of france? or were the hgdp samples collected in places like central france which, historically, had much lower rates (in the range of 1-3.5%) of close marriages?

the thing is: we don’t know.

what we do know is that the hgdp sampling seems kinda biased towards unique little groups like basques and orcadians, sardinians and the adygei. which is understandable ’cause these are all interesting, unusual groups and there’s legitimate concern that their unique genomes might sorta disappear in our modern, outbreeding world, and it would be a shame to miss out on the chance to at least keep a record of all that human biodiversity.

but then i have to wonder how representative the french hgdp samples are of the majority of french people. do they truly represent “the french,” or did the samples come from some of those crazy little villages way up in the mountains? i dunno. and neither does anybody else (afaik).

and the reason i wonder is: if teh scientists are gonna do really awesome genetic studies to check for the relatedness between the members of different human populations — like runs of homozygosity (roh) studies or identity by descent (ibd) studies — i think they need to know if the samples they’re looking at are representative or not. do the results for “the french” in studies like this or this or this truly represent the average french, or do they represent some special sub-groups of mountain dwelling french?

in the most recent roh study i posted about, the “french” don’t appear to be much more in- or out-bred than orcadians or the basques, something which strikes me as odd. perhaps — perhaps — that’s because the french hgdp samples are not truly representative of the broader french population. perhaps. i don’t know. nor do the researchers.
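
to get a rough feel for why the sampling region could matter, here's a minimal back-of-the-envelope sketch in python. it assumes (and this is my simplification, not something from the studies) that every "close marriage" counted in the rates above is a first-cousin marriage, whose children have an inbreeding coefficient of 1/16. it ignores all other kinds of relatedness, including any build-up over generations. the function name and region labels are made up for illustration, and the 50% figure for lorraine very likely includes more distant relatives, so treat that line as a generous upper bound.

```python
# hypothetical sketch: how much could the sampling region matter?
# ASSUMPTION (not from the studies): every "close marriage" is a
# first-cousin marriage, whose offspring have inbreeding coefficient F = 1/16.
F_FIRST_COUSIN = 1 / 16


def mean_inbreeding(cousin_marriage_rate):
    """expected mean F if this fraction of marriages is between first cousins."""
    return cousin_marriage_rate * F_FIRST_COUSIN


# rates mentioned in the post, written as fractions
rates = {
    "central france (low end)": 0.01,
    "central france (high end)": 0.035,
    "parts of lorraine (generous upper bound)": 0.50,
}

for region, rate in rates.items():
    print(f"{region}: mean F ~ {mean_inbreeding(rate):.4f}")
```

under those assumptions the expected average scales directly with the cousin marriage rate, so a sample drawn from a 50% area and one drawn from a 1% area would differ by a factor of fifty. real roh results depend on a lot more than this, of course, but it shows why "which regions?" isn't a trivial question.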

rinse and repeat above discussion for the other samples, too.

previously: hgdp samples and relatedness; more on the hgdp samples; why i care about the hgdp samples; meanwhile, in france…; runs of homozygosity and inbreeding (and outbreeding); ibd and historic mating patterns in europe; runs of homozygosity again

(note: comments do not require an email. not out on a limb, am i?)

4 Comments

  1. Hate to be pessimistic myself, but I find these samples dodgy too. There’s no indication at all given as to where they’re from, and the sample size is tiny. Just to reinforce your point about the importance of distinguishing different regions, I want to quote this from that 1948 paper in French, p. 611 (my translation):

    We note that everywhere [in France] the frequency of cousin marriage falls under the influence of civilizational development. … For example, in Corsica, it’s fallen from 8.2% (1926-1930) to 5.5% (1941-1945). In the Haute-Loire, in the same time frame, it’s fallen from 5.7% to 2.75%; in Savoy from 11% to 4.4% (Saint-Jean-de-Maurienne), from 5.4% to 3.5% (Albertville), from 2.8% to 1.52% (Chambéry); in Côtes-du-Nord [Brittany] from 4.95% to 2.63%; in Basses-Alpes from 3.25% to 1.27%; in Ariège from 3.35% to 1%, etc.

    These are some big variations between regions, and with time depth, who knows how high they once were. I just don’t see how we can have faith in the HGDP samples not knowing if they’re Alsatians or Corsicans or Savoyards, etc, plus how related they are (or aren’t).

  2. @m.g. – “These are some big variations between regions, and with time depth, who knows how high they once were. I just don’t see how we can have faith in the HGDP samples not knowing if they’re Alsatians or Corsicans or Savoyards, etc….”

    well, that’s exactly it. i mean, i think genetics and genetic studies are AWESOME — and i love them and what they can tell us! (what a fantastic time in which to be living! (^_^) ) but i don’t think the data sets are quite “there” yet. they’re going to be — very shortly! but in these recent studies looking at relatedness (roh and ibd) — well, i think the researchers have just jumped the gun a bit and are working with data sets that aren’t informative enough.

    i don’t really blame them. i want to know NOW, too! (~_^) but, really — everyone needs to think this all through a bit more, i think.

    @m.g. – “…plus how related they are (or aren’t).”

    yes. that is also a problem/difficulty. one researcher actually sat down and went through the hgdp samples to see if he could spot likely relatives in the samples (going by how similar the genomes are). he found one likely father-son pair in the french samples [pg. 8 – pdf], so there you go.

    frankly, i find the need for such a study to be really silly. i don’t mean to criticize teh scientists too much (i love scientists!), but this sort of double- and triple-work is just a waste of everyone’s time. i understand that there are ethical issues and privacy issues and all that with genetic data, but i do think they went overboard in the wrong direction in not collecting hardly any provenance info.

    well, times are changing anyway, and as people are getting used to genetic data being available, they’re getting more comfortable with sharing it (see dienekes’ wonderful dodecad study, for instance). so i’m pretty certain the whole situation is going to change — and really soon. (^_^)

    (the other thing that i look forward to is lots and lots of genetic data becoming available from archaeological sites! we’ll be able to reconstruct anglo-saxon kinship groups — and how related they all were to one another — from skeletal remains. (^_^) not to mention inferring past relatedness from living peoples’ genomes. exciting! (^_^) )

  3. @m.g. – “I want to quote this from that 1948 paper in French, p. 611 (my translation)”

    thanks for that translation! very interesting. i need to read through that whole paper one of these days. so far i’ve only looked at the pictures. (~_^) (i might hit you up for help on translating some bits if i need it….)

  4. “but i do think they went overboard in the wrong direction in not collecting hardly any provenance info.”

    Exactly. Erring a bit too much on the side of caution.

    “(i might hit you up for help on translating some bits if i need it….)”

    It would be my pleasure, sincerely. N’hésitez pas!
