if the hgdp samples are going to be used to look at the degrees of kinship within populations — which would be awesome! and which prof. harpending started to do recently — then care has to be taken to identify which sets of samples include lots of relatives.

if you’re gonna analyze a bunch of individuals’ genomes to ascertain the degree of kinship between them in order to determine the degree of kinship within their broader population, then you want to make sure you’ve got a random, representative sample from the population and not a bunch of relatives since, of course, a bunch of relatives will naturally have a high degree of kinship.

and if you found a high degree of kinship in a set of samples that included a bunch of relatives and didn’t know you were looking at a set of relatives, you might conclude that there must be a high degree of kinship across the broader population, too, but this might not be the case at all.

for example, take the hgdp samples from the pashtun and the kalash in pakistan. twenty-five genomes are available from each group, but according to rosenberg (see previous post), none of the individuals in the pashtun group were relatives whereas in the kalash group there likely are: one parent-offspring pair, one half-sibling pair (or an equivalent), and four pairs of cousins.

let’s say, then, that it was found that these two sets of samples (the pashtun and the kalash) had exactly the same average degree of kinship between their members, genetically speaking. since the pashtun samples include no known relatives whereas the kalash ones include several pairs of close relatives, the kalash average would be pushed up by those relatives. so it might look like the two groups had the same population-wide degree of kinship, but in reality what we would have found is that unrelated pashtuns are about as related to one another as a set of kalash that includes close family members, i.e. the background kinship in the broader kalash population would probably be lower than the sample average suggests.
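to put some rough numbers on that, here’s a minimal python sketch. it is not rosenberg’s or prof. harpending’s method, just back-of-the-envelope arithmetic under some loud assumptions: the “cousins” are treated as first cousins, the pedigree kinship coefficients are the textbook values for a non-inbred pedigree (1/4, 1/8, 1/16), every pair of individuals shares the same made-up `background` kinship, and the relatives’ pedigree kinship simply adds on top of that. the function name and the 0.01 background value are hypothetical.

```python
from math import comb

# textbook pedigree kinship coefficients (assumes a non-inbred pedigree)
PEDIGREE_KINSHIP = {
    "parent_offspring": 0.25,
    "half_sibling": 0.125,
    "first_cousin": 0.0625,  # ASSUMPTION: the source only says "cousins"
}


def mean_pairwise_kinship(n_samples, relative_pairs, background):
    """average kinship over all pairs in a sample (hypothetical helper).

    relative_pairs maps a relationship type to the number of such pairs;
    background is the kinship assumed for every pair, related or not.
    """
    total_pairs = comb(n_samples, 2)
    excess = sum(
        PEDIGREE_KINSHIP[kind] * count for kind, count in relative_pairs.items()
    )
    # simplification: pedigree kinship is added on top of the background
    return background + excess / total_pairs


# relative pairs reported for the 25 kalash samples
kalash_relatives = {"parent_offspring": 1, "half_sibling": 1, "first_cousin": 4}

background = 0.01  # ASSUMPTION: made-up value, identical for both groups

pashtun_mean = mean_pairwise_kinship(25, {}, background)
kalash_mean = mean_pairwise_kinship(25, kalash_relatives, background)
inflation = kalash_mean - pashtun_mean

print("pairs per sample:", comb(25, 2))
print("pashtun sample mean:", pashtun_mean)
print("kalash sample mean:", kalash_mean)
print("inflation from relatives:", inflation)
```

with 25 individuals there are 300 pairs, so the six related pairs add 0.625/300, or about 0.002, to the sample average whatever background value is chosen. under these assumptions, if the two samples showed the same average, the kalash background kinship would have to be lower than the pashtun one by about that amount.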

of the 52 population samples in the hgdp (the south african bantus are counted as one group … even though they come from different ethnic groups … hmmmm …), rosenberg found that exactly half (26) include relatives.

the other problem i have with the hgdp samples involves the ones that were collected from immigrant groups here in the u.s.:

- han chinese: “This is a sample of Han Chinese living in the San Francisco, California.”
- japanese: “Collected by L.L. Cavalli-Sforza from Japanese-born individuals living in the San Francisco Bay area, and by K.K. Kidd and J. R. Kidd from Japanese-born individuals living in Connecticut.”
- cambodians: “Collected by K. Dumars from individuals born in Cambodia who are now living in Santa Ana, California.”

are these immigrants really representative of their native populations? are they first- or second- or fourth-generation americans? some immigrant groups start to outbreed in a new land, but others do just the opposite. what’s the case with these groups? how old were the individuals sampled (since in many populations inbreeding rates have gone down in the last 50 years or so)? do they all come from the same region in their native country (like guangdong province), or from all over? to give you an idea of some of the possible problems involved with these sets of samples, have a look at what i said about the japanese samples in the previous post.

and i still have a bug about which regions of france the french samples came from (see previous post). (~_^) it probably doesn’t matter that much, but it would’ve been nice to know how widespread the sampling was, i.e. how random and representative of the entire population of france are these samples? historically, different regions of france have had different inbreeding rates as can be seen in this map of the inbreeding coefficients for france, 1926-1945 [pg. 620] (my guess is this pattern goes back a long way, too, probably to the early medieval period and the introduction of manorialism in continental europe).

so, which regions the samples were drawn from in france might actually make a difference, especially depending upon how deeply one drills down into the question of relatedness. i wonder the same thing about many of the other samples, too, for instance the ones from russia, but we know that all those samples came from the vologda administrative region so we are at least aware that they may not be representative of all ethnic russians.

despite all these potential difficulties, i look forward to more genetic research into kinship and relatedness within populations — from prof. harpending or whomever! very cool stuff! (^_^)

previously: “hgdp samples and relatedness” and “more on the hgdp samples”


