the hgdp samples again

i’ve written before (here, here and here) about the hgdp samples and the fact that there is very little to no provenance info connected to them. the problem with this, afaics, is that it’s difficult to know whether or not the hgdp samples are truly representative, in all ways, of the populations from which they came.

i was particularly concerned initially about the french (and the japanese) hgdp samples — and then i got over that — but now i’m concerned about them again. here’s why:

the hgdp samples from france are described thusly:

“France – French/various regions (relatives) – This sample from various regions of France is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). This sample consists of unrelated individuals and was collected with proper informed consent.”

great!

hang on — which regions?

auvergne? where, in some villages in the eighteenth century, groups of families regularly inbred with one another? lorraine? which, in some areas, had consanguinity rates of up to 50% between 1810 and 1910? burgundy or brittany, both of which had reportedly higher cousin marriage rates in the nineteenth and twentieth centuries than other regions of france? or were the hgdp samples collected in places like central france which, historically, had much lower rates (in the range of 1-3.5%) of close marriages?

the thing is: we don’t know.

what we do know is that the hgdp sampling seems kinda biased towards unique little groups like basques and orcadians, sardinians and the adygei. which is understandable ’cause these are all interesting, unusual groups and there’s legitimate concern that their unique genomes might sorta disappear in our modern, outbreeding world, and it would be a shame to miss out on the chance to at least keep a record of all that human biodiversity.

but then i have to wonder how representative of the majority of french people the french hgdp samples are. do they truly represent “the french,” or did the samples come from some of those crazy little villages way up in the mountains? i dunno. and neither does anybody else (afaik).

and the reason i wonder is: if teh scientists are gonna do really awesome genetic studies to check for the relatedness between the members of different human populations — like runs of homozygosity (roh) studies or identity by descent (ibd) studies — i think they need to know if the samples they’re looking at are representative or not. do the results for “the french” in studies like this or this or this truly represent the average french, or do they represent some special sub-groups of mountain dwelling french?

in the most recent roh study i posted about, the “french” don’t appear to be much more in- or out-bred than orcadians or the basques, something which strikes me as odd. perhaps — perhaps — that’s because the french hgdp samples are not truly representative of the broader french population. perhaps. i don’t know. nor do the researchers.

rinse and repeat above discussion for the other samples, too.

previously: hgdp samples and relatedness and more on the hgdp samples and why i care about the hgdp samples and meanwhile, in france… and runs of homozygosity and inbreeding (and outbreeding) and ibd and historic mating patterns in europe and runs of homozygosity again

(note: comments do not require an email. not out on a limb, am i?)


runs of homozygosity again

**update below**

here’s an exciting new paper!: Genomic Patterns of Homozygosity in Worldwide Human Populations. i don’t have access to the paper itself, but there are lots o’ neat figures and tables in the supplemental data [opens pdf] that relate to runs of homozygosity (roh). roh are identical stretches of dna within an individual’s genome (i.e. stretches where the maternally and paternally inherited copies of a chromosome are identical). (roh shouldn’t be confused with blocks of identity by descent [ibd], which i did once! ibd blocks are identical stretches of dna as compared between different individuals, iiuc.)
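to make that definition concrete, here’s a minimal sketch (mine, not the paper’s method) of the simplest possible roh scan: walk along one chromosome’s snps in position order and record every stretch of consecutive homozygous genotypes that is long enough. it assumes genotypes are coded 0/1/2 (copies of one allele, so 1 = heterozygous), and the thresholds `min_snps` and `min_length_bp` are made-up defaults. real roh callers (plink’s `--homozyg` option, for one) use sliding windows and tolerate the odd heterozygous or missing call, which this toy version doesn’t.

```python
from typing import List, Tuple


def find_roh(positions: List[int], genotypes: List[int],
             min_snps: int = 50,
             min_length_bp: int = 500_000) -> List[Tuple[int, int]]:
    """Return (start_bp, end_bp) for each run of consecutive homozygous SNPs.

    positions: SNP positions on one chromosome, sorted ascending.
    genotypes: 0, 1 or 2 copies of one allele per SNP; 1 means heterozygous.
    A run is kept only if it has at least min_snps SNPs and spans at least
    min_length_bp base pairs (both thresholds are illustrative assumptions).
    """
    runs = []
    run_start = None  # index of the first SNP in the current homozygous run
    # a sentinel heterozygous call at the end closes any run still open
    for i, g in enumerate(genotypes + [1]):
        if g != 1:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            start_bp, end_bp = positions[run_start], positions[i - 1]
            if i - run_start >= min_snps and end_bp - start_bp >= min_length_bp:
                runs.append((start_bp, end_bp))
            run_start = None
    return runs


if __name__ == "__main__":
    # toy chromosome: 200 SNPs spaced 10 kb apart, one heterozygous SNP in the middle
    pos = [i * 10_000 for i in range(200)]
    gt = [0] * 100 + [1] + [2] * 99
    print(find_roh(pos, gt))  # [(0, 990000), (1010000, 1990000)]
```

the single heterozygous snp splits what would otherwise be one long run into two shorter ones, which is why real callers allow a little slack.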

recall that possessing lots of long roh indicates that one’s parents are/were quite similar genetically speaking. that can be the result of a couple of different scenarios like (as greying wanderer has brought up a lot recently) simply being from a small population (i.e. having a small effective population size) and/or from regular inbreeding (consanguineous/endogamous mating). so, a population having a lot of long roh is either small and/or inbreeds a lot. populations having LOTS of short roh have probably been through some sort of bottleneck (see previous post).
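one common way to boil “lots of long roh” down to a single number is F_ROH: the share of the autosomal genome that sits in roh longer than some cut-off. this is a sketch of that general measure, not necessarily what the researchers in this paper did; the 1.5 Mb cut-off and the toy genome length are assumptions for illustration. for scale, the child of first cousins has an expected inbreeding coefficient of 1/16, i.e. roughly 6% of the genome is expected to be identical by descent.

```python
from typing import Iterable, Tuple

# offspring of first cousins: expected pedigree inbreeding coefficient of 1/16
F_FIRST_COUSIN_OFFSPRING = 1 / 16


def f_roh(roh_segments: Iterable[Tuple[int, int]],
          autosome_length_bp: int,
          min_length_bp: int = 1_500_000) -> float:
    """Fraction of the autosomal genome lying in ROH at least min_length_bp long.

    roh_segments: (start_bp, end_bp) pairs, assumed non-overlapping.
    autosome_length_bp and min_length_bp are chosen by the analyst;
    the 1.5 Mb default is only an illustrative cut-off.
    """
    total = sum(end - start for start, end in roh_segments
                if end - start >= min_length_bp)
    return total / autosome_length_bp


if __name__ == "__main__":
    # toy "genome" of 10 Mb with one 2 Mb and one 1 Mb ROH
    segs = [(0, 2_000_000), (5_000_000, 6_000_000)]
    print(f_roh(segs, 10_000_000))    # 0.2: the 1 Mb run falls below the cut-off
    print(F_FIRST_COUSIN_OFFSPRING)   # 0.0625
```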

in the paper i looked at in that previous post, the researchers had looked at the different roh lengths for large, regional populations like “europeans” or “east asians.” amongst other things, they had found that some of my regular inbreeders — the fbd marriage folks — had some of the highest numbers of medium and long roh, a state of genetic affairs which likely reflects their long-term close mating patterns. interestingly, the researchers had found that east asians had roh lengths similar to those of europeans across the board, something which surprised me since, at least according to what i’ve been reading, east asians (i.e. the chinese) have been inbreeding for a much longer time than europeans. one drawback of that previous study, though, was that, apart from the french, most of the european populations they looked at were peripheral groups who have had a tendency to inbreed more than my “core” europeans (see mating patterns in europe series below ↓ in left-hand column).

the new paper suffers from some of the same problems since the data come from the same sources (hgdp-ceph and hapmap phase 3 populations), so northern europeans — apart from the french — aren’t included in this paper either. (what can you do? it’s early days yet. i look forward to when there’s lots more genetic data available out there for teh scientists to work with! (^_^) )

what the researchers in this paper have done, though, is to look at both the different mean lengths of roh in each of the different populations sampled AND they looked at total numbers of roh within individuals for each population. this has, i think, drawn out some interesting differences between the populations.

first, here are two graphics from the supplemental data (linked to above).

i’ve highlighted a handful of populations i want to focus on ’cause i know a little something about their historic mating patterns: the bedouin (as a proxy for the arabs — note that the bedouin have probably inbred more than more settled arabs); italians (not sure if they’re northern or southern italians or a mix of both — however, there are tuscans in the samples with which these “italians” can be compared); pathans or pashtuns (more fbd marriage folks, like the bedouins/arabs); and han chinese (there are some northern han chinese with whom this group can be compared). ok. here are the charts:

as you can see, the researchers have split up the roh into three classes (note that the short and medium classes here are a lot shorter than those in the paper looked at previously):

– A: 0.25-0.40 Mb (short)
– B: 0.6-1.2 Mb (medium)
– C: 0-35 Mb (long)

the interesting thing in the first chart above (Fig. S3 – Mean ROH Length for Each of the Three Size Classes in Each Population) is that the han chinese have lower means of roh length in all of the size classes compared to the other populations i’ve highlighted. in the previous study, the researchers found that east asians had similar means to europeans for all roh lengths. i found this surprising since, from what i’ve read, the han chinese have been inbreeding for a longer period of time than europeans. what might be confounding the results though, once again, is the fact that nw europeans (the outbreeders extraordinaire) are not really included in either of these studies apart from a handful of french samples.

in this latest study, both the bedouin and the pashtun, for instance, have higher means — and wider spreads — of long (class C) roh than the italians, which is what i would’ve expected since those two groups (the bedouins and the pashtuns) are, being fbd marriage folks, serious inbreeders. perhaps the reason the han chinese long roh mean is comparatively low is partly due to the fact that they historically practiced mother’s brother’s daughter (mbd) marriage which doesn’t push towards such close inbreeding as fbd marriage. still, i would’ve expected to see greater means of roh for the chinese than the italians — or, at least, around the same. not so much lower. (unless the italians practiced fbd marriage, too — or fzd marriage — but i don’t think so.)

if you look at the second chart (Fig. S4 – Total Number of ROH in Individual Genomes), however, you’ll see that, overall, the han chinese have greater total numbers of short, medium and long roh in individual genomes than any of the other three populations i’ve highlighted. both the bedouins and the pashtuns have greater numbers/wider total spread of long roh than the italians, but the han chinese have a much greater total number of long roh than any of the other three groups — three or four times as many.

but they’re, on average, shorter long roh don’t forget. (confusing, eh?!)
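to untangle the “more, but shorter” thing, here’s a toy illustration with invented numbers (not from the paper): two hypothetical individuals’ long (class C) roh lengths in Mb, each summarized by count, mean length and total length.

```python
from statistics import mean
from typing import List, Tuple


def summarize_roh(lengths_mb: List[float]) -> Tuple[int, float, float]:
    """Return (number of ROH, mean ROH length, total ROH length) for one person."""
    if not lengths_mb:
        return 0, 0.0, 0.0
    return len(lengths_mb), mean(lengths_mb), sum(lengths_mb)


if __name__ == "__main__":
    person_a = [4.0, 6.0]   # a few long roh, each quite long (invented numbers)
    person_b = [2.0] * 6    # many long roh, each shorter (invented numbers)
    print(summarize_roh(person_a))  # (2, 5.0, 10.0)
    print(summarize_roh(person_b))  # (6, 2.0, 12.0)
```

person b has three times as many long roh and more long-roh dna overall, yet a lower mean length, which is the same kind of combination the charts seem to show for the han chinese.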

perhaps this is what you get when you have — as the chinese have had — a pretty large effective population size for such a long time. there have been a LOT of han chinese for — wow — millennia.

so, it looks like this (in this order of inbrededness — i think):

– bedouins: highest mean, and very wide spread, of long roh; high total numbers, and widest spread, of long roh.
– pashtun: low mean, but widest spread, of long roh; low total number, but very wide spread, of long roh.
– han chinese: very low mean, and very narrow spread, of long roh; highest total numbers, and wide spread, of long roh.
– italians: low mean, and rather wide spread, of long roh; very low total number, and very small spread, of long roh.

other interesting points are that:

– the tuscans/tsi (toscani) appear to have lower short, medium and long mean roh than the generic “italian” category. however, the tuscans have lower total numbers of long roh than the “italians” while the toscani (tsi), on the other hand, appear to have a greater total number of long roh than the “italians.” while the tuscan samples and the toscani/tsi samples are from different studies (hgdp vs. hapmap), they are all supposed to be from tuscany, so it’s surprising that they’re so different. perhaps the individuals in the toscani/tsi sample were more closely related somehow?

– the northern han samples have lower short, medium and long mean roh than the generic “han” category. this would fit my general impression that historically inbreeding has been greater in southern china than in the north. however, the total number of long roh is greater in the northern han sample than in the “han” sample. not sure what that means.

don’t forget that there can be all sorts of reasons for differences in roh: inbreeding vs. outbreeding, yes, but also effective population size, population movement (migration in or out), bottlenecks, etc. i just happen to be interested in trying to pick out the effects of inbreeding/outbreeding — if possible.
_____

**update** – here are a couple of excerpts from the article (thnx, b.b.!) [pgs. 277, 279-281]:

“Size Classification of ROH

“Separately in each population, we modeled the distribution of ROH lengths as a mixture of three Gaussian distributions that we interpreted as representing three ROH classes: (A) short ROH measuring tens of kb that probably reflect homozygosity for ancient haplotypes that contribute to local LD [linkage disequilibrium] patterns, (B) intermediate ROH measuring hundreds of kb to several Mb that probably result from background relatedness owing to limited population size, and (C) long ROH measuring multiple Mb that probably result from recent parental relatedness….

“In each population, the size distribution of ROH appears to contain multiple components (Figure 2A). Using a three-component Gaussian mixture model, we classified ROH in each population into three size classes (Figure 2B): short (class A), intermediate (class B), and long (class C). Size boundaries between different classes vary across populations (Table S1); however, considering all populations, all A-B boundaries are strictly smaller than all B-C boundaries (Figure 2C). The mean sizes of class A and B ROH are similar among populations from the same geographic region (Figure S3), with the exception that Africa and East Asia have greater variability. The class C mean is generally largest in the Middle East, Central/South Asia, and the Americas and smallest in East Asia (Figure S3), with the exception that the Tujia population has the largest values. In the admixed Mexican population (MXL), mean ROH sizes are similar to those in European populations. In the admixed African American population (ASW), however, mean ROH sizes are among the smallest in our data set, notably smaller than in most Africans and Europeans.

“Geographic Pattern of ROH

“Several patterns emerge from a comparison of the per-individual total lengths of ROH across populations (Figure 3). First, the total lengths of class A (Figure 3A) and class B (Figure 3B) ROH generally increase with distance from Africa, rising in a stepwise fashion in successive continental groups. This trend is similar to the observed reduction in haplotype diversity with increasing distance from Africa. Second, total lengths of class C ROH (Figure 3C) do not show the stepwise increase. Instead, they are higher and more variable in most populations from the Middle East, Central/South Asia, Oceania, and the Americas than in most populations from Africa, Europe, and East Asia. This pattern suggests that a larger fraction of individuals from the Middle East, Central/South Asia, Oceania, and the Americas tend to have higher levels of parental relatedness, in accordance with demographic estimates of high levels of consanguineous marriage particularly in populations from the Middle East and central/South Asia, and it is similar to that observed for inbreeding-coefficient and identity-by-descent estimates. Third, in the admixed ASW and MXL individuals, total lengths of ROH in each size class are similar to those observed in populations from Africa and Europe, respectively (Figure 3).

“The total numbers of ROH per individual (Figure S4) show similar patterns to those observed for total lengths (Figure 3). However, in East Asian populations, total numbers of class B and class C ROH per individual are notably more variable across populations than are ROH total lengths.”
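the excerpt says the researchers sorted roh into classes A, B and C by fitting a three-component gaussian mixture to the roh lengths separately in each population. here’s a bare-bones sketch of that general technique in plain python, using the standard expectation-maximization (em) algorithm. the starting values (equal-sized chunks of the sorted data), the iteration count, and fitting raw lengths in Mb are all my assumptions; the excerpt doesn’t give the authors’ fitting details, so this illustrates the idea rather than reproducing their analysis.

```python
import math
import random
import statistics
from typing import List, Tuple


def _normal_pdf(x: float, mu: float, sd: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def _responsibilities(x, weights, means, sds) -> List[float]:
    """Posterior probability that x belongs to each mixture component."""
    dens = [w * _normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(dens)
    if total == 0.0:  # every density underflowed; fall back to equal shares
        return [1.0 / len(dens)] * len(dens)
    return [d / total for d in dens]


def fit_gmm_1d(xs: List[float], k: int = 3, n_iter: int = 200
               ) -> Tuple[List[float], List[float], List[float]]:
    """Fit a k-component 1-D Gaussian mixture by expectation-maximization.

    Returns (weights, means, sds) with components sorted by mean, so for k=3
    indices 0, 1, 2 play the role of short (A), intermediate (B), long (C).
    Needs at least k data points.
    """
    n = len(xs)
    xs_sorted = sorted(xs)
    # start each component from one of k equal-sized chunks of the sorted data
    chunks = [xs_sorted[j * n // k:(j + 1) * n // k] for j in range(k)]
    weights = [len(c) / n for c in chunks]
    means = [statistics.mean(c) for c in chunks]
    sds = [max(statistics.pstdev(c), 1e-6) for c in chunks]
    for _ in range(n_iter):
        resp = [_responsibilities(x, weights, means, sds) for x in xs]  # E-step
        for j in range(k):  # M-step: re-estimate each component from its share
            nj = sum(r[j] for r in resp) or 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sds[j] = max(math.sqrt(var), 1e-6)  # floor keeps a component from collapsing
    order = sorted(range(k), key=lambda j: means[j])
    return ([weights[j] for j in order], [means[j] for j in order],
            [sds[j] for j in order])


def classify(xs, weights, means, sds) -> List[int]:
    """Assign each value to its most probable component (0=A, 1=B, 2=C for k=3)."""
    labels = []
    for x in xs:
        r = _responsibilities(x, weights, means, sds)
        labels.append(max(range(len(r)), key=r.__getitem__))
    return labels


if __name__ == "__main__":
    rng = random.Random(1)
    # simulated roh lengths in Mb from three made-up clusters (not real data)
    lengths = ([rng.gauss(0.05, 0.01) for _ in range(100)]
               + [rng.gauss(0.5, 0.1) for _ in range(100)]
               + [rng.gauss(3.0, 0.5) for _ in range(100)])
    w, mu, sd = fit_gmm_1d(lengths)
    print([round(m, 2) for m in mu])
```

sorting the fitted components by mean is what lets you read them as classes A, B and C. run it once per population and the A/B and B/C boundaries fall wherever `classify` switches labels, which is why, as the excerpt notes, they can differ from one population to the next.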

previously: runs of homozygosity and inbreeding (and outbreeding) and ibd and historic mating patterns in europe

(note: comments do not require an email. ribbit!)

linkfest – 08/19/12

The genetic history of Europeans – from dienekes.

The DNA Olympics — Jamaicans Win Sprinting ‘Genetic Lottery’ — and Why We Should All Care – in which jon entine sneaks the phrase “human biodiversity” into forbes online. (^_^)

Neandertal ancestry “Iced” – “The genome of this Neolithic-era individual [Ötzi] shows a substantially higher degree of Neandertal ancestry than living Europeans.” – from john hawks.

Analysis Of China’s PISA 2009 Results – anatoly concludes that china’s pisa-derived iq is ≈ 102.5.

What predicts college grades better than IQ score? – “At the university level, introversion predicts academic performance better than cognitive ability.” – from barking up the wrong tree, via foseti.

Parents also choose – “[I]n many if not most societies, men, i.e. fathers, decide which man is allowed to mate with his daughter or other female relative.” – @the breviary.

A GPS in Your DNA – “Using a probabilistic model of genetic traits for every coordinate on the globe, the researchers have developed a method for determining more precisely the geographical location of a person’s ancestral origins.”

Early birds have the best temperament profile – but night owls have higher iqs. (~_^) – from the inductivist.

Fertile Gals Have All the Right Dance Moves – “Women in the fertile phase of their menstrual cycle are judged as more attractive dancers by men than are women in a less-fertile phase, a new study finds.”

Genetically engineering ‘ethical’ babies is a moral obligation, says Oxford professor – “‘Indeed, when it comes to screening out personality flaws, such as potential alcoholism, psychopathy and disposition to violence, you could argue that people have a moral obligation to select ethically better children.'” – hmmm. who’s going to go first? mightn’t their kids be at a disadvantage in some situations? see also: The wrongs, and rights, of genetic screening for children.

Sporting aggression more common in opponents of a similar ability than in contests between unequal teams – “The same is also true of aggressive contests between individuals in the animal kingdom, whether it is rutting deer stags or quarrelsome Siamese fighting fish. Now scientists believe they have found evidence for the same trait in competing groups of sportsmen.”

bonus: ‘Who’s Your Daddy’ Truck Rolls Through NYC, Offers Answers With DNA Tests

bonus bonus: Gorillas certainly show emotion – but what do they feel?

bonus bonus bonus: What you don’t know can hurt you – “In 1972, the U.S. passed the Clean Water Act, despite a presidential veto by Richard Nixon. Did this act also end an era of unusually high estrogen levels in the environment?” – from peter frost.

(note: comments do not require an email. boo!)

eugenics in the news

from the u.k.’s telegraph (links added by me):

“Euroscience Open Forum 2012: DNA gene testing ‘will screen out lovers'”
13 Jul 2012

“Couples will soon be able to choose their life partner solely based on the compatibility of their genes instead of through love, a scientific conference has heard.

“Due to the falling cost of DNA testing Britain is on the cusp of a new era of eugenics, according to a leading British scientist.

“Prof Armand Leroi, of Imperial College London, said that within five to ten years it will be common for young people to pay to access their entire genetic code.

“He told the Euroscience Open Forum 2012, in Dublin, that a desire to have a healthy baby will lead more to request access to view the genes of any prospective partner.

“Armed with this information, the couple could then use IVF to screen babies with incurable diseases.

“While it was unlikely people will have the ‘luxury’ of using the technology to design babies, by their intellect or eye colour, they would instead focus on stopping genetic diseases.

“Addressing a session titled ‘I human: are new scientific discoveries challenging our identity as a species’, he said the cost of genetic sequencing was falling so quickly that ‘it is going to become very, very accessible, very, very soon’….

“He said eugenics were already available, with tens of thousands of unborn babies with Down’s syndrome and other illnesses being aborted every year.

“He told the conference on Thursday: ‘These processes are very well established in most European countries.

“‘Many of the ethical problems that people raise when they speak of neoeugenics are nought once you offer gene selection or mate selection as a eugenic tool.'”
_____

meanwhile, in tonga:

“Tonga’s Crown Prince Tupouto’a Ulukalala marries cousin”
12 July 2012

“The heir to the throne of Tonga in the South Pacific has married his second cousin in the capital Nuku’alofa.

“Crown Prince Tupouto’a Ulukalala and his bride, Sinaitakala Fakafanua, both in their 20s, waved to cheering crowds as they left church after the wedding…..

“Marriage between cousins is seen as a way of keeping the royal bloodline strong in Tonga….”

felicitations to the happy couple! (^_^) (seriously!)

previously: ivy league selective breeding

(note: comments do not require an email. tonga – the friendly islands?)

not so many luddites after all

over the weekend, one of my younger first cousins once removed declared that he absolutely, definitely will be a fireman when he grows up (he’s three). his father, one of my in-laws, (half-)jokingly said that, no, you’ll be a lawyer or a doctor or a professor. i chimed in with: “get into genetics, kid. that’s where all the money will be.” (heh! as if i would know.)

my cousin-in-law responded: “genetics? but that’s unethical.” this from a man with a marketing degree. (~_^)

i have to admit i was pretty flummoxed and didn’t really know how to respond or even where to start. our follow up discussion was brief so i didn’t get a satisfactory explanation as to what’s “unethical” about “genetics,” but i got to wondering what the rest of america thinks. thankfully, they’re not so skeptical:

Survey finds wide public support for nationwide study of genes, environment and lifestyle
Nov 12, 2008

Four in five Americans support the idea of a nationwide study to investigate the interactions of genes, environment and lifestyle, and three in five say they would be willing to take part in such a study, according to a survey released today. The research was conducted by the Genetics & Public Policy Center at Johns Hopkins University with funding from the National Human Genome Research Institute (NHGRI) of the National Institutes of Health (NIH)….

“Our survey found that widespread support exists in the general public for a large, genetic cohort study. What’s more, we found little variation in that support among different demographic groups,” said David Kaufman, Ph.D., lead author of the paper and project director at the Genetics & Public Policy Center, which is located in Washington….

The online survey of 4,659 U.S. adults was conducted between December 2007 and January 2008. When asked about their support for and willingness to participate in a large genetic cohort study, 84 percent of respondents supported the study and 60 percent indicated they would definitely or probably participate in such a study if asked.

Survey respondents were carefully selected to reflect the demographic makeup of the United States. No significant differences in support or willingness to participate were observed between whites, Hispanics, African Americans and Asians. American Indian and Alaska Native respondents expressed less support for the study (65 percent), but were just as likely to be willing to participate (63 percent) as other respondents….

the pew folks also conducted a “town hall meeting” about genetics in 2008 — a set of five focus group sessions held around the country. from the report [pg. 11]:

“Participants were asked to consider what types of research should and should not be done with the information collected by the proposed study. Research aimed at curing disease was commonly cited as acceptable, and some participants named conditions such as cancer, birth defects, and diabetes….

“Human cloning was cited in every town hall as an unacceptable use of the proposed biobank, although in one case participants differentiated between reproductive cloning (unacceptable) and cloning aimed at regenerating organs or otherwise curing disease (acceptable). Participants frequently named research aimed at altering humans or creating ‘designer babies’ as unacceptable. Another area of concern was ‘things that point out differences between gender, or race, or anything like that that people use to discriminate.’ Other areas mentioned included weapons development, intelligence, alcoholism, and sexual orientation….”

so a lot of americans don’t like the idea of cloning. personally, i’m looking forward to being able to clone myself. i mean, how great a world would it be with more MEs in it? (~_^) and why should bacteria and some lizards have all the fun anyways?

and a lot of americans don’t like “designer babies” either. the funny thing is, of course, that they don’t realize that that’s what they’re aiming for when they look for that perfect someone to marry, i.e. kids to match their heart’s desire. in fact, a lot of americans don’t like anything that smacks of eugenics. i guess that’s not too surprising at this point in time.

at least the majority haven’t written off the whole discipline of genetics as “unethical” though.

(note: comments do not require an email. clone.)

linkfest – 04/22/12

Why Kenyans Make Such Great Runners: A Story of Genes and Cultures – somebody’s picked up on the phrase human biodiversity! check out the last sentence in paragraph two. (^_^) h/t to luke for pointing out the article to me!

Why Chimpanzees Kill – “[K]ills occurred in most of the chimpanzee communities and that victims tended to be infant and adult males outside the killer’s social group. Most of the killings were conducted by groups of males…. What did appear to be a factor was the number of males in a group: the higher the number of males in a group, the higher the number of kills.”

People prefer male politicians with lower voices – @the inductivist.

Analysis of surname origins identifies genetic admixture events undetectable from genealogical records – @race/history/evolution notes.

Impact of Carnivory on Human Development and Evolution Revealed by a New Unifying Model of Weaning in Mammals – “Since early weaning yields shorter interbirth intervals and higher rates of reproduction, with profound effects on population dynamics, our findings highlight the emergence of carnivory as a process fundamentally determining human evolution.”

Women focus on their children, not their men, as they age – of course. @dennis’.

Women Are Twice As Likely To Hit The Gas By Mistake – that’d be pretty funny if it weren’t so dangerous.

bonus: When Memory Commits an Injustice – “[W]hen it comes to human memory, more deliberation is often dangerous.”

bonus bonus: Evolution seen in ‘synthetic DNA’ – “Researchers have succeeded in mimicking the chemistry of life in synthetic versions of DNA and RNA molecules. The work shows that DNA and its chemical cousin RNA are not unique in their ability to encode information and to pass it on through heredity.”

bonus bonus bonus: The Emergence and Early Evolution of Biological Carbon-Fixation – “Here we reconstruct the complete early evolutionary history of biological carbon-fixation, relating all modern pathways to a single ancestral form.”

(note: comments do not require an email. orwell court. heh.)

why i care about the hgdp samples

if the hgdp samples are going to be used to look at the degrees of kinship within populations — which would be awesome! and which prof. harpending started to do recently — then care has to be taken to identify which sets of samples include lots of relatives.

if you’re gonna analyze a bunch of individuals’ genomes to ascertain the degree of kinship between them in order to determine the degree of kinship within their broader population, then you want to make sure you’ve got a random, representative sample from the population and not a bunch of relatives since, of course, a bunch of relatives will naturally have a high degree of kinship.

and if you found a high degree of kinship in a set of samples that included a bunch of relatives and didn’t know you were looking at a set of relatives, you might conclude that there must be a high degree of kinship across the broader population, too, but this might not be the case at all.

for example, take the hgdp samples from the pashtun and the kalash in pakistan. twenty-five genomes are available from each group, but according to rosenberg (see previous post), none of the individuals in the pashtun group were relatives whereas in the kalash group there likely are: one parent-offspring pair, one half-sibling pair (or an equivalent), and four pairs of cousins.

let’s say, then, that it was found that these two sets of samples — the pashtun and the kalash — had exactly the same degree of internal kinship between their members, genetically speaking. that would mean that the members of the broader pashtun population had the same kinship to one another as the several sets of relatives in the kalash population, since the pashtun genomes were random samples whereas the kalash ones were not. it might look like the two groups had the same, population-wide degree of kinship, but in reality that would not be what we had found.
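here’s the arithmetic behind that worry, as a toy sketch. it uses textbook pedigree kinship coefficients (1/4 for parent-offspring, 1/8 for half-siblings, 1/16 for first cousins), assumes rosenberg’s “cousins” are first cousins, and gives every other pair a background kinship that is a placeholder, not an estimate. which individuals are related (the index pairs) is made up. with 25 people there are 300 pairs, so the six related kalash pairs alone add 0.625/300 ≈ 0.002 to the kalash average, on top of whatever the population-wide background is.

```python
from itertools import combinations
from typing import Dict, Tuple

# textbook pedigree kinship coefficients for a few relationship types
KINSHIP = {
    "parent-offspring": 1 / 4,
    "half-siblings": 1 / 8,
    "first cousins": 1 / 16,
}


def mean_pairwise_kinship(n: int, related: Dict[Tuple[int, int], float],
                          background: float = 0.0) -> float:
    """Average kinship over all n*(n-1)/2 pairs in a sample of n people.

    related maps (i, j) index pairs (i < j) to a kinship coefficient; every
    other pair gets `background`, a placeholder for the kinship between
    unrelated members of the population.
    """
    pairs = list(combinations(range(n), 2))
    return sum(related.get(p, background) for p in pairs) / len(pairs)


if __name__ == "__main__":
    # hypothetical indices for rosenberg's kalash findings: one parent-offspring
    # pair, one half-sibling pair, four (assumed first-)cousin pairs
    kalash = {(0, 1): KINSHIP["parent-offspring"],
              (2, 3): KINSHIP["half-siblings"],
              (4, 5): KINSHIP["first cousins"], (6, 7): KINSHIP["first cousins"],
              (8, 9): KINSHIP["first cousins"], (10, 11): KINSHIP["first cousins"]}
    pashtun = {}  # no likely relatives found in the pashtun sample
    print(mean_pairwise_kinship(25, kalash))   # 0.625 spread over 300 pairs
    print(mean_pairwise_kinship(25, pashtun))  # 0.0
```

so if both samples came out with the same average, the pashtun background would in fact be roughly 0.002 higher than the kalash one (for small background values), which is exactly the kind of mix-up described above.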

of the 52 population samples in the hgdp (the south african bantus are counted as one group … even though they come from different ethnic groups … hmmmm …), rosenberg found that exactly half (26) include relatives.

the other problem i have with the hgdp samples involves the ones that were collected from immigrant groups here in the u.s.:

– han chinese: “This is a sample of Han Chinese living in the San Francisco, California.”
– japanese: “Collected by L.L. Cavalli-Sforza from Japanese-born individuals living in the San Francisco Bay area, and by K.K. Kidd and J. R. Kidd from Japanese-born individuals living in Connecticut.”
– cambodians: “Collected by K. Dumars from individuals born in Cambodia who are now living in Santa Ana, California.”

are these immigrants really representative of their native populations? are they first- or second- or fourth-generation americans? some immigrant groups start to outbreed in a new land, but others do just the opposite. what’s the case with these groups? how old were the individuals sampled (since in many populations inbreeding rates have gone down in the last 50 years or so)? do they all come from the same region in their native country (like guangdong province), or from all over? to give you an idea of some of the possible problems involved with these sets of samples, have a look at what i said about the japanese samples in the previous post.

and i still have a bug about which regions of france the french samples came from (see previous post). (~_^) it probably doesn’t matter that much, but it would’ve been nice to know how widespread the sampling was, i.e. how random and representative of the entire population of france are these samples? historically, different regions of france have had different inbreeding rates as can be seen in this map of the inbreeding coefficients for france, 1926-1945 [pg. 620] (my guess is this pattern goes back a long way, too — probably to the early medieval period and the introduction of manorialism in continental europe):

so, which regions the samples were drawn from in france might actually make a difference — especially depending upon how deeply one drills down into the question of relatedness. i wonder the same thing about many of the other samples, too, for instance the ones from russia, but we know that all those samples came from the vologda administrative region so we are at least aware that they may not be representative of all ethnic russians.
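for anyone wondering how a regional “inbreeding coefficient” like the ones in that french map gets put together: as far as i understand, such figures are typically estimated from the frequencies of the different kinds of consanguineous marriage, weighting each kind by the inbreeding coefficient expected for the children of that marriage (alpha = Σ p_i F_i). i haven’t checked exactly how the source behind the map did it, and the marriage frequencies below are invented, only loosely in the spirit of the 1-3.5% figures for central france mentioned above.

```python
from typing import Dict

# textbook inbreeding coefficient of the children of each kind of marriage
F_BY_MARRIAGE = {
    "uncle-niece": 1 / 8,
    "first cousins": 1 / 16,
    "first cousins once removed": 1 / 32,
    "second cousins": 1 / 64,
}


def mean_inbreeding(marriage_freqs: Dict[str, float]) -> float:
    """Population mean inbreeding coefficient, alpha = sum(p_i * F_i).

    marriage_freqs maps a marriage type to the fraction of all marriages of
    that type; marriages between non-relatives contribute zero.
    """
    return sum(p * F_BY_MARRIAGE[kind] for kind, p in marriage_freqs.items())


if __name__ == "__main__":
    # invented regions: 3.5% vs 1% first-cousin marriages, plus 1% second cousins
    high = {"first cousins": 0.035, "second cousins": 0.01}
    low = {"first cousins": 0.01, "second cousins": 0.01}
    print(mean_inbreeding(high), mean_inbreeding(low))
```

even a few percent of cousin marriages gives a population mean in the thousandths, which is why regional differences in these maps look small on paper but can still matter for relatedness studies.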

despite all these potential difficulties, i look forward to more genetic research into kinship and relatedness within populations — from prof. harpending or whomever! very cool stuff! (^_^)

previously: hgdp samples and relatedness and more on the hgdp samples

(note: comments do not require an email. just skip the email!)

more on the hgdp samples

first, see my previous post on this if you want to follow along.

in that post, i expressed some concerns over the french human genome diversity project (hgdp) samples since the ceph folks describe them as: French (various regions) relatives. i wondered both of the following: 1) how many and which “various regions,” since different regions of france have historically had different rates of inbreeding — haven’t managed to find out which “various regions” — and 2) how many and what sorts of relatives? i did find out that.

via some genetic wizardry, a noah rosenberg tried to work out if any of the individuals in any of the hgdp samples were, in fact, relatives [see here]. to cut a long story short, rosenberg found it likely that two individuals in the french sample were siblings [see pg. 7 here – opens pdf], thus the “relatives” indicator on the ceph website. so, the entire french sample is NOT full of family members like i wondered in my last post — only two of the individuals sampled are likely to have been relatives.

i still think it would be useful to know which regions the samples were drawn from, but i guess i just have to live with not knowing for the time being. (~_^) but now i feel more secure about professor harpending’s conclusion regarding the french: “from the viewpoint of kinship, one person is not very different from another person.”

however, now i feel unsure about the japanese samples! the hgdp samples for the japanese are described on ALFRED as:

“Collected by L.L. Cavalli-Sforza from Japanese-born individuals living in the San Francisco Bay area, and by K.K. Kidd and J. R. Kidd from Japanese-born individuals living in Connecticut.”

ack! well, how representative of japanese people in japan are these people? where did they come from? urban areas? rural areas? different areas? mostly the same areas? how old were they?

i ask all these questions because, historically, urban japanese have had lower inbreeding rates than rural japanese … and the inbreeding rates overall for japan dropped pretty sharply after wwii [see pgs. 4-5 here – opens pdf]. so if the samples include mostly young, urban japanese who recently moved to the u.s., well i wouldn’t be surprised if they look quite outbred. but if the samples include mostly older, rural japanese, i would be surprised if they looked outbred.

so now i don’t have any confidence in the japanese hgdp samples, at least not for looking at kinship within the japanese population. btw, rosenberg didn’t find any likely relatives in the japanese samples.
_____

i went through the ceph table of the hgdp samples and ALFRED and compiled a list of all the hgdp samples, noting 1) whether they likely include any family members (“relatives” – based on rosenberg), and 2) where the samples were collected and from whom, if known. many of the samples don’t have any useful information on their provenance. for example, many of the ALFRED entries say that the samples were drawn from unrelated individuals, but rosenberg found that they, in fact, likely included relatives.

why do i care about any of this? i’ll explain that in another post. right now … coffee! (^_^)

**update: see why i care about the hgdp samples**
_____

the list:

– Central African Republic – Biaka Pygmy (relatives)
This sample is comprised of Biaka, living in the village of Bagandu, in the southwest corner of the Central African Republic (3.42N; 18E altitude approximately 500m). This group is probably an admixture of 3/4 “non-pygmy” African ancestry and 1/4 Mbuti ancestry. The transformed cell lines were established by Judith R. Kidd. The sources of this sample are L. Cavalli-Sforza (Stanford University) and K.K. Kidd, J.R. Kidd (Yale University).

– Democratic Rep of Congo – Mbuti Pygmy (relatives)
The sample is composed of Nilosaharan and Niger Kordofanian speaking Mbuti pygmies from the northeastern part of the Ituri Forest (northeastern Democratic Republic of the Congo). It was collected by L.L. Cavalli-Sforza in 1986.

– Senegal – Mandenka (relatives)
This sample from the Central African Republic is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). This sample consists of unrelated individuals and was collected with proper informed consent.

– Nigeria – Yoruba (relatives)
Most of the Yoruba individuals in this sample are urban health care workers from Benin City, Nigeria, collected by Prof. Friday E. Okonofua and collaborators; cell lines established by Dr. J.R. Kidd.

– Namibia – San (relatives)
This sample from Namibia is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). This sample consists of unrelated individuals and was collected with proper informed consent.

– Kenya – Bantu NE (relatives)
This sample is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). This sample consists of unrelated individuals and was collected with proper informed consent.

– S. Africa – Bantu SE Pedi
– S. Africa – Bantu SE Sotho
– S. Africa – Bantu SE Tswana
– S. Africa – Bantu SE Zulu
– S. Africa – Bantu SW Herero
– S. Africa – Bantu SW Ovambo

These samples are part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). They include the following individuals: #993, 994, 1028, 1030, 1031, 1033, 1034, and 1035. These samples consist of unrelated Bantu speakers from southern Africa and were collected with proper informed consent.

– Algeria – Mozabite (relatives)
This sample from Algeria is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). This sample consists of unrelated individuals and was collected with proper informed consent.

– Israel (Negev) – Bedouin (relatives)
This sample from the Negev region of Israel is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). This sample consists of unrelated individuals and was collected with proper informed consent.

– Israel (Carmel) – Druze (relatives)
This sample is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). The Druze, a Moslem community from Northern Israel. Collected by B. Bonne-Tamir (Tel Aviv University) as part of the repository of samples of Israeli populations. This sample contains both related and unrelated individuals.

– Israel (Central) – Palestinian (relatives)
This sample from the central region of Israel is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). This sample consists of unrelated individuals and was collected with proper informed consent.

– Pakistan – Brahui
– Pakistan – Balochi (relatives)
– Pakistan – Hazara (relatives)
– Pakistan – Sindhi (relatives)
– Pakistan – Pathan
– Pakistan – Kalash (relatives)
– Pakistan – Burusho

These samples from Pakistan are part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). These samples consist of unrelated individuals and were collected with proper informed consent.

– Pakistan – Makrani
*no info found.*

– China – Han
This sample is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). This is a sample of Han Chinese living in the San Francisco, California. Collected by L. Cavalli-Sforza (Stanford University), K.K. Kidd, and J.R. Kidd.

– China – Tujia
– China – Yizu/Yi
– China – Miaozu/Miao
– China – Oroqen (relatives)
– China – Daur
– China – Mongola
– China – Hezhen
– China – Xibo
– China – Uygur
– China – Dai
– China – She
– China – Lahu (relatives)
– China – Naxi (relatives)
– China – Tu

These samples from China are part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). These samples consist of unrelated individuals and were collected with proper informed consent.

– Siberia – Yakut
This sample is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). Yakut-speaking individuals in the Yakut Autonomous Republic. Individuals sampled were living or were born along the river Lena in the area of Yakutsk and northward, roughly 129-130E, 62-64N. This sample was collected by E.L. Grigorenko.

– Japan – Japanese
This sample is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). Collected by L.L. Cavalli-Sforza from Japanese-born individuals living in the San Francisco Bay area, and by K.K. Kidd and J. R. Kidd from Japanese-born individuals living in Connecticut.

– Cambodia – Cambodian (relatives)
This sample is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). Collected by K. Dumars from individuals born in Cambodia who are now living in Santa Ana, California.

– France – French/various regions (relatives)
This sample from various regions of France is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). This sample consists of unrelated individuals and was collected with proper informed consent.

– France – Basque
This sample from France is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). This sample consists of unrelated individuals and was collected with proper informed consent.

– Italy – Sardinian
This sample from Italy is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). This sample consists of unrelated individuals and was collected with proper informed consent.

– Italy – from Bergamo
– Italy – Tuscany

*no info found.*

– Orkney Islands – Orcadian (relatives)
This sample from the Orkney Islands is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). This sample consists of unrelated individuals and was collected with proper informed consent.

– Russia Caucasus – Adygei
This sample is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). Adygei-speaking people near Krasnodar in the Russian republic of Adygei, which is in the southeastern section of the country (north of the Caucuses mountains). They are culturally and linguistically distinct from neighboring Russians. This sample was collected by E. Grigorenko (Yale University) V. Galkina, and M. Kadoshnikova (Bristol company, Russia).

– Russia – Russian
This sample is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). Sample collected by E. Grigorenko from rural communities of ethnic Russians living in the Vologda Administrative Region, about 400 km north of Moscow, roughly 59-61N, 39-41E.

– Mexico – Pima (relatives)
This sample is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). Collected from Pima living near the eastern border of the state of Sonora, Mexico. Collected by L.O. Shulz.

– Mexico – Maya (relatives)
This sample is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). This sample consists of Mayans who are Yucatec speakers from in the Xmaben village located in the Mexican state of Campeche in the central Yucatan peninsula. Blood and serum markers indicate European admixture to be about 10 % (K. Weiss, personal communication). Some evidence suggests that the area from which this sample was drawn served as a refuge for Maya people from across southern Mexico who fled to this more remote region during a series of revolts against the Spanish in the 19th and early 20th centuries. There are 53 transformed cell lines (106 chromosomes) established by Judith R. Kidd. The sources of this sample are K.K. Kidd and J.R. Kidd (Yale University).

– Colombia – Piapoco and Curripaco (relatives)
*no info found.*

– Brazil – Karitiana (relatives)
This sample is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). The sample was collected in the Karitiana village (Rondonia Province, Brazil) by F. Black. HLA haplotypes indicate that the Karitiana have no non-Amerindian admixture and are genetically distinct from other sampled populations in relative geographical proximity, such as the Surui.

– Brazil – Surui (relatives)
*no info found.*

previously: hgdp samples and relatedness
