anglo-saxon burials and genetics in england

here’s a map (on the left) of anglo-saxon burial sites of the 5th to 7th centuries from “Anglo-Saxon immigration and ethnogenesis” compared to the distribution of the eastern, central, and southern english genetic cluster (red squares on map to right) from leslie et al. who found between 10-40% of the ancestry of those english to be anglo-saxon:

harcke - anglo-saxon burial sites 5th to 7th-8th centuries

that is all! (^_^)

previously: free cornwall now!

(note: comments do not require an email. anglo-saxon burial: lady and her cow.)

free cornwall now!

the long-awaited genetic ancestry mapping of the u.k. by the wellcome trust has finally been completed (hurrah!) — it’s very, very cool! — and it confirms what everyone has always known: the cornish are different! (~_^)

from nature news: UK mapped out by genetic ancestry: “A map of the United Kingdom shows how individuals cluster based on their genetics, with a striking relationship to the geography of the country”:

u.k. genetic ancestry mapping

as you can see, all the calls for cornish independence have been justified! the good folks of cornwall are their own little genetic subpopulation, even distinct from their neighbors in devon (as they’ve known all along). so there! =P

to sum up the major findings:

– the welsh appear to be genetically quite different from the rest of the subpopulations in britain, and so the authors reckon they are the most like the earliest hunter-gatherers who migrated to britain at the end of the last ice age.

– the analyses suggest that there was a substantial migration across the channel after the original post-ice-age settlers but before roman times. white british people today have thirty percent (30%) of their dna ancestry from germanic populations, and people in southern and central england share 40% of their dna with the french (again, this relatedness is pre-norman). there’s also substantial relatedness to danes and belgians due to these early migrations. these migrations had little impact in wales.

– there wasn’t a single “celtic” genetic group in britain before the later invasions of the anglo-saxons, etc. the scots, northern irish, welsh, and cornish are some of the most different from each other genetically. the cornish (free cornwall!) are more similar genetically to other english groups than they are to the welsh, for instance.

– the english in eastern, central, and southern england (all those red squares) are pretty much one, relatively homogeneous, genetic group having significant genetic contributions — between 10-40% of their total ancestry — from the anglo-saxons. this strongly indicates that the invading anglo-saxons intermarried with the existing populations and did not replace them 100%.

– fantastically, the danish vikings (of the danelaw of the ninth century) do NOT appear to have left much dna behind at all. their numbers must’ve been small and/or most of them left (or were killed) at some point.

– the cornish (free cornwall!) and devonians are distinct genetic subgroups, and the division between the two groups lies pretty much at the boundaries between the two counties.

– the subpopulation of west yorkshire look like they’re the descendants of the people of elmet (the last of the brittonic kingdoms to hold out against the anglo-saxons)!

– the cumbrians and the northumbrians are distinct from each other, the people of west yorkshire, and the rest of the english.

– yes, the english-speaking population of pembrokeshire is genetically distinct from the rest of the welsh.

– the orkney islanders are the most genetically distinct of all the subgroups having 25% norwegian dna. again, though, the viking invaders mated with the locals and didn’t replace them 100%.

dál riata is apparent on the map there, as are the lowland scots’ and border reivers’ contributions to the ulster scots population.

from the telegraph:

“Geneticist Professor Sir Walter Bodmer of Oxford University said: ‘What it shows is the extraordinary stability of the British population. Britain hasn’t changed much since 600AD.

“‘When we plotted the genetics on a map we got this fantastic parallel between areas and genetic similarity.

“‘It was an extraordinary result, one which was much more than I expected. We see areas like Devon and Cornwall where the difference lies directly on the boundary.’

“Professor Mark Robinson, of Oxford University’s department of archaeology added: ‘The genetic make-up we see is really one of perhaps 1400 years ago.'”
_____

for the purposes of this blog, one of the most interesting things is that lack of a danish viking genetic legacy in england. one of the things we’ve been puzzling about here is where on earth the puritans came from, and one of the ideas that has been bandied about has been that perhaps they were the descendants of the danes, since the danish vikings controlled east anglia and that’s where the puritans were from. that idea doesn’t seem to hold water anymore.

(there’s something else in the paper that may or may not, kinda-sorta be of interest regarding the general topic of this blog, but i’m going to address that in a separate post.)

speaking of the puritans and albion’s seed (and american nations), jayman’s already tweeted this!:

(^_^) so there you go.
_____

i think that’s everything for now. there’s a LOT to take in from this research. i look forward to what razib and greg cochran will have to say on the paper.

for now, for more info, have a look at these!:

UK mapped out by genetic ancestry: “Finest-scale DNA survey of any country reveals historical migrations.”
– the original research article (behind a stupid paywall): The fine-scale genetic structure of the British population. the supplementary information file [pdf] looks like it’s a good read.
Britons still live in Anglo-Saxon tribal kingdoms, Oxford University finds: “A new genetic map of Britain shows that there has been little movement between areas of Britain which were former tribal kingdoms in Anglo-Saxon England.”
Genetic study reveals 30% of white British DNA has German ancestry: “Analysis over 20 years reveals heavy Anglo-Saxon influence, with French and Danish DNA coming from earlier migrations than the Normans or Vikings.”
Study Reveals Genetic Path of Modern Britons: “Researchers found 17 clusters, based on genetic relatedness, in the modern British population.”
Scientists discover genetic “border” between Devon and Cornwall
– from dienekes: British origins (Leslie et al. 2015)


different mutation rates in different human populations

well this seems important! via race/history/evolution notes, an abstract from the society for molecular biology and evolution 2014 conference (in puerto rico! – teh scientists are always good to themselves whenever they can be (~_^) ):

Evidence for different mutation rates across human populations
Ron Do, David Reich
Department of Genetics, Harvard Medical School, Boston, USA

Although mutation rates (per base pair) have clearly changed across primate evolution, many analyses continue to assume that all present-day human populations have the same mutation rates. Recently, William Amos analyzed 1000 Genomes Project and Complete Genomics sequences and found evidence of significantly higher divergence rates on African than on non-African lineages since separation (W. Amos, PLoS One 4, e63048). The detected pattern was strongest in genomic regions of high polymorphism rate, a pattern that the author hypothesized was due to ‘heterozygote instability’, whereby gene conversion events surrounding heterozygous sites increase the mutation rate. To further test this observation, we measured the relative accumulation of mutations in lineages drawn from two different populations, using 25 deep genome sequences generated according to the same experimental protocol using the Illumina technology. We carried out pairwise comparisons of five sub-Saharan African (Dinka, Mandenka, Mbuti, San, Yoruba) and eight Non-African populations (Australian, Dai, French, Han, Karitiana, Mixe, Papuan, Sardinian) on all divergent sites. We observed statistically significant differences in the relative accumulation of mutations for many pairs of African and Non-African populations. Among the strongest differences is significantly more lineage-specific mutations in Mbuti than in Han Chinese (R=1.044, standard error (SE) =0.0015). On average, we observed about 1% more mutations on African lineages compared to Non-African lineages. We also observed some significant differences across non-African populations, with the Han Chinese who have experienced extreme expansions in population size associated with agriculture having more mutations than the Karitiana, a hunter-gatherer population from Amazonia who did not experience such expansions (R=1.015, SE=0.0014). 
The results are consistent across both European and African segments of the human reference sequence, so are not an artifact of reference sequence bias. Taken together, these results support the view that per-base pair mutation rates may be dynamically and substantially changing across humans.
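to get a feel for how big those differences are relative to their standard errors, here’s a minimal sketch. it assumes that R is a ratio whose “no difference” value is 1 and that a normal approximation is reasonable. neither assumption is spelled out in the abstract, and the function names are mine, not the authors’.

```python
import math


def z_score(ratio, se, null=1.0):
    """Number of standard errors between the ratio and the no-difference value."""
    return (ratio - null) / se


def two_sided_p(z):
    """Two-sided p-value under a normal approximation (an assumption here)."""
    return math.erfc(abs(z) / math.sqrt(2))


# R and SE values as quoted in the abstract above
comparisons = {
    "mbuti vs. han": (1.044, 0.0015),
    "han vs. karitiana": (1.015, 0.0014),
}

for name, (r, se) in comparisons.items():
    z = z_score(r, se)
    print(f"{name}: R={r}, z={z:.1f}, p={two_sided_p(z):.3g}")
```

on those assumptions, the mbuti/han ratio sits roughly 29 standard errors away from 1 and the han/karitiana ratio roughly 11, so small ratios, but measured very precisely.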

cool!

wrt the greater number of mutations in african lineages: polygamy (and, therefore, older fathers)? life in the tropics?


the hgdp samples again

i’ve written before (here, here and here) about the hgdp samples and the fact that there is very little to no provenance info connected to them. the problem with this, afaics, is that it’s difficult to know whether or not the hgdp samples are truly representative, in all ways, of the populations from which they came.

i was particularly concerned initially about the french (and the japanese) hgdp samples — and then i got over that — but now i’m concerned about them again. here’s why:

the hgdp samples from france are described thusly:

“France – French/various regions (relatives) – This sample from various regions of France is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). This sample consists of unrelated individuals and was collected with proper informed consent.”

great!

hang on — which regions?

auvergne? where, in some villages in the eighteenth century, groups of families regularly inbred with one another? lorraine? which, in some areas, had consanguinity rates of up to 50% between 1810 and 1910? burgundy or brittany, both of which had reportedly higher cousin marriage rates in the nineteenth and twentieth centuries than other regions of france? or were the hgdp samples collected in places like central france which, historically, had much lower rates (in the range of 1-3.5%) of close marriages?

the thing is: we don’t know.

what we do know is that the hgdp sampling seems kinda biased towards unique little groups like basques and orcadians, sardinians and the adygei. which is understandable ’cause these are all interesting, unusual groups and there’s legitimate concern that their unique genomes might sorta disappear in our modern, outbreeding world, and it would be a shame to miss out on the chance to at least keep a record of all that human biodiversity.

but then i have to wonder how representative of the majority of french people are the french hgdp samples? do they truly represent “the french,” or did the samples come from some of those crazy little villages way up in the mountains? i dunno. and neither does anybody else (afaik).

and the reason i wonder is: if teh scientists are gonna do really awesome genetic studies to check for the relatedness between the members of different human populations — like runs of homozygosity (roh) studies or identity by descent (ibd) studies — i think they need to know if the samples they’re looking at are representative or not. do the results for “the french” in studies like this or this or this truly represent the average french, or do they represent some special sub-groups of mountain dwelling french?

in the most recent roh study i posted about, the “french” don’t appear to be much more in- or out-bred than orcadians or the basques, something which strikes me as odd. perhaps — perhaps — that’s because the french hgdp samples are not truly representative of the broader french population. perhaps. i don’t know. nor do the researchers.

rinse and repeat above discussion for the other samples, too.

previously: hgdp samples and relatedness and more on the hgdp samples and why i care about the hgdp samples and meanwhile, in france… and runs of homozygosity and inbreeding (and outbreeding) and ibd and historic mating patterns in europe and runs of homozygosity again


runs of homozygosity again

**update below**

here’s an exciting new paper!: Genomic Patterns of Homozygosity in Worldwide Human Populations. i don’t have access to the paper itself, but there are lots o’ neat figures and tables in the supplemental data [opens pdf] that relate to runs of homozygosity (roh). roh are identical stretches of dna within an individual’s genome (i.e. identical on both copies of a chromosome, the paternally and the maternally inherited one). (roh shouldn’t be confused with blocks of identity by descent [ibd], which i did once! ibd blocks are identical stretches of dna as compared between different individuals, iiuc.)

recall that possessing lots of long roh indicates that one’s parents are/were quite similar genetically speaking. that can be as a result of a couple of different genetic scenarios like (as greying wanderer has brought up a lot recently) simply being from a small sized population (i.e. having a small effective population size) and/or from regular inbreeding (consanguineous/endogamous mating). so, a population having a lot of long roh is either small and/or inbreeds a lot. populations having LOTS of short roh have probably been through some sort of bottleneck (see previous post).
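to make the idea of a “run” concrete, here’s a toy sketch of how one might pick roh out of a single individual’s genotypes. it is NOT the method used in the paper: real roh callers use sliding windows and tolerate the odd heterozygous or missing call. the genotype coding (0/1/2 copies of the alternate allele), the thresholds, and the example data are all made up for illustration.

```python
def find_roh(positions, genotypes, min_snps=3, min_length_bp=1000):
    """Return (start_bp, end_bp, n_snps) for each unbroken homozygous run.

    positions: sorted SNP coordinates in base pairs (hypothetical input)
    genotypes: 0, 1 or 2 copies of the alternate allele; 1 = heterozygous
    """
    runs = []
    start = None  # index of the first SNP in the run currently being extended
    for i, g in enumerate(genotypes):
        if g != 1:  # homozygous call: open a run if none is open
            if start is None:
                start = i
        elif start is not None:  # heterozygous call closes the open run
            runs.append((start, i - 1))
            start = None
    if start is not None:  # close a run that reaches the last SNP
        runs.append((start, len(genotypes) - 1))

    kept = []
    for a, b in runs:
        n_snps = b - a + 1
        length_bp = positions[b] - positions[a]
        if n_snps >= min_snps and length_bp >= min_length_bp:
            kept.append((positions[a], positions[b], n_snps))
    return kept


# made-up toy data: ten SNPs, two of them heterozygous
positions = [100, 900, 1500, 2600, 3100, 4000, 5200, 6100, 7000, 8300]
genotypes = [0, 2, 1, 0, 0, 2, 2, 0, 1, 2]
print(find_roh(positions, genotypes))  # → [(2600, 6100, 5)]
```

the two heterozygous sites break the genotypes into three homozygous stretches, and only the middle one is long enough (five snps, 3,500 bp) to pass the toy thresholds.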

in the paper i looked at in that previous post, the researchers had looked at the different roh lengths for large, regional populations like “europeans” or “east asians.” amongst other things, they had found that some of my regular inbreeders — the fbd marriage folks — had some of the highest numbers of medium and long roh, a state of genetic affairs which likely reflects their long-term close mating patterns. interestingly, the researchers had found that east asians had roh lengths similar to those of europeans across the board, something which surprised me since, at least according to what i’ve been reading, east asians (i.e. the chinese) have been inbreeding for a much longer time than europeans. one drawback of that previous study, though, was that, apart from the french, most of the european populations they looked at were peripheral groups who have had a tendency to inbreed more than my “core” europeans (see mating patterns in europe series below ↓ in left-hand column).

the new paper suffers from some of the same problems since the data come from the same sources (hgdp-ceph and hapmap phase 3 populations), so northern europeans — apart from the french — aren’t included in this paper either. (what can you do? it’s early days yet. i look forward to when there’s lots more genetic data available out there for teh scientists to work with! (^_^) )

what the researchers in this paper have done, though, is to look at both the different mean lengths of roh in each of the different populations sampled AND they looked at total numbers of roh within individuals for each population. this has, i think, drawn out some interesting differences between the populations.

first, here are two graphics from the supplemental data (linked to above).

i’ve highlighted a handful of populations i want to focus on ’cause i know a little something about their historic mating patterns: the bedouin (as a proxy for the arabs — note that the bedouin have probably inbred more than more settled arabs); italians (not sure if they’re northern or southern italians or a mix of both — however, there are tuscans in the samples with which these “italians” can be compared); pathans or pashtuns (more fbd marriage folks, like the bedouins/arabs); and han chinese (there are some northern han chinese with whom this group can be compared). ok. here are the charts:

as you can see, the researchers have split up the roh into three classes (note that the short and medium classes here are a lot shorter than those in the paper looked at previously):

– A: 0.25-0.40 Mb (short)
– B: 0.6-1.2 Mb (medium)
– C: 0-35 Mb (long)

the interesting thing in the first chart above (Fig. S3 – Mean ROH Length for Each of the Three Size Classes in Each Population), is that the han chinese have lower means of roh length in all of the size classes compared to the other populations i’ve highlighted. in the previous study, the researchers found that east asians had similar means to europeans for all roh lengths. i found this surprising since, from what i’ve read, the han chinese have been inbreeding for a longer period of time than europeans. what might be confounding the results though, once again, is the fact that nw europeans (the outbreeders extraordinaire) are not really included in either of these studies apart from a handful of french samples.

in this latest study, both the bedouin and the pashtun, for instance, have higher means — and wider spreads — of long (class C) roh than the italians, which is what i would’ve expected since those two groups (the bedouins and the pashtuns) are, being fbd marriage folks, serious inbreeders. perhaps the reason the han chinese long roh mean is comparatively low is partly due to the fact that they historically practiced mother’s brother’s daughter (mbd) marriage which doesn’t push towards such close inbreeding as fbd marriage. still, i would’ve expected to see greater means of roh for the chinese than the italians — or, at least, around the same. not so much lower. (unless the italians practiced fbd marriage, too — or fzd marriage — but i don’t think so.)

if you look at the second chart (Fig. S4 – Total Number of ROH in Individual Genomes), however, you’ll see that, overall, the han chinese have more short, medium and long roh totally in individual genomes than any of the other three populations i’ve highlighted. both the bedouins and the pashtuns have greater numbers/wider total spread of long roh than the italians, but the han chinese have a much greater total number of long roh than any of the other three groups — three or four times as many.

but they’re, on average, shorter long roh, don’t forget. (confusing, eh?!)

perhaps this is what you get when you have — as the chinese have had — a pretty good-sized effective population size for such a long time. there have been a LOT of han chinese for — wow — millennia.

so, it looks like this (in this order of inbrededness — i think):

– bedouins: highest mean, and very wide spread, of long roh; high total numbers, and widest spread, of long roh.
– pashtun: low mean, but widest spread, of long roh; low total number, but very wide spread, of long roh.
– han chinese: very low mean, and very narrow spread, of long roh; highest total numbers, and wide spread, of long roh.
– italians: low mean, and rather wide spread, of long roh; very low total number, and very small spread, of long roh.

other interesting points are that:

– the tuscans/tsi (toscani) appear to have lower short, medium and long mean roh than the generic “italian” category. however, the tuscans have lower total numbers of long roh than the “italians” while the toscani (tsi), on the other hand, appear to have a greater total number of long roh than the “italians.” while the tuscan samples and the toscani/tsi samples are from different studies (hgdp vs. hapmap), they are all supposed to be from tuscany, so it’s surprising that they’re so different. perhaps the individuals in the toscani/tsi sample were more closely related somehow?

– the northern han samples have lower short, medium and long mean roh than the generic “han” category. this would fit my general impression that historically inbreeding has been greater in southern china than in the north. however, the total number of long roh are greater in the northern han sample than in the “han” sample. not sure what that means.

don’t forget that there can be all sorts of reasons for differences in roh: inbreeding vs. outbreeding, yes, but also effective population size, population movement (migration in or out), bottlenecks, etc. i just happen to be interested in trying to pick out the effects of inbreeding/outbreeding — if possible.
_____

**update – here are a couple of excerpts from the article (thnx, b.b.!) [pgs. 277, 279-281]:

“Size Classification of ROH

“Separately in each population, we modeled the distribution of ROH lengths as a mixture of three Gaussian distributions that we interpreted as representing three ROH classes: (A) short ROH measuring tens of kb that probably reflect homozygosity for ancient haplotypes that contribute to local LD [linkage disequilibrium] patterns, (B) intermediate ROH measuring hundreds of kb to several Mb that probably result from background relatedness owing to limited population size, and (C) long ROH measuring multiple Mb that probably result from recent parental relatedness….

“In each population, the size distribution of ROH appears to contain multiple components (Figure 2A). Using a three-component Gaussian mixture model, we classified ROH in each population into three size classes (Figure 2B): short (class A), intermediate (class B), and long (class C). Size boundaries between different classes vary across populations (Table S1); however, considering all populations, all A-B boundaries are strictly smaller than all B-C boundaries (Figure 2C). The mean sizes of class A and B ROH are similar among populations from the same geographic region (Figure S3), with the exception that Africa and East Asia have greater variability. The class C mean is generally largest in the Middle East, Central/South Asia, and the Americas and smallest in East Asia (Figure S3), with the exception that the Tujia population has the largest values. In the admixed Mexican population (MXL), mean ROH sizes are similar to those in European populations. In the admixed African American population (ASW), however, mean ROH sizes are among the smallest in our data set, notably smaller than in most Africans and Europeans.

“Geographic Pattern of ROH

“Several patterns emerge from a comparison of the per-individual total lengths of ROH across populations (Figure 3). First, the total lengths of class A (Figure 3A) and class B (Figure 3B) ROH generally increase with distance from Africa, rising in a stepwise fashion in successive continental groups. This trend is similar to the observed reduction in haplotype diversity with increasing distance from Africa. Second, total lengths of class C ROH (Figure 3C) do not show the stepwise increase. Instead, they are higher and more variable in most populations from the Middle East, Central/South Asia, Oceania, and the Americas than in most populations from Africa, Europe, and East Asia. This pattern suggests that a larger fraction of individuals from the Middle East, Central/South Asia, Oceania, and the Americas tend to have higher levels of parental relatedness, in accordance with demographic estimates of high levels of consanguineous marriage particularly in populations from the Middle East and Central/South Asia, and it is similar to that observed for inbreeding-coefficient and identity-by-descent estimates. Third, in the admixed ASW and MXL individuals, total lengths of ROH in each size class are similar to those observed in populations from Africa and Europe, respectively (Figure 3).

“The total numbers of ROH per individual (Figure S4) show similar patterns to those observed for total lengths (Figure 3). However, in East Asian populations, total numbers of class B and class C ROH per individual are notably more variable across populations than are ROH total lengths.”
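the “three-component gaussian mixture” classification described in the excerpt can be sketched roughly like this. big caveats: this is a hand-rolled, bare-bones em fit, not the authors’ software; the roh lengths are simulated; the class means and spreads are invented; and fitting on log10(length in bp) is my assumption, since the excerpt doesn’t say what scale was used.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def fit_gmm_1d(xs, k=3, n_iter=100):
    """Fit a k-component 1-D Gaussian mixture by EM.

    Returns a list of (mean, sigma, weight) tuples sorted by mean.
    """
    lo, hi = min(xs), max(xs)
    n = len(xs)
    # start the means evenly spaced across the observed range
    mus = [lo + (j + 0.5) * (hi - lo) / k for j in range(k)]
    sigmas = [(hi - lo) / (2 * k) or 1.0] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean and spread of each component
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sigmas[j] = max(math.sqrt(var), 1e-3)  # floor avoids a collapsed component
    return sorted(zip(mus, sigmas, weights))


def classify(x, components):
    """Assign x to class A, B or C, whichever component is most probable."""
    scores = [w * normal_pdf(x, m, s) for m, s, w in components]
    return "ABC"[scores.index(max(scores))]


# simulated log10(ROH length in bp); the three clusters are invented for illustration
rng = random.Random(0)
data = (
    [rng.gauss(4.7, 0.20) for _ in range(360)]    # "short": tens of kb
    + [rng.gauss(5.8, 0.20) for _ in range(180)]  # "intermediate": hundreds of kb
    + [rng.gauss(6.8, 0.25) for _ in range(60)]   # "long": multiple Mb
)
components = fit_gmm_1d(data)
for label, (mu, sigma, weight) in zip("ABC", components):
    print(f"class {label}: mean log10(bp)={mu:.2f}, sd={sigma:.2f}, weight={weight:.2f}")
print("a 1 Mb run is class", classify(math.log10(1_000_000), components))
```

the point of doing it per population, as the excerpt says, is that the a-b and b-c boundaries fall out of each population’s own length distribution rather than being fixed cut-offs.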

previously: runs of homozygosity and inbreeding (and outbreeding) and ibd and historic mating patterns in europe


eugenics in the news

from the u.k.’s telegraph (links added by me):

“Euroscience Open Forum 2012: DNA gene testing ‘will screen out lovers'”
13 Jul 2012

“Couples will soon be able to choose their life partner solely based on the compatibility of their genes instead of through love, a scientific conference has heard.

“Due to the falling cost of DNA testing Britain is on the cusp of a new era of eugenics, according to a leading British scientist.

“Prof Armand Leroi, of Imperial College London, said that within five to ten years it will be common for young people to pay to access their entire genetic code.

“He told the Euroscience Open Forum 2012, in Dublin, that a desire to have a healthy baby will lead more to request access to view the genes of any prospective partner.

“Armed with this information, the couple could then use IVF to screen babies with incurable diseases.

“While it was unlikely people will have the ‘luxury’ of using the technology to design babies, by their intellect or eye colour, they would instead focus on stopping genetic diseases.

“Addressing a session titled ‘I human: are new scientific discoveries challenging our identity as a species’, he said the cost of genetic sequencing was falling so quickly that ‘it is going to become very, very accessible, very, very soon’….

“He said eugenics were already available, with tens of thousands of unborn babies with Down’s syndrome and other illnesses being aborted every year.

“He told the conference on Thursday: ‘These processes are very well established in most European countries.

“‘Many of the ethical problems that people raise when they speak of neoeugenics are nought once you offer gene selection or mate selection as a eugenic tool.'”
_____

meanwhile, in tonga:

“Tonga’s Crown Prince Tupouto’a Ulukalala marries cousin”
12 July 2012

“The heir to the throne of Tonga in the South Pacific has married his second cousin in the capital Nuku’alofa.

“Crown Prince Tupouto’a Ulukalala and his bride, Sinaitakala Fakafanua, both in their 20s, waved to cheering crowds as they left church after the wedding…..

“Marriage between cousins is seen as a way of keeping the royal bloodline strong in Tonga….”

felicitations to the happy couple! (^_^) (seriously!)

previously: ivy league selective breeding


why i care about the hgdp samples

if the hgdp samples are going to be used to look at the degrees of kinship within populations — which would be awesome! and which prof. harpending started to do recently — then care has to be taken to identify which sets of samples include lots of relatives.

if you’re gonna analyze a bunch of individuals’ genomes to ascertain the degree of kinship between them in order to determine the degree of kinship within their broader population, then you want to make sure you’ve got a random, representative sample from the population and not a bunch of relatives since, of course, a bunch of relatives will naturally have a high degree of kinship.

and if you found a high degree of kinship in a set of samples that included a bunch of relatives and didn’t know you were looking at a set of relatives, you might conclude that there must be a high degree of kinship across the broader population, too, but this might not be the case at all.

for example, take the hgdp samples from the pashtun and the kalash in pakistan. twenty-five genomes are available from each group, but according to rosenberg (see previous post), none of the individuals in the pashtun group were relatives whereas in the kalash group there likely are: one parent-offspring pair, one half-sibling pair (or an equivalent), and four pairs of cousins.

let’s say, then, that it was found that these two sets of samples — the pashtun and the kalash — had exactly the same degree of internal kinship between their members, genetically speaking. that would mean that the members of the broader pashtun population had the same kinship to one another as the several sets of relatives in the kalash population, since the pashtun genomes were random samples whereas the kalash ones were not. it might look like the two groups had the same, population-wide degree of kinship, but in reality that would not be what we had found.
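here’s a back-of-the-envelope sketch of how much a handful of relatives can push up the average pairwise kinship in a sample of twenty-five. the kinship coefficients are the standard pedigree values; treating rosenberg’s “cousins” as first cousins is my assumption, and the function name is just for illustration.

```python
from math import comb

# standard pedigree kinship coefficients
KINSHIP = {
    "parent-offspring": 0.25,
    "half-sibling": 0.125,
    "first cousin": 0.0625,  # assumption: the "cousins" are first cousins
}


def relatives_inflation(n, related_pairs):
    """Extra mean pairwise kinship that known relatives add to a sample of n people.

    related_pairs maps a relationship name to the number of such pairs.
    """
    total_pairs = comb(n, 2)
    extra = sum(KINSHIP[rel] * count for rel, count in related_pairs.items())
    return extra / total_pairs


# relative pairs rosenberg inferred for the kalash sample
kalash_pairs = {"parent-offspring": 1, "half-sibling": 1, "first cousin": 4}

print("pairs in a sample of 25:", comb(25, 2))  # → 300
print(f"kalash inflation:  {relatives_inflation(25, kalash_pairs):.4f}")  # → 0.0021
print(f"pashtun inflation: {relatives_inflation(25, {}):.4f}")  # → 0.0000
```

so just six related pairs out of 300 add about 0.002 to the kalash sample’s mean kinship before you’ve learned anything about the population at large. whether that’s big or small depends on the background level you’re trying to measure.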

of the 52 population samples in the hgdp (the south african bantus are counted as one group … even though they come from different ethnic groups … hmmmm …), rosenberg found that exactly half (26) include relatives.

the other problem i have with the hgdp samples involves the ones that were collected from immigrant groups here in the u.s.:

– han chinese: “This is a sample of Han Chinese living in the San Francisco, California.”
– japanese: “Collected by L.L. Cavalli-Sforza from Japanese-born individuals living in the San Francisco Bay area, and by K.K. Kidd and J. R. Kidd from Japanese-born individuals living in Connecticut.”
– cambodians: “Collected by K. Dumars from individuals born in Cambodia who are now living in Santa Ana, California.”

are these immigrants really representative of their native populations? are they first- or second- or fourth-generation americans? some immigrant groups start to outbreed in a new land, but others do just the opposite. what’s the case with these groups? how old were the individuals sampled (since in many populations inbreeding rates have gone down in the last 50 years or so)? do they all come from the same region in their native country (like guangdong province), or from all over? to give you an idea of some of the possible problems involved with these sets of samples, have a look at what i said about the japanese samples in the previous post.

and i still have a bug about which regions of france the french samples came from (see previous post). (~_^) it probably doesn’t matter that much, but it would’ve been nice to know how widespread the sampling was, i.e. how random and representative of the entire population of france are these samples? historically, different regions of france have had different inbreeding rates as can be seen in this map of the inbreeding coefficients for france, 1926-1945 [pg. 620] (my guess is this pattern goes back a long way, too — probably to the early medieval period and the introduction of manorialism in continental europe):

so, which regions the samples were drawn from in france might actually make a difference — especially depending upon how deeply one drills down into the question of relatedness. i wonder the same thing about many of the other samples, too, for instance the ones from russia, but we know that all those samples came from the vologda administrative region so we are at least aware that they may not be representative of all ethnic russians.

despite all these potential difficulties, i look forward to more genetic research into kinship and relatedness within populations — from prof. harpending or whomever! very cool stuff! (^_^)

previously: hgdp samples and relatedness and more on the hgdp samples


more on the hgdp samples

first, see my previous post on this if you want to follow along.

in that post, i expressed some concerns over the french human genome diversity project (hgdp) samples since the ceph folks describe them as: French (various regions) relatives. i wondered both of the following: 1) how many and which “various regions,” since different regions of france have historically had different rates of inbreeding — haven’t managed to find out which “various regions” — and 2) how many and what sorts of relatives? i did find out that.

via some genetic wizardry, a noah rosenberg tried to work out if any of the individuals in any of the hgdp samples were, in fact, relatives [see here]. to cut a long story short, rosenberg found it likely that two individuals in the french sample were siblings [see pg. 7 here – opens pdf], thus the “relatives” indicator on the ceph website. so, the entire french sample is NOT full of family members like i wondered in my last post: only two of the individuals sampled are likely to have been relatives.

i still think it would be useful to know from which regions the samples were drawn, but i guess i just have to live with not knowing for the meantime. (~_^) but now i feel more secure about professor harpending’s conclusion — that regarding the french: “from the viewpoint of kinship, one person is not very different from another person.”

however, now i feel unsure about the japanese samples! the hgdp samples for the japanese are described on ALFRED as:

“Collected by L.L. Cavalli-Sforza from Japanese-born individuals living in the San Francisco Bay area, and by K.K. Kidd and J. R. Kidd from Japanese-born individuals living in Connecticut.”

ack! well, how representative of japanese people in japan are these people? where did they come from? urban areas? rural areas? different areas? mostly the same areas? how old were they?

i ask all these questions because, historically, urban japanese have had lower inbreeding rates than rural japanese … and the inbreeding rates overall for japan dropped pretty sharply after wwii [see pgs. 4-5 here – opens pdf]. so if the samples include mostly young, urban japanese who recently moved to the u.s., well i wouldn’t be surprised if they look quite outbred. but if the samples include mostly older, rural japanese, i would be surprised if they looked outbred.

now i don’t have any confidence in the japanese hgdp samples — not for looking at kinship within the japanese population anyway. btw, rosenberg didn’t find any likely relatives in the japanese samples.
_____

i went through the ceph table of the hgdp samples and ALFRED and compiled a list of all the hgdp samples, noting 1) whether they likely include any family members (“relatives” – based on rosenberg), and 2) where the samples were collected and from whom, if known. many of the samples don’t have any useful information on their provenance. for example, many of the ALFRED entries say that the samples were drawn from unrelated individuals, but rosenberg found that they, in fact, likely included relatives.
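for anyone who wants to redo this kind of cross-check, here's a minimal sketch of the idea in python. it is only an illustration, not how the list was actually put together: the `Sample` class and the `conflicting` function are made-up names, and the five entries are just a handful copied from the list in this post (the "(relatives)" tag and whether the ALFRED text says "unrelated individuals").

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One hgdp sample as recorded in the list in this post (hypothetical structure)."""
    population: str
    rosenberg_relatives: bool    # tagged "(relatives)" in the list
    alfred_says_unrelated: bool  # ALFRED text says "unrelated individuals"


# a few entries transcribed from the list in this post, not the full set
SAMPLES = [
    Sample("Mandenka", rosenberg_relatives=True, alfred_says_unrelated=True),
    Sample("San", rosenberg_relatives=True, alfred_says_unrelated=True),
    Sample("Druze", rosenberg_relatives=True, alfred_says_unrelated=False),
    Sample("Basque", rosenberg_relatives=False, alfred_says_unrelated=True),
    Sample("Sardinian", rosenberg_relatives=False, alfred_says_unrelated=True),
]


def conflicting(samples):
    """Return populations described as unrelated but flagged as likely containing relatives."""
    return [
        s.population
        for s in samples
        if s.rosenberg_relatives and s.alfred_says_unrelated
    ]


if __name__ == "__main__":
    for name in conflicting(SAMPLES):
        print(f"{name}: described as unrelated, but likely includes relatives")
```

the point of keeping the two flags separate is that the disagreements between the sources become easy to spot, rather than one source silently overwriting the other.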

why do i care about any of this? i’ll explain that in another post. right now … coffee! (^_^)

**update: see why i care about the hgdp samples**
_____

the list:

– Central African Republic – Biaka Pygmy (relatives)
This sample is comprised of Biaka, living in the village of Bagandu, in the southwest corner of the Central African Republic (3.42N; 18E altitude approximately 500m). This group is probably an admixture of 3/4 “non-pygmy” African ancestry and 1/4 Mbuti ancestry. The transformed cell lines were established by Judith R. Kidd. The sources of this sample are L. Cavalli-Sforza (Stanford University) and K.K. Kidd, J.R. Kidd (Yale University).

– Democratic Rep of Congo – Mbuti Pygmy (relatives)
The sample is composed of Nilosaharan and Niger Kordofanian speaking Mbuti pygmies from the northeastern part of the Ituri Forest (northeastern Democratic Republic of the Congo). It was collected by L.L. Cavalli-Sforza in 1986.

– Senegal – Mandenka (relatives)
This sample from Senegal is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). This sample consists of unrelated individuals and was collected with proper informed consent.

– Nigeria – Yoruba (relatives)
Most of the Yoruba individuals in this sample are urban health care workers from Benin City, Nigeria, collected by Prof. Friday E. Okonofua and collaborators; cell lines established by Dr. J.R. Kidd.

– Namibia – San (relatives)
This sample from Namibia is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). This sample consists of unrelated individuals and was collected with proper informed consent.

– Kenya – Bantu NE (relatives)
This sample is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). This sample consists of unrelated individuals and was collected with proper informed consent.

– S. Africa – Bantu SE Pedi
– S. Africa – Bantu SE Sotho
– S. Africa – Bantu SE Tswana
– S. Africa – Bantu SE Zulu
– S. Africa – Bantu SW Herero
– S. Africa – Bantu SW Ovambo

These samples are part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). They include the following individuals: #993, 994, 1028, 1030, 1031, 1033, 1034, and 1035. These samples consist of unrelated Bantu speakers from southern Africa and were collected with proper informed consent.

– Algeria – Mozabite (relatives)
This sample from Algeria is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). This sample consists of unrelated individuals and was collected with proper informed consent.

– Israel (Negev) – Bedouin (relatives)
This sample from the Negev region of Israel is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). This sample consists of unrelated individuals and was collected with proper informed consent.

– Israel (Carmel) – Druze (relatives)
This sample is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). The Druze, a Moslem community from Northern Israel. Collected by B. Bonne-Tamir (Tel Aviv University) as part of the repository of samples of Israeli populations. This sample contains both related and unrelated individuals.

– Israel (Central) – Palestinian (relatives)
This sample from the central region of Israel is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). This sample consists of unrelated individuals and was collected with proper informed consent.

– Pakistan – Brahui
– Pakistan – Balochi (relatives)
– Pakistan – Hazara (relatives)
– Pakistan – Sindhi (relatives)
– Pakistan – Pathan
– Pakistan – Kalash (relatives)
– Pakistan – Burusho

These samples from Pakistan are part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). These samples consist of unrelated individuals and were collected with proper informed consent.

– Pakistan – Makrani
*no info found.*

– China – Han
This sample is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). This is a sample of Han Chinese living in the San Francisco, California. Collected by L. Cavalli-Sforza (Stanford University), K.K. Kidd, and J.R. Kidd.

– China – Tujia
– China – Yizu/Yi
– China – Miaozu/Miao
– China – Oroqen (relatives)
– China – Daur
– China – Mongola
– China – Hezhen
– China – Xibo
– China – Uygur
– China – Dai
– China – She
– China – Lahu (relatives)
– China – Naxi (relatives)
– China – Tu

These samples from China are part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). These samples consist of unrelated individuals and were collected with proper informed consent.

– Siberia – Yakut
This sample is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). Yakut-speaking individuals in the Yakut Autonomous Republic. Individuals sampled were living or were born along the river Lena in the area of Yakutsk and northward, roughly 129-130E, 62-64N. This sample was collected by E.L. Grigorenko.

– Japan – Japanese
This sample is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). Collected by L.L. Cavalli-Sforza from Japanese-born individuals living in the San Francisco Bay area, and by K.K. Kidd and J. R. Kidd from Japanese-born individuals living in Connecticut.

– Cambodia – Cambodian (relatives)
This sample is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). Collected by K. Dumars from individuals born in Cambodia who are now living in Santa Ana, California.

– France – French/various regions (relatives)
This sample from various regions of France is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). This sample consists of unrelated individuals and was collected with proper informed consent.

– France – Basque
This sample from France is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). This sample consists of unrelated individuals and was collected with proper informed consent.

– Italy – Sardinian
This sample from Italy is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). This sample consists of unrelated individuals and was collected with proper informed consent.

– Italy – from Bergamo
– Italy – Tuscany

*no info found.*

– Orkney Islands – Orcadian (relatives)
This sample from the Orkney Islands is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). This sample consists of unrelated individuals and was collected with proper informed consent.

– Russia Caucasus – Adygei
This sample is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). Adygei-speaking people near Krasnodar in the Russian republic of Adygei, which is in the southeastern section of the country (north of the Caucasus mountains). They are culturally and linguistically distinct from neighboring Russians. This sample was collected by E. Grigorenko (Yale University), V. Galkina, and M. Kadoshnikova (Bristol company, Russia).

– Russia – Russian
This sample is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). Sample collected by E. Grigorenko from rural communities of ethnic Russians living in the Vologda Administrative Region, about 400 km north of Moscow, roughly 59-61N, 39-41E.

– Mexico – Pima (relatives)
This sample is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). Collected from Pima living near the eastern border of the state of Sonora, Mexico. Collected by L.O. Shulz.

– Mexico – Maya (relatives)
This sample is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). This sample consists of Mayans who are Yucatec speakers from the Xmaben village located in the Mexican state of Campeche in the central Yucatan peninsula. Blood and serum markers indicate European admixture to be about 10% (K. Weiss, personal communication). Some evidence suggests that the area from which this sample was drawn served as a refuge for Maya people from across southern Mexico who fled to this more remote region during a series of revolts against the Spanish in the 19th and early 20th centuries. There are 53 transformed cell lines (106 chromosomes) established by Judith R. Kidd. The sources of this sample are K.K. Kidd and J.R. Kidd (Yale University).

– Colombia – Piapoco and Curripaco (relatives)
*no info found.*

– Brazil – Karitiana (relatives)
This sample is part of the Human Genome Diversity Cell Line Panel collected by the Human Genome Diversity Project (HGDP) and the Foundation Jean Dausset (CEPH). The sample was collected in the Karitiana village (Rondonia Province, Brazil) by F. Black. HLA haplotypes indicate that the Karitiana have no non-Amerindian admixture and are genetically distinct from other sampled populations in relative geographical proximity, such as the Surui.

– Brazil – Surui (relatives)
*no info found.*

previously: hgdp samples and relatedness

(note: comments do not require an email. remember: you’re better off just skipping it!)