your ethnicity…

…they can work that out from your mtdna. with a fr*ckin’ 80-90% accuracy rate! awesome. (don’t ask me what a ‘support vector machine’ is, but it sounds awesome, too!)

it’s “coarse ethnicity” — for instance caucasian, asian, african, hispanic — but still:

Inferring ethnicity from mitochondrial DNA sequence

“Background: The assignment of DNA samples to coarse population groups can be a useful but difficult task. One such example is the inference of coarse ethnic groupings for forensic applications. Ethnicity plays an important role in forensic investigation and can be inferred with the help of genetic markers. Being maternally inherited, of high copy number, and robust persistence in degraded samples, mitochondrial DNA may be useful for inferring coarse ethnicity. In this study, we compare the performance of methods for inferring ethnicity from the sequence of the hypervariable region of the mitochondrial genome.

“Results: We present the results of comprehensive experiments conducted on datasets extracted from the mtDNA population database, showing that ethnicity inference based on support vector machines (SVM) achieves an overall accuracy of 80-90%, consistently outperforming nearest neighbor and discriminant analysis methods previously proposed in the literature. We also evaluate methods of handling missing data and characterize the most informative segments of the hypervariable region of the mitochondrial genome.

“Conclusions: Support vector machines can be used to infer coarse ethnicity from a small region of mitochondrial DNA sequence with surprisingly high accuracy. In the presence of missing data, utilizing only the regions common to the training sequences and a test sequence proves to be the best strategy. Given these results, SVM algorithms are likely to also be useful in other DNA sequence classification applications.”
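The missing-data finding in the Conclusions can be illustrated with a short sketch of one reading of it: keep only the positions that every training sequence and the test sequence actually cover. Two assumptions here are not stated in the abstract. The sequences are already aligned to the same length, and a missing base is written as "N". The toy sequences are invented.

```python
# Sketch of one reading of the missing-data strategy: keep only alignment
# positions that have a base call in EVERY training sequence AND in the test
# sequence, then compare all sequences on those positions alone.
# ASSUMPTIONS: sequences are pre-aligned to equal length; "N" marks a
# missing base. The toy sequences are invented, not real mtDNA.

MISSING = "N"


def covered(seq):
    """Set of positions where the sequence has an actual base call."""
    return {i for i, base in enumerate(seq) if base != MISSING}


def common_positions(train_seqs, test_seq):
    """Positions shared by the test sequence and all training sequences."""
    shared = covered(test_seq)
    for seq in train_seqs:
        shared &= covered(seq)
    return sorted(shared)


def restrict(seq, positions):
    """Keep only the chosen positions of a sequence."""
    return "".join(seq[i] for i in positions)


train_seqs = ["ACGTNN", "ACGAAN", "NCGTAC"]
test_seq = "ACNTAC"

positions = common_positions(train_seqs, test_seq)
print(positions)                                     # [1, 3]
print([restrict(s, positions) for s in train_seqs])  # ['CT', 'CA', 'CT']
print(restrict(test_seq, positions))                 # CT
```

Under this reading, a classifier would then be trained on the restricted training sequences and applied to the restricted test sequence. The shared positions depend on which regions the test sequence covers, so the restricted training set can change from one test sequence to the next.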

(but don’t forget — race doesn’t exist. it’s juuuust a social construct….)



  1. How fun! I wonder if all these ancestry sites work the same way. I have always wanted to assess my own ethnicity, but I am concerned the Southern Italian side may have a little Berber in there (swarthy, thick lips). Then again, Rosie Huntington looks fully European and she has those same lips. Who knows… so much mixing going on these days…

    Have you ever done yours? What was the outcome? Can they break it down further by ethnicity?


  2. Can they really determine “Hispanic” ethnicity with such high accuracy?

    BTW: SVM is a statistical machine learning algorithm. You train it by feeding it lots of labeled examples – in this case, mtDNA sequences from people whose ethnicity you already know.
    The algorithm uses what it’s “learned” to determine the most likely ethnicity of any new sequence you show it.

    (Statistical-based algorithms are increasingly popular these days for all sorts of applications. In the past, computer scientists tried to create algorithms that were actually “intelligent”, but that approach was a failure. You get better results by teaching the computer to mimic a human being.)


  3. Can they really determine “Hispanic” ethnicity with such high accuracy?

    Why not? The Spanish didn’t send many women to the New World, and mitochondrial DNA is passed down through the female line. Even Argentina, which has a significantly higher level of European ancestry than other Spanish-speaking nations in the Western Hemisphere, has a fair amount of indigenous mtDNA.


  4. I suppose that’s correct for Mexicans, but in any case, they really should have specified that they identify Native American ethnicity, not Hispanic.

    I guess saying “Hispanic” makes it sound more useful in an American context.


  5. @flavia – “I wonder if all these ancestry sites work the same way.”

    pretty much. i think most of the ancestry dna sites will tell you which haplogroup your mtdna belongs in (and your y-chromosome too, if you’re male). those are still pretty broad-ish categories. for instance, i don’t know if there’s much difference between y-chromosome haplogroups in northern vs. southern italy. i’m guessing not, but i really don’t know.

    my mtdna haplogroup is h1 — which tells me that i’m western european in ancestry. well, i already knew that. (~_^) i still think it’s awesome that i know which haplogroup i’m in, tho. (^_^)

    but, greater precision in ancestry dna is coming! i’m certain of that. (^_^)


  6. @ihtg – “SVM is a statistical machine learning algorithm…. You get better results by teaching the computer to mimic a human being.”

    that’s pretty cool. thnx! (^_^)


  7. re: Hispanic. I guess they could detect that. Hispanic is basically (usually) a mix of Amerindian and Spanish. If they find Amerindian blood in an otherwise European or Black person, chances are, they are Hispanic.

    There are “Hispanics” (ugh, I hate that term) like me who are white white white white. Like the driven snow. The white Hispanics I know are all fairly recent immigrants who can trace their European lineage back to their immigrant grandparents.


  8. Ha, the Islamic world ain’t so bad. If you do poetry, Hafiz is a flipping monster (English by J McCarthy). Hafiz > Rilke, Hölderlin, Dickinson, (IMO) Shakespeare, Whitman, Rimbaud – basically the works. The obvious exception is also Semitic: the Hebrew Bible (Job, Canticles), which is #1 in the world. Oops, Hafiz isn’t Semitic though; he’s Persian.


  9. I also abhor the whole ‘hispanic’ term. It’s so varied and ultimately non-descriptive, scientifically speaking. Most government forms in the US divide race groups and then add the tagline, “not of Hispanic descent.” This often leaves “Hispanic” as the only option if you have any Spanish-speaking ancestors or a Spanish surname, regardless of your other ancestry. Sorry, I realize the government should not be the source of scientific definitions.

    If I had to go by mtDNA… my ‘coarse ethnicity’ is European (the ‘hispanic’ side is my dad’s). My complete analysis showed European ancestry, an estimated 25% Native American heritage (mixed Asian/European), and a small amount of African. This has made an admixture analysis of my genome a bit confusing. The overall analysis showed I am most similar to an average South American, genomically speaking. It will be cool when they can possibly tease out the different groups of ancestors for those of us who are such genomic mutts.


  10. @wara – “It will be cool when they can possibly tease out the different groups of ancestors for those of us who are such genomic mutts.”

    that will be cool! and, if they can already infer coarse ethnicity from a small bit of mtdna, then they should be able to do that eventually as well. (^_^)
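Comment 2 describes how an SVM learns from labeled examples. A toy sketch can make that concrete. This is not the method from the paper. The sequences and group names are invented. The following are all assumptions made for illustration:

- one-hot encoding of the bases
- a linear SVM
- one-vs-rest handling of several groups
- plain subgradient descent on the regularized hinge loss

```python
# Toy sketch of an SVM-style classifier for short DNA strings.
# ASSUMPTIONS (illustrative, not the paper's method): one-hot encoding,
# a linear SVM, one-vs-rest for several groups, plain subgradient descent on
# the L2-regularized hinge loss, and invented sequences/group names.

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a flat 0/1 vector with 4 slots per position."""
    vec = []
    for base in seq:
        vec.extend(1.0 if base == b else 0.0 for b in BASES)
    return vec


def score(w, b, x):
    """Linear decision value w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_binary(xs, ys, lam=0.01, eta=0.05, epochs=200):
    """Fit a linear SVM for labels ys in {+1, -1}; returns (w, b)."""
    w, b = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            inside_margin = y * score(w, b, x) < 1
            w = [(1 - eta * lam) * wi for wi in w]  # shrink: L2 regularizer
            if inside_margin:
                # hinge-loss subgradient step toward the correct side
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                b += eta * y
    return w, b


def train(examples):
    """One-vs-rest: one binary classifier per group label."""
    xs = [one_hot(seq) for seq, _ in examples]
    groups = sorted({group for _, group in examples})
    return {
        g: train_binary(xs, [1 if group == g else -1 for _, group in examples])
        for g in groups
    }


def predict(model, seq):
    """Assign the group whose classifier gives the highest score."""
    x = one_hot(seq)
    return max(model, key=lambda g: score(*model[g], x))


# Invented training data: the third base marks the group, while the other
# positions come from "backgrounds" shared by all three groups.
BACKGROUNDS = ["AC?GAC", "GT?GTC", "AC?CAT"]
MARKERS = [("T", "group1"), ("G", "group2"), ("A", "group3")]
TRAIN = [(bg.replace("?", base), group)
         for bg in BACKGROUNDS for base, group in MARKERS]

model = train(TRAIN)
print(predict(model, "GCTCAC"))  # an unseen background with a "T" marker
```

In the invented data, only the third base differs between groups. The unseen query "GCTCAC" carries a "T" there, so it should be assigned to group1. The values of `lam`, `eta` and `epochs` are arbitrary picks for this toy set, not settings taken from the study.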


