I’ve recently started a journey of using DNA to further my family tree research. I have an extensive paper trail family tree from many years of research, but I was really interested in what science could offer in terms of using a DNA test to find cousins and confirm who I thought were my family. DNA testing is relatively new for genealogical purposes but it has been of huge interest to adoptees (of which I’m not – I know my biological family) for obvious reasons. The goal of DNA testing for genealogical purposes is to find the most recent common ancestor shared by you and the matches returned by the DNA testing company.
At first it seemed sensible enough, if DNA could prove who committed a crime, and prove paternity then surely it could be used to find family and prove family connections. Sure enough there is a particular type of testing that can be used for genealogical purposes. There are many types of DNA tests (not just genealogical ones) – but the one used for family history purposes is called an Autosomal DNA test. Basically this is not a full sequencing of our genome (that would cost an awful lot of money!) but it is a test that, put simply, just looks at what makes us different from each other. These differences we all have (red hair, big nose, long legs…) are inherited from our biological families. Within our DNA there are areas that have mutated over the years to create these differences in the human race. As DNA gets more and more randomised over each generation (and in fact more mutations start to occur), we can only really compare our differences from the past 5 or 6 generations – after that it is too mixed up and randomised to be useful for genealogical purposes (remember I’m only talking about genealogical DNA tests).
I needed to learn about how DNA is inherited so that I could understand what the tests were telling me. We inherit 50% of our father’s DNA and 50% of our mother’s DNA – after that DNA is quite random and mixed up. So the 50% of our mother’s DNA could be made up of a large portion of her mother and a small portion of her father. Although she inherited 50% of each parent it has been recombined before being passed on. So it is incorrect to assume that your DNA is exactly 25% of each grandparent, but you do know for sure that you have exactly 50% of each of your parents. If you google “DNA inheritance” you’ll find lots of charts that explain this is much more detail.
I took the DNA test initially at Ancestry – it was all quite simple, spit into a tube and mix in some stabilising solution (supplied) and put it back in the special postage box and drop it off at the post office. Samples are sent off to labs in the USA for testing in a pre-paid box that comes with the sample tube. It takes a few weeks for the sample to be processed, but after much anticipation the results were in. Initially when getting results, like most people I dived straight into what my ethnicity was – I was over 90% British, 5% Irish and 2% Finnish (so far the Finnish part is still a mystery – but I will touch on Ethnicity results later). Once I got over the excitement of ethnicity results I suddenly realised I was faced with a very long list of names of people who appeared to be related to me. These matches were listed in the order of closest matches (in Ancestry terms these were given a “Possible range” such as “Close family – 1st cousins” through to more distant cousins (“5th-8th cousins”). But that was it, what did it all mean? and how was I supposed to figure this out – none of these people (there were hundreds of them) were known to me. Now I have been doing family research for many years, so I was expecting (I think) to just see names that I already had in my tree, and simply match up that these cousins had the same paper trail as me, and we could confirm my years of research. It was NOT that simple, and I suddenly realised that I needed to understand more about how we matched and what made some people a closer match. How on earth would I sort all this out? Turns out I wasn’t the only person dazed and confused with my DNA test results! I joined a few facebook groups, started watching some videos and reading about genetics and DNA testing, and slowly the clouds cleared but it certainly is much more complex than I initially thought. I’m well on the way to confirming a lot of different matches and below is more about my journey. This is not written in a technical or scientific manner – I’ve attempted to write this as if we were just having a chat about it. Apologies if I’ve skimmed over some technical areas, or made a broad sweeping statement on something that is much more complex, it is my attempt to simplify the more complex information, with the hope that people may go on to learn more about the complexities of the subject (it really is fascinating).
Having learnt a bit about DNA, the next thing I went on to learn about was something about chromosomes – I had no idea how these worked, but they are basically what hold all this DNA. We have 46 chromosomes, 23 from each biological parent. Of the 23 chromosomes from each parent, 22 (44 in total) of them are called the “autosomes” (this is where the name autosomal test comes from). Chromosomes have areas where each human is different – these differences are called SNPs. (pronounced snips). If we are closely related to someone our differences are the same as their differences (ie red hair, big nose, long legs). The SNPs are the areas of DNA that are used for genealogical purposes and the various SNPs (differences) that are tested can be measured in a certain way that makes them comparable for close relatives. The measure used by the testing companies is a centiMorgan (cM) . This is not a simple measurement like inches, or a count of things. It’s a mathematical formula that works out what they call “genetic distance”. I found out that initially all I needed to know was that the more shared cM (the higher the number), the closer the match I was to someone (and the more likely we had a shared ancestor). There are in fact two cM numbers that are relevant in our DNA testing, the “total shared cM” and the “largest cM” sometimes called “longest block” (this is largest segment of DNA, or to put it simply where the shared DNA is all together and not split up). Full siblings have a lot of DNA in common (because they both got 50% from the same parents), they have a shared cM of around 2,550 but they are not identical because of the DNA inheritance and how the DNA passed on is randomised. Second cousins match at around 212.5cM. Sites like ISOGG wiki have charts that help us work out what the likely relationship of a match is, based on the total shared cM. These charts become invaluable to search for a common ancestor of you and your match as you need to know if you are looking for a common ancestor 1 generation back or 4 generations back. The higher the number the closer the match. http://www.isogg.org/wiki/Autosomal_DNA_statistics
For my test at ancestry, I realised that ancestry do not show any cM numbers [edit Jan/2016: from late 2015 Ancestry do now provide a total cM number for each match]. However they do produce a chart that shows what the cM numbers must be for them to class our matches into various categories of confidence and likely relationship (It is available on the Ancestry site). One of the other big 3 testing sites is ftDNA (familytreeDNA) – they do show all the cM numbers for your matches – on the initial page of matches they show the total shared cM and then by clicking below the match name you’ll see more detail including the more important longest block cM. 23andMe the other of the big 3, requires the match to allow you to see your matching details, they are not displayed initially. I could have just sat back and looked at the ancestry results, compared the detail they did let me see (and they do have a lot of information, it’s just they don’t share “the numbers”) and contact my matches to see if we could work out together who our common ancestor might be. Ancestry do offer some automated tools, they will give you a “hint” if your DNA match has the same paper trail as you, and they will also show you on a map where your matches paper trail locations are. They also show you a list of the surnames in the tree of your match. This assumes that the person who matches you, also actually has a public tree or has even done their family tree. It started to emerge that many of the testing sites do have problems in that people were not necessarily sharing their trees or replying to contact. Each of the testing sites vary in terms of what you can see or do with your DNA results. So although I had now learnt all about understanding autosomal DNA results, I was coming up against brickwalls in terms of knowing and tracking down who these matches were, let alone who our shared ancestor was.
My knight in shining armour was a site called GEDmatch (gedmatch.com). GEDmatch is run by volunteers and is free, it provides tools to help examine DNA matches closer, by providing detailed cM numbers and chromsome browsers to drill down to the detail of how you match someone. Up until this point the only site I had seen a chromosome browser on, was ftDNA. Ancestry do not have a chromosome browser and 23andMe do, but it requires your match to allow you to see their DNA results first. To be able to use GEDmatch I needed to send them (upload) my DNA results – this was actually very simple. Each of the big 3 testing sites allow you to download your DNA file, and GEDmatch is setup to take this download file and process it into their database – it is just a simple matter of registering on the site and going through their upload process (follow the links on their home page – you need to register first). Although by this stage I had test results at each of the big 3 sites, it is only necessary to upload one of your files (for the purposes of GEDmatch they are all identical). Speaking of testing on all sites, ftDNA site (www.familytreedna.com) offers an “autosomal transfer” – this means that if you have a test on ancestry you can upload it to ftDNA and they will process it into their database as if you had the test there. There is a cost for this but it is very small in comparison to actually doing their test. This means you can be in both the Ancestry and ftDNA database for the price of 1 test and a small additional fee. A bargain! I tested with 23andMe directly as I was becoming more and more interested in DNA and wanted to see how their tests and site worked. They have some have fun additions to their results including your Neanderthal percentage and other various things. 23andMe also offer heath testing (not available in all countries). I found this very interesting but will not be covering that in this blog post. All I would say is if you have any questions or concerns regarding health reports from your DNA test, then you need to discuss them with health professionals and not genealogists. If you want health results and have not tested with 23andMe, there are other sites available – you can google for them, but Promethease is a suitable alternative and you can upload your DNA file for a comprehensive health result for a very small price.
So back to GEDmatch – after a few days (although now it can sometimes only take a few hours) my data was processed and able to be analysed in a “‘One-to-Many’ matches” report. You can find this and other reports on the GEDmatch home page once you have registered and logged in. GEDmatch is more of a technical site than perhaps the nice user friendly interface of the testing sites themselves. There are some very good blogs about using GEDMatch but I will cover some of the tools I’ve used in the most simple way I can. There are 2 reports I’m going to cover – these are the ones that you will use the most when you first get started. The first report to run is the One-To-Many matches report. When you click on it the page opens and looks a bit complex to start with – but there is only one field you need to fill in and that is your KIT number. A KIT number is assigned to you by GEDmatch when go through the process of uploading your DNA file. You can upload files from any other relatives that you have tested and uploaded as well – they will all be available on the front page when you log in – each with their own KIT number. I now have both my maternal uncle and my son on GEDMatch [edit: Jan/2016 – I know have mother, sister, and several uncles on GEDMatch – I am truly addicted]. I’ll explain why you might want to test more family members – when you see your matches, you are unable to tell where they might match you, is a cousin on your fathers side or your mothers side? There is no way that the test can work this out for you. If you tested your mother for example, you would see that many of your cousin matches are also her cousin matches. (when I say cousin it may be first, second, third, 2nd 1 times removed etc…). When you looked at your mothers matches you know that that match comes from her side of the family. You can safely assume that the majority of matches that don’t match her must be from your fathers side. (I say the majority, because there could be several matches where the cM numbers are low and these may be coincidental that you match them, rather than because you have inherited that particular DNA from a common ancestor).
GEDmatch reports can be quite confusing to start with as they look like spreadsheets with lots of numbers. For an excel fan like me that was quite exciting, but I imagine that anyone who may not be so technical or mathematical, it might look hideously confusing. So lets breakdown what we actually see on these reports. By running a GEDMatch many to one report, you will get a list of 1500 matches – these are people that match your DNA and have also uploaded to GEDMatch. You can tell where these people did the actual test as the first column lists their kit number – a kit starting with A means it was an ancestry test, F for ftDNA and M for 23andMe tests. This can be useful to know if you are researching someone and want to try and look for them on the site they originally tested on. Especially useful if you have tested on the same site as them. Going across the page of the GEDmatch report you want to look across for those 2 important numbers – total cM and largest cM. GEDmatch will sort the report with the largest shared cM at the top. I like to re-sort the page by the largest cM, and it can be done by just clicking the little arrows in the top of that cell where the column name is. The next step I took when I first ran my reports, was to download this list of people to an excel spreadsheet. You can do this by just highlighting them all on your computer and copying and pasting them to a spreadsheet. The reason I did this was because they were now on my own computer, but also it meant I could add some columns to write notes next to each match, or I could highlight the names I knew, or the names I was more interested in researching. Because I couldn’t possible research 1500 names, I extracted the top few matches and put them into a new sheet – I called these my TOP MATCHES. I’ve since added to my top matches by adding matches from the test sites (although several of them were already in GEDmatch, so I only added those that weren’t – and again kept it only to the closest relationships). My Top Matches list is where I spend all my time now – I’m only working on matches that have the closest match, I’m not bothering (at the moment) with all the more distant cousins. It’s worth noting that 2nd cousins mean shared Great Grandparents, 3rd cousins means shared 2xG Grandparents and so on. Everyone has 4 grandparents, 8 G Grandparents, 16 2xG grandparents, 32 3xG grandparents and 64 4xG Grandparents – however this would be different if you have cousin marriages in your tree). It’s estimated that everyone has approximately 4,700 5th cousins, that is a lot of cousins that you potentially could get DNA matches with. It’s most likely that your paper trail family tree has many gaps in it and that you have not documented your family at this level right out through all branches. After many years of research I know that I do not have all that information, and some information I do have is quite sketchy due to unavailable records and incorrect information. It’s believed that we all have 2 family trees. Our documented genealogical family tree and now our genetic biological family tree. Whilst these may be the same, there is definitely room for them to be different, with illegitimate births, unknown parentage, adoptions, and the like. DNA testing is bringing several family secrets out into the open. You need to be prepared that you may find out things that were previously hushed up.
- I immediately start a family tree with my match at the bottom of the tree – I use ancestry for this – I set up a new tree and make sure it is private (and also go the extra step to stop it being found in searches).
- Look through the profile of the match – can you find them on sites other than ancestry, can you find clues as to their parents or grandparents? Google their email address this often pops up with some ancestry type sites they have listed their details in. If in Ancestry you can search their member profile for any hint of a surname they might be searching for. In ancestry sometimes it says they have no tree, but if you look their is one they have just not linked to it. Google their name (if you can figure out their actual name) – or just google their profile name. You are looking for any hints (obituary pages, research pages, social media sites, 192 type sites etc) where they mention their full name or better yet parents/grandparents name. Fill in their tree with what you find. Sometimes you only need 1 generation in a tree to start finding more – especially if you are on Ancestry as you start to immediately get hints to build up the tree. Hopefully this leads straight to a common ancestor.
- Look at a “cluster” of people – those all matching on a particular chromosome and are confirmed to all match each other. If you can find a link to one, you should find them all fitting into place. I do this work all on my excel spreadsheet – but there are other tools to help you do this – two of the tools that spring to mind are GenomeMate and DNAGedcom. These sites have applications that help effectively manage your top matches and keep track of where and how they all match you and each other. I use a spreadsheet because I like excel and I use it a lot in my worklife, so I have built my own chromosome browser. You don’t have to do this (and you probably shouldn’t) – you can use sites that have applications already set up to use.
- Rule out where the common ancestor CANNOT be. This will stop you searching your entire tree to find a common ancestor. If you have someone marked down as a paternal match, then anyone matching that person should also be a paternal match (there will be exceptions and keep an eye out where one-to-one matches show something you weren’t expecting). Keep your family charts in front of you when doing this. There is an additional test here that can also help you – and that is the results from the X-Chromosome. The X Chromosome is 1 of the pairs of sex chromosomes that come from your parents – 1 from each. Your mother (regardless of your sex) will ALWAYS pass down her X-Chromosome (which will be a randomised version of her 2 X chromosomes from her parents) and your father will pass down an X Chromosome if you are a female and a Y Chromosome if a male). Although DNA inheritance on the X Chromosome is somewhat different than the other 22 pairs of autosomes, it can be used to try and rule out some areas of the family tree that the match CANNOT be on, given a male will not get an X match with his father. So if you are male and share matching DNA on the X Chromosome, you know that match cannot be on the paternal side and you share a common ancestor on your mothers side. GEDmatch does show X matches and the amount of DNA shared – you can use the X match information for this purpose, but be aware of trying to use for anything else. Due to the inheritance pattern of X DNA the match may have come many many years ago (before paper trail genealogy timeframes).
- Test more of your family – it really is helpful to have another family member to test, if possible your mother or father. Their matches will also go back 5-6 generations so this will give you additional matches another generation back. Testing the oldest living relative as soon as possible, should be on all of our ToDo lists!. There is limited use in testing a sibling, as they will be have similar matches to you (although some additional due to DNA inheritance – ie. they will have different bits of your parents). A relative who is paternal or maternal is definitely going to help you sort matches into paternal/maternal, once you can establish which side your match is from then it immediately removes 50% of your tree you need to search for the most recent common ancestor. However whilst you can sensibly conclude that anyone who does not match your mother MUST come from your fathers side, you can not make the same assumption with an uncle or aunt. They will have different bits of DNA from your grandparents than you do. You only have approximately 25% of their parents DNA and you cannot tell which percentage you have.