Photos, Old Speedway and a lot of Genetic Genealogy

Put the kettle on, let’s sit down and talk about DNA (*for newbies only)

I’ve recently started a journey of using DNA to further my family tree research.  I have an extensive paper trail family tree from many years of research, but I was really interested in what science could offer in terms of using a DNA test to find cousins and confirm who I thought were my family.  DNA testing is relatively new for genealogical purposes but it has been of huge interest to adoptees (of which I’m not – I know my biological family) for obvious reasons.  The goal of DNA testing for genealogical purposes is to find the most recent common ancestor shared by you and the matches returned by the DNA testing company.

 

At first it seemed sensible enough, if DNA could prove who committed a crime, and prove paternity then surely it could be used to find family and prove family connections.  Sure enough there is a particular type of testing that can be used for genealogical purposes.  There are many types of DNA tests (not just genealogical ones) – but the one used for family history purposes is called an Autosomal DNA test.  Basically this is not a full sequencing of our genome (that would cost an awful lot of money!) but it is a test that, put simply, just looks at what makes us different from each other.  These differences we all have (red hair, big nose, long legs…) are inherited from our biological families.  Within our DNA there are areas that have mutated over the years to create these differences in the human race.  As DNA gets more and more randomised over each generation (and in fact more mutations start to occur), we can only really compare our differences from the past 5 or 6 generations – after that it is too mixed up and randomised to be useful for genealogical purposes (remember I’m only talking about genealogical DNA tests).

I needed to learn about how DNA is inherited so that I could understand what the tests were telling me. We inherit 50% of our father’s DNA and 50% of our mother’s DNA – after that DNA is quite random and mixed up. So the 50% of our mother’s DNA could be made up of a large portion of her mother and a small portion of her father. Although she inherited 50% of each parent it has been recombined before being passed on. So it is incorrect to assume that your DNA is exactly 25% of each grandparent, but you do know for sure that you have exactly 50% of each of your parents. If you google “DNA inheritance” you’ll find lots of charts that explain this in much more detail.

I took the DNA test initially at Ancestry – it was all quite simple: spit into a tube, mix in some stabilising solution (supplied), put it back in the special postage box and drop it off at the post office. Samples are sent off to labs in the USA for testing in a pre-paid box that comes with the sample tube. It takes a few weeks for the sample to be processed, but after much anticipation the results were in. Initially when getting results, like most people I dived straight into what my ethnicity was – I was over 90% British, 5% Irish and 2% Finnish (so far the Finnish part is still a mystery – but I will touch on Ethnicity results later). Once I got over the excitement of ethnicity results I suddenly realised I was faced with a very long list of names of people who appeared to be related to me. These matches were listed in order, closest first (in Ancestry terms these were given a “Possible range” such as “Close family – 1st cousins” through to more distant cousins, “5th-8th cousins”). But that was it, what did it all mean? And how was I supposed to figure this out – none of these people (there were hundreds of them) were known to me. Now I have been doing family research for many years, so I was expecting (I think) to just see names that I already had in my tree, and simply match up that these cousins had the same paper trail as me, and we could confirm my years of research. It was NOT that simple, and I suddenly realised that I needed to understand more about how we matched and what made some people a closer match. How on earth would I sort all this out? Turns out I wasn’t the only person dazed and confused with my DNA test results! I joined a few Facebook groups, started watching some videos and reading about genetics and DNA testing, and slowly the clouds cleared, but it certainly is much more complex than I initially thought. I’m well on the way to confirming a lot of different matches and below is more about my journey.
This is not written in a technical or scientific manner – I’ve attempted to write this as if we were just having a chat about it.  Apologies if I’ve skimmed over some technical areas, or made a broad sweeping statement on something that is much more complex, it is my attempt to simplify the more complex information, with the hope that people may go on to learn more about the complexities of the subject (it really is fascinating).

Having learnt a bit about DNA, the next thing I went on to learn about was chromosomes – I had no idea how these worked, but they are basically what hold all this DNA. We have 46 chromosomes, 23 from each biological parent. Of the 23 chromosomes from each parent, 22 (44 in total) are called the “autosomes” (this is where the name autosomal test comes from). Chromosomes have areas where each human is different – these differences are called SNPs (pronounced “snips”). If we are closely related to someone our differences are the same as their differences (ie red hair, big nose, long legs). The SNPs are the areas of DNA that are used for genealogical purposes, and the various SNPs (differences) that are tested can be measured in a certain way that makes them comparable for close relatives. The measure used by the testing companies is the centiMorgan (cM). This is not a simple measurement like inches, or a count of things. It’s a mathematical formula that works out what they call “genetic distance”. I found out that initially all I needed to know was that the more shared cM (the higher the number), the closer the match I was to someone (and the more likely we had a shared ancestor). There are in fact two cM numbers that are relevant in our DNA testing, the “total shared cM” and the “largest cM”, sometimes called “longest block” (this is the largest segment of DNA, or to put it simply, where the shared DNA is all together and not split up). Full siblings have a lot of DNA in common (because they both got 50% from the same parents). They have a shared cM of around 2,550, but they are not identical because of the DNA inheritance and how the DNA passed on is randomised. Second cousins match at around 212.5cM. Sites like ISOGG wiki have charts that help us work out what the likely relationship of a match is, based on the total shared cM.
These charts become invaluable to search for a common ancestor of you and your match as you need to know if you are looking for a common ancestor 1 generation back or 4 generations back.  The higher the number the closer the match.  http://www.isogg.org/wiki/Autosomal_DNA_statistics
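If you like to tinker, the idea of looking up a likely relationship from a total shared cM number can be sketched in a few lines of Python. This is only a rough illustration, not how any testing site actually does it: the 2,550 and 212.5 figures are the ones mentioned above, the other averages are the commonly quoted theoretical values and should be checked against the ISOGG chart, and the names `EXPECTED_CM` and `closest_relationship` are made up for this example.

```python
# Theoretical average total shared cM for a few relationships.
# Assumption for this sketch: real matches vary widely around these
# averages, so treat the answer as a starting point, not a verdict.
EXPECTED_CM = {
    "parent/child": 3400.0,
    "full sibling": 2550.0,
    "grandparent, aunt/uncle or half sibling": 1700.0,
    "1st cousin": 850.0,
    "1st cousin once removed": 425.0,
    "2nd cousin": 212.5,
    "2nd cousin once removed": 106.25,
    "3rd cousin": 53.13,
}


def closest_relationship(total_cm):
    """Return the relationship whose average is nearest to total_cm."""
    return min(EXPECTED_CM, key=lambda rel: abs(EXPECTED_CM[rel] - total_cm))


if __name__ == "__main__":
    for cm in (2500, 900, 220):
        print(cm, "cM is closest to:", closest_relationship(cm))
```

Picking the nearest average is a crude rule, and the ranges for different relationships overlap a lot, so a chart (and your paper trail) should always have the final say.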

For my test at Ancestry, I realised that Ancestry do not show any cM numbers [edit Jan/2016: from late 2015 Ancestry do now provide a total cM number for each match]. However they do produce a chart that shows what the cM numbers must be for them to class our matches into various categories of confidence and likely relationship (it is available on the Ancestry site). One of the other big 3 testing sites is ftDNA (familytreeDNA) – they do show all the cM numbers for your matches – on the initial page of matches they show the total shared cM and then by clicking below the match name you’ll see more detail including the more important longest block cM. 23andMe, the other of the big 3, requires the match to allow you to see your matching details; they are not displayed initially. I could have just sat back and looked at the Ancestry results, compared the detail they did let me see (and they do have a lot of information, it’s just they don’t share “the numbers”) and contacted my matches to see if we could work out together who our common ancestor might be. Ancestry do offer some automated tools: they will give you a “hint” if your DNA match has the same paper trail as you, and they will also show you on a map where your matches’ paper trail locations are. They also show you a list of the surnames in the tree of your match. This assumes that the person who matches you actually has a public tree or has even done their family tree. It started to emerge that many of the testing sites do have problems in that people were not necessarily sharing their trees or replying to contact. Each of the testing sites varies in terms of what you can see or do with your DNA results. So although I had now learnt all about understanding autosomal DNA results, I was coming up against brick walls in terms of knowing and tracking down who these matches were, let alone who our shared ancestor was.

My knight in shining armour was a site called GEDmatch (gedmatch.com). GEDmatch is run by volunteers and is free. It provides tools to help examine DNA matches more closely, by providing detailed cM numbers and chromosome browsers to drill down to the detail of how you match someone. Up until this point the only site I had seen a chromosome browser on was ftDNA. Ancestry do not have a chromosome browser and 23andMe do, but it requires your match to allow you to see their DNA results first. To be able to use GEDmatch I needed to send them (upload) my DNA results – this was actually very simple. Each of the big 3 testing sites allows you to download your DNA file, and GEDmatch is set up to take this download file and process it into their database – it is just a simple matter of registering on the site and going through their upload process (follow the links on their home page – you need to register first). Although by this stage I had test results at each of the big 3 sites, it is only necessary to upload one of your files (for the purposes of GEDmatch they are all identical). Speaking of testing on all sites, the ftDNA site (www.familytreedna.com) offers an “autosomal transfer” – this means that if you have a test on Ancestry you can upload it to ftDNA and they will process it into their database as if you had the test there. There is a cost for this but it is very small in comparison to actually doing their test. This means you can be in both the Ancestry and ftDNA database for the price of 1 test and a small additional fee. A bargain! I tested with 23andMe directly as I was becoming more and more interested in DNA and wanted to see how their tests and site worked. They have some fun additions to their results including your Neanderthal percentage and other various things. 23andMe also offer health testing (not available in all countries). I found this very interesting but will not be covering that in this blog post.
All I would say is if you have any questions or concerns regarding health reports from your DNA test, then you need to discuss them with health professionals and not genealogists.  If you want health results and have not tested with 23andMe, there are other sites available – you can google for them, but Promethease is a suitable alternative and you can upload your DNA file for a comprehensive health result for a very small price.

So back to GEDmatch – after a few days (although now it can sometimes only take a few hours) my data was processed and able to be analysed in a “‘One-to-Many’ matches” report. You can find this and other reports on the GEDmatch home page once you have registered and logged in. GEDmatch is more of a technical site than perhaps the nice user friendly interface of the testing sites themselves. There are some very good blogs about using GEDmatch but I will cover some of the tools I’ve used in the most simple way I can. There are 2 reports I’m going to cover – these are the ones that you will use the most when you first get started. The first report to run is the One-To-Many matches report. When you click on it the page opens and looks a bit complex to start with – but there is only one field you need to fill in and that is your KIT number. A KIT number is assigned to you by GEDmatch when you go through the process of uploading your DNA file. You can upload files from any other relatives that you have tested as well – they will all be available on the front page when you log in – each with their own KIT number. I now have both my maternal uncle and my son on GEDmatch [edit: Jan/2016 – I now have my mother, sister, and several uncles on GEDmatch – I am truly addicted]. I’ll explain why you might want to test more family members – when you see your matches, you are unable to tell where they might match you: is a cousin on your father’s side or your mother’s side? There is no way that the test can work this out for you. If you tested your mother for example, you would see that many of your cousin matches are also her cousin matches (when I say cousin it may be first, second, third, 2nd once removed etc…). When you look at your mother’s matches you know that that match comes from her side of the family. You can safely assume that the majority of matches that don’t match her must be from your father’s side.
(I say the majority, because there could be several matches where the cM numbers are low and these may be coincidental that you match them, rather than because you have inherited that particular DNA from a common ancestor).
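For the spreadsheet-minded, the sorting described above can be sketched as a tiny Python function. It is only an illustration under some assumptions: the match names and numbers are invented, `split_by_side` is a made-up name, and the 15 cM cut-off for “too small to trust” is just a hypothetical personal threshold, not an official rule.

```python
def split_by_side(my_matches, mother_matches, min_cm=15.0):
    """Sort my matches into maternal, likely paternal and uncertain.

    my_matches: dict of match name -> largest shared segment in cM.
    mother_matches: set of names that also appear in my mother's list.
    min_cm: hypothetical cut-off below which a match may be coincidental.
    """
    maternal, likely_paternal, uncertain = [], [], []
    for name, largest_cm in my_matches.items():
        if name in mother_matches:
            maternal.append(name)         # shared with mum, so her side
        elif largest_cm >= min_cm:
            likely_paternal.append(name)  # solid match that mum lacks
        else:
            uncertain.append(name)        # small segment, could be chance
    return maternal, likely_paternal, uncertain


if __name__ == "__main__":
    mine = {"Cousin A": 40.0, "Cousin B": 22.0, "Cousin C": 8.0}
    mums = {"Cousin A"}
    print(split_by_side(mine, mums))
```

The third bucket is there because of the caveat above: a small match missing from your mother’s list is not proof that it comes from your father’s side.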

GEDmatch reports can be quite confusing to start with as they look like spreadsheets with lots of numbers. For an excel fan like me that was quite exciting, but I imagine that to anyone who may not be so technical or mathematical, it might look hideously confusing. So let’s break down what we actually see on these reports. By running a GEDmatch One-To-Many report, you will get a list of 1500 matches – these are people that match your DNA and have also uploaded to GEDmatch. You can tell where these people did the actual test as the first column lists their kit number – a kit starting with A means it was an Ancestry test, F for ftDNA and M for 23andMe tests. This can be useful to know if you are researching someone and want to try and look for them on the site they originally tested on. Especially useful if you have tested on the same site as them. Going across the page of the GEDmatch report you want to look for those 2 important numbers – total cM and largest cM. GEDmatch will sort the report with the largest total shared cM at the top. I like to re-sort the page by the largest cM (the longest segment), and it can be done by just clicking the little arrows in the top of that cell where the column name is. The next step I took when I first ran my reports was to download this list of people to an excel spreadsheet. You can do this by just highlighting them all on your computer and copying and pasting them to a spreadsheet. The reason I did this was because they were now on my own computer, but also it meant I could add some columns to write notes next to each match, or I could highlight the names I knew, or the names I was more interested in researching. Because I couldn’t possibly research 1500 names, I extracted the top few matches and put them into a new sheet – I called these my TOP MATCHES.
I’ve since added to my top matches by adding matches from the test sites (although several of them were already in GEDmatch, so I only added those that weren’t – and again kept it only to the closest relationships). My Top Matches list is where I spend all my time now – I’m only working on matches that have the closest match, I’m not bothering (at the moment) with all the more distant cousins. It’s worth noting that 2nd cousins means shared Great Grandparents, 3rd cousins means shared 2xG Grandparents and so on. Everyone has 4 grandparents, 8 G Grandparents, 16 2xG grandparents, 32 3xG grandparents and 64 4xG Grandparents (however this would be different if you have cousin marriages in your tree). It’s estimated that everyone has approximately 4,700 5th cousins; that is a lot of cousins that you potentially could get DNA matches with. It’s most likely that your paper trail family tree has many gaps in it and that you have not documented your family at this level right out through all branches. After many years of research I know that I do not have all that information, and some information I do have is quite sketchy due to unavailable records and incorrect information. It’s believed that we all have 2 family trees. Our documented genealogical family tree and now our genetic biological family tree. Whilst these may be the same, there is definitely room for them to be different, with illegitimate births, unknown parentage, adoptions, and the like. DNA testing is bringing several family secrets out into the open. You need to be prepared that you may find out things that were previously hushed up.
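The ancestor counts mentioned above simply double with each generation, and the cousin numbers line up with them. Here is a small Python sketch of that arithmetic. The function names are made up for this example, and it assumes no cousin marriages in the tree (which would reduce the number of distinct ancestors).

```python
def ancestors_in_generation(generations_back):
    """Number of ancestors a given number of generations back.

    1 = parents (2), 2 = grandparents (4), 3 = great grandparents (8)...
    Assumes no cousin marriages, so every ancestor is a different person.
    """
    return 2 ** generations_back


def shared_ancestor_generation(cousin_degree):
    """Generations back to the ancestors shared with an nth cousin.

    1st cousins share grandparents (2 back), 2nd cousins share
    great grandparents (3 back), and so on.
    """
    return cousin_degree + 1


if __name__ == "__main__":
    for degree in range(1, 6):
        back = shared_ancestor_generation(degree)
        print(f"cousin degree {degree}: shared ancestors are {back} "
              f"generations back, one couple out of "
              f"{ancestors_in_generation(back)} ancestors")
```

So a 3rd cousin match means hunting for one couple among your 16 2xG grandparents, which is why ruling out whole branches early saves so much time.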

I’ve sidetracked from the GEDmatch report but once you’ve chosen the matches that you want to work on, you can delve further into GEDmatch (if you have a match on a testing site that is not on GEDmatch, but you are communicating, then it is well worth suggesting they also upload their file to GEDmatch so that you can investigate your matching data more closely). The one-to-one report in GEDmatch is useful once you’ve found the matches you want to look more closely at. It’s very simple to run by clicking on the one-to-one link, just put in your kit number and their kit number. Do not change the defaults for now. GEDmatch will bring back a report that shows where on your chromosomes you match and will also give an estimate of where the most recent common ancestor (MRCA) is – ie how many generations back (your parents are 1 generation, grandparents 2 generations, etc). The first screen (where you put your kit number) defaults to 7cM – this is important, because it is possible with autosomal DNA tests that you will match up with “false positives”. It means that you may have DNA in common, but it is just a coincidence – this is called IBS (Identical By State) in comparison to IBD which means Identical By Descent. Matches with shared cM greater than 10cM are 99% likely to be a true match (ie you have an ancestor in common within the last 5-6 generations). I usually only look at matches where the largest cM is greater than 15. Remember that the higher the number, the closer that common ancestor is. GEDmatch uses 7cM as a default; you can lower that when running the one-to-one report but be extremely careful about making assumptions around low cM numbers. In endogamous populations the DNA results are a little different – there are matches with a high total shared cM, but when you look at them on the one-to-one screen the shared DNA is split into small pieces over all the chromosomes, so it is not really a close match.
Descendants from an endogamous population (such as Ashkenazi Jewish populations) face additional challenges to understand their match and find their most recent common ancestor, as usually the match looks (based on the numbers) as being much closer to you than they really are.
On my excel spreadsheet of “top matches” I record the largest cM so that I am sure my top matches are true top matches. It is possible for someone to have a high shared cM but when you do a one-to-one match it is split up across many chromosomes, so it is unlikely that this match is as close as the total cM might suggest. I usually dismiss these matches and move them to the bottom of my list. A good match will be one with long segments of shared DNA – and the common ancestor will be quite close. It’s hard not to get too hung up on the numbers, but they are useful so that you don’t spend endless hours trying to track an ancestor that is so far back it is probably beyond your paper trail – and perhaps beyond any paper trail you could realistically do. I make it a rule to only stick to the top matches and not get side tracked on smaller matches just because they may seem to have a similar surname. Some surnames are common, and just because you share one does not automatically mean you are related (in the past 5-6 generations). Only the DNA test can tell you if you are related – and it’s important to note that not all of your ancestors will match your DNA. If you remember, DNA is randomised before you inherit it, and you only get 50% of each parent. A cousin may get very different bits of DNA than you do. In fact it is theoretically possible (although vanishingly unlikely in practice) for full siblings to have a very low or no match, if they get the opposite 50% from their parent than you do. Therefore you will have 3rd-4th cousins that may not show up as matches in your test results.
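The “top matches” idea above is really just a filter and a sort, so here is a minimal Python sketch of it for anyone who prefers code to spreadsheets. Everything in it is illustrative: the names and numbers are invented, `top_matches` is a made-up function, and the 15 cM cut-off is a personal rule of thumb rather than anything official.

```python
def top_matches(matches, min_largest_cm=15.0):
    """Keep matches whose largest segment beats the cut-off, longest first.

    matches: list of dicts with "name", "total_cm" and "largest_cm" keys.
    min_largest_cm: hypothetical personal threshold for the longest block.
    """
    keep = [m for m in matches if m["largest_cm"] > min_largest_cm]
    return sorted(keep, key=lambda m: m["largest_cm"], reverse=True)


if __name__ == "__main__":
    # Invented example data. Match B has a high total but only small
    # segments, so it drops out of the top list.
    matches = [
        {"name": "Match A", "total_cm": 95.0, "largest_cm": 41.2},
        {"name": "Match B", "total_cm": 120.0, "largest_cm": 9.8},
        {"name": "Match C", "total_cm": 48.0, "largest_cm": 22.5},
    ]
    for m in top_matches(matches):
        print(m["name"], m["largest_cm"])
```

Sorting on the largest segment rather than the total is the whole point here: it pushes the “lots of little bits” matches down the list.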
I’m not going to go into detail of the GEDmatch report – there are many more options that you can use and many sites to guide you – the best way to learn is to just play around with them. In GEDmatch click on the links in their “Learn More” box on the home page. However it is most important to use the one-to-one match report for all the matches you research, as this will tell you if they are a real match or not. The chromosome browser and other tools are important when we try and “cluster” (also called “circle”) people together and “triangulate” our results. This means we need to group together the matches that we have. Why do this? Because you can see who matches who (has matches in common with each other) and you only need one of them to have more detail and you make the link, and the others will start to fall into place. For example if you have 3 people that all match each other and match you – and you know one of them is from your paternal side of the family, then you know they are ALL from your paternal side of the family. If one of those matches is confirmed and the others also match the confirmed one and you, then you have triangulated a match! There are some things to watch out for when doing this matching – one of the common mistakes is to assume that everyone who matches you on a particular chromosome also matches each other. It seems logical until you remember that we have two of every chromosome – SO matches may all match you on, say, chromosome 16, but they do not match each other – that is because some matches will be on your paternal chromosome 16 and some will be on your maternal chromosome 16. This is why you must always do a one-to-one match of everyone – look for tools called matrix match or similar on your testing site.
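The “do they all match each other?” check described above can be sketched in Python. This is a simplified illustration only: it assumes you have already run the one-to-one comparisons and written down which pairs match on the segment you are interested in, and the names and the helper `all_match_each_other` are invented for the example.

```python
from itertools import combinations


def all_match_each_other(people, matching_pairs):
    """True if every pair of people in the group is a recorded match.

    people: list of names (include yourself).
    matching_pairs: pairs of names confirmed by one-to-one comparisons
    on the same stretch of a chromosome (an assumption of this sketch).
    """
    confirmed = {frozenset(pair) for pair in matching_pairs}
    return all(frozenset(pair) in confirmed
               for pair in combinations(people, 2))


if __name__ == "__main__":
    # Ann, Bob and Cat all match me on chromosome 16, but Cat does not
    # match Ann or Bob, so she is probably on my other copy of it.
    one_to_one = [("me", "Ann"), ("me", "Bob"), ("me", "Cat"),
                  ("Ann", "Bob")]
    print(all_match_each_other(["me", "Ann", "Bob"], one_to_one))
    print(all_match_each_other(["me", "Ann", "Bob", "Cat"], one_to_one))
```

In this made-up case Ann and Bob form a group with me, while Cat is the odd one out, which is exactly the maternal versus paternal chromosome 16 trap described above.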
What to do next. So you’ve found some matches that look quite good – ie they have a high total shared cM and they match on a long segment of a chromosome. You are now ready to start researching this match. The easiest method is to just try contacting the match – the sites all give you some ability to do this, and GEDmatch will show the match’s email address. However be prepared – your match may know a lot less than you do. I’ve had people contacting me saying “we are a match please tell me who our common ancestor is”. Why would they think that I know more than they do? Please be aware of this if you are contacting your matches, they will most likely know a lot less than you. A good outcome would be that they are able to share a family tree with you and some information for you to work on together. In my experience matches that are able to do that are in the minority (if they even reply to you). That means you need to take on the research if you are to track down this common ancestor. Here are the ways that I go about this …
  1. I immediately start a family tree with my match at the bottom of the tree – I use ancestry for this – I set up a new tree and make sure it is private (and also go the extra step to stop it being found in searches).
  2. Look through the profile of the match – can you find them on sites other than Ancestry, can you find clues as to their parents or grandparents? Google their email address; this often brings up ancestry type sites where they have listed their details. If in Ancestry you can search their member profile for any hint of a surname they might be searching for. In Ancestry sometimes it says they have no tree, but if you look there is one, they have just not linked to it. Google their name (if you can figure out their actual name) – or just google their profile name. You are looking for any hints (obituary pages, research pages, social media sites, 192 type sites etc) where they mention their full name or better yet parents’/grandparents’ names. Fill in their tree with what you find. Sometimes you only need 1 generation in a tree to start finding more – especially if you are on Ancestry as you start to immediately get hints to build up the tree. Hopefully this leads straight to a common ancestor.
  3. Look at a “cluster” of people – those all matching on a particular chromosome and are confirmed to all match each other.  If you can find a link to one, you should find them all fitting into place.  I do this work all on my excel spreadsheet – but there are other tools to help you do this – two of the tools that spring to mind are GenomeMate and DNAGedcom.  These sites have applications that help effectively manage your top matches and keep track of where and how they all match you and each other.  I use a spreadsheet because I like excel and I use it a lot in my worklife, so I have built my own chromosome browser.  You don’t have to do this (and you probably shouldn’t) – you can use sites that have applications already set up to use.
  4. Rule out where the common ancestor CANNOT be. This will stop you searching your entire tree to find a common ancestor. If you have someone marked down as a paternal match, then anyone matching that person should also be a paternal match (there will be exceptions, so keep an eye out where one-to-one matches show something you weren’t expecting). Keep your family charts in front of you when doing this. There is an additional test here that can also help you – and that is the results from the X-Chromosome. The X Chromosome is 1 of the pair of sex chromosomes that come from your parents – 1 from each. Your mother (regardless of your sex) will ALWAYS pass down an X-Chromosome (which will be a randomised version of her 2 X chromosomes from her parents) and your father will pass down an X Chromosome if you are a female and a Y Chromosome if you are a male. Although DNA inheritance on the X Chromosome is somewhat different than the other 22 pairs of autosomes, it can be used to try and rule out some areas of the family tree that the match CANNOT be on, given a male will not get an X match with his father. So if you are male and share matching DNA on the X Chromosome, you know that match cannot be on the paternal side and you share a common ancestor on your mother’s side. GEDmatch does show X matches and the amount of DNA shared – you can use the X match information for this purpose, but be wary of trying to use it for anything else. Due to the inheritance pattern of X DNA the match may have come from many, many years ago (before paper trail genealogy timeframes).
  5. Test more of your family – it really is helpful to have another family member to test, if possible your mother or father. Their matches will also go back 5-6 generations so this will give you additional matches another generation back. Testing the oldest living relative as soon as possible should be on all of our ToDo lists! There is limited use in testing a sibling, as they will have similar matches to you (although some additional ones due to DNA inheritance – ie. they will have different bits of your parents). A relative who is paternal or maternal is definitely going to help you sort matches into paternal/maternal; once you can establish which side your match is from then it immediately removes 50% of your tree you need to search for the most recent common ancestor. However whilst you can sensibly conclude that anyone who does not match your mother MUST come from your father’s side, you cannot make the same assumption with an uncle or aunt. They will have different bits of DNA from your grandparents than you do. You only have approximately 25% of each of their parents’ DNA and you cannot tell which bits you have.
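The X-Chromosome rule in step 4 (everyone gets an X from their mother, only females get one from their father) can be followed back through the generations with a short Python sketch. This is just an illustration of the inheritance rule, not a tool from any testing site, and the function name `x_ancestor_lines` and the “M”/“F” path notation are made up for this example.

```python
def x_ancestor_lines(sex, generations):
    """List the ancestral lines that could have passed X DNA to you.

    sex: "male" or "female" (the person who tested).
    generations: how many generations back to follow.
    Each line is a string of steps: "M" = mother, "F" = father,
    so "MF" means your mother's father.
    """
    lines = [("", sex)]
    for _ in range(generations):
        next_lines = []
        for path, person_sex in lines:
            # Everyone inherits an X from their mother.
            next_lines.append((path + "M", "female"))
            # Only females inherit an X from their father.
            if person_sex == "female":
                next_lines.append((path + "F", "male"))
        lines = next_lines
    return [path for path, _ in lines]


if __name__ == "__main__":
    print(x_ancestor_lines("male", 2))    # grandparents who can pass X
    print(x_ancestor_lines("female", 2))
```

For a male every line starts with “M”, which is the rule from step 4: an X match cannot come through his father. Any branch missing from the list is one you can cross off when chasing an X match.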
So that concludes this blog – it is not meant to be technical or scientific (maybe my next blog will be) , but it’s to help anyone who is just starting out with DNA testing and faced with a load of information that makes no sense and seems like an impossible task to sort out.  My biggest piece of advice would be to take it one step at a time and learn as much as you can as you go.  Don’t assume things, and ask for help if you need it.  If you spot any glaring errors, please let me know and I will fix!
I was going to mention Ethnicity. I know a lot of people immediately start thinking about ethnicity when they do a DNA test, but ethnicity testing should be taken with a grain of salt. Each site that has ethnicity tests (sometimes called admixture) uses a different “reference population” to compare your results to. Different sites also lump together various areas/locations, so you need to look at how each site determines its reference population and make a call on what you agree with in terms of your ethnicity. Some sites will change your ethnicity results as more people test and more information becomes available about people’s ancestry. Ethnicity results are not useful to locate common ancestors, as the results are your ethnicity from what could be thousands of years ago.
Good luck with your searches and matches – the more you learn, the more you’ll get the most out of your DNA test results.  Don’t be afraid to ask for help, it is new for all of us!
My cuppa has gone cold, time to put the kettle on again…
If you didn’t get here via our Facebook group – then it might be worth knowing that I help admin a Facebook group that assists people with DNA testing for family history purposes – you can find our group from the link below, please feel free to join up if you have tested or have thought about testing.

 
