Friday, 2 October 2020

Interpreting Group 2 - the Ryans of Tipperary

With 108 members, Group 2 is the largest group within the Ryan DNA Project. Let's assess this group in relation to the 11 questions discussed on our Interpreting the Results page, namely: 

  1. Where is the group from? 
  2. Where did the name arise?
  3. Where do they sit on the Tree of Mankind?
  4. Who are their nearest genetic neighbours?
  5. Does this give any clues to their origin?
  6. Does this fit with the history of the surname?
  7. How long have they carried the surname?
  8. Is there any evidence of a Surname or DNA Switch (SDS)?
  9. Is there any evidence of Chance Matches?
  10. What is the branching structure within their group?
  11. When was each branch formed?

But before we start analysing any genetic group, we have to ask ourselves: have its members been grouped correctly? Two questions follow directly from that: Is there anyone in the group who shouldn't be there? And is there any evidence of chance matches?

We can even take a big step back and ask ourselves: why group people together in the first place? Let's address that question first ...

The purpose of the project is to study the origins and evolution of the Ryan surname, so this creates the first criterion for grouping people together - group members should be men with the surname Ryan. 

We know that surnames arose in Ireland about 1000 years ago (mainly between 950 and 1150 AD), so this gives us a time cut-off for any group within the project, i.e. we want to group together people who are likely to be related within the last 1000 years or so. Conversely, we want to exclude people from a group if their connection to the others in it is likely to be more than 1000 years ago.

But as well as including people with the surname Ryan, we also want to include people whose ancestors carried the Ryan surname but for some reason there was a surname switch somewhere along the line. These people would be Ryan by DNA but would now bear a different surname. This allows us to include people with non-Ryan surnames who appear to be close genetic matches to others within the group.

So, to summarise, people are grouped together primarily on the basis that a) they are Ryans, b) they are relatively close genetic matches to each other, or c) both. Genetic closeness can be determined using a number of different criteria that I call Markers of Potential Relatedness. The most useful of these are based on Genetic Distance, Unique STR Pattern, Rare Marker Values, and SNP testing. You can read more about them in this blog post here.

Therefore, each group within the project should contain Ryans, or people who are genetically Ryan, all of whom are related within the last 1000 years.

Because many Irish surnames have multiple origins, we would expect many surname projects using the above criteria to have several distinct genetic groups, each less than 1000 years old. And we see this in the Ryan DNA Project - there are currently 10 genetically distinct Ryan groups within the project, and each individual group appears to be less than 1000 years old.


So now let's return to the initial questions and apply them to Group 2: Have its members been grouped correctly? Is there anyone in the group who shouldn't be there? Is there any evidence of chance matches?

And the short answer is: yes, the group appears to have been grouped correctly ... for the most part. Using 37-marker data (100 participants), 81% of the group members are within a Genetic Distance (GD) of 4/37 of the typical STR signature for the entire group (the modal haplotype, i.e. the most common value at each marker). A further 17% are at a GD of 5/37 to 7/37. The number of members at each GD is summarised in the table below.

GD            |  0 |  1 |  2 |  3 |  4 |  5 |  6 |  7 | >7
n (members)   |  7 | 21 | 17 | 16 | 20 |  9 |  4 |  4 | 11
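The idea of a modal haplotype and a Genetic Distance can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not FTDNA's method: it uses the simple step-wise model (GD = total difference in repeat counts), whereas FTDNA's own calculation may treat some markers differently, and the four five-marker haplotypes below are invented for illustration rather than taken from the project.

```python
from collections import Counter

# Invented repeat counts for four hypothetical members (5 markers only;
# the project itself compares 37 or more markers).
HAPLOTYPES = {
    "member_A": {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11, "DYS426": 12},
    "member_B": {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 10, "DYS426": 12},
    "member_C": {"DYS393": 13, "DYS390": 23, "DYS19": 14, "DYS391": 11, "DYS426": 12},
    "member_D": {"DYS393": 14, "DYS390": 24, "DYS19": 15, "DYS391": 11, "DYS426": 13},
}


def modal_haplotype(haplotypes):
    """Take the most common repeat count at each marker across the group."""
    markers = next(iter(haplotypes.values())).keys()
    return {
        m: Counter(h[m] for h in haplotypes.values()).most_common(1)[0][0]
        for m in markers
    }


def genetic_distance(a, b):
    """Step-wise GD: total number of repeat steps separating two haplotypes."""
    return sum(abs(a[m] - b[m]) for m in a)


modal = modal_haplotype(HAPLOTYPES)
gd_to_modal = {name: genetic_distance(h, modal) for name, h in HAPLOTYPES.items()}
# Share of members within GD 4 of the modal haplotype.
share_within_4 = sum(gd <= 4 for gd in gd_to_modal.values()) / len(gd_to_modal)
print(modal)
print(gd_to_modal)
```

A tally of such GD values across all members with 37-marker data is the kind of summary shown in the table above.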

So this appears to be quite a tight-knit group. This is illustrated further when we look at the group on the public Results Page of the project. You can see that the project members most closely related to the modal haplotype gather towards the centre, and those with the most mutations (in pink & purple) are towards the top and bottom of the results table, including non-Ryan outliers (some of whom might be Ryan by DNA).

Results table showing the first 37 markers (mutations in pink & purple)

There are a few people who shouldn't be in the group; many of these were left there for comparative purposes while I was corresponding with the individual project members concerned. These can now be moved out of the group (I'll leave them in for the next week or so). These people can usually be identified on the public Results Page by a noticeably different STR signature (i.e. string of numbers), evidenced by many more mutations than usual (in pink & purple), and/or a non-Ryan surname (marked by a purple line above), and/or an incompatible Terminal SNP (marked by a red line in the diagram above).

The latter is particularly interesting as it indicates "chance matches", i.e. STR matches that appear to be closely related but are not. Most of these chance matches have non-Ryan surnames, so the question arises: Was there a surname switch on their direct male line? Or are these pre-surname matches who just happen to approximate the STR signature of the Group 2 Ryans? These questions will be discussed further in the answer to Question 9 below. An important consideration is that there are 7 of them within the group, which suggests that roughly 7% or more of the group may be incorrectly grouped together; however, because these are non-Ryan participants, such chance matches are relatively easy to identify.

But now that we have ascertained that the majority of the group members probably have been correctly grouped (and we have identified and made a note of those who may not belong), we can proceed to analyse what this particular grouping tells us by addressing each of the questions listed at the start.


1. Where is the group from?
2. Where did the name arise?
These questions are best addressed by assessing the birth locations of the Most Distant Known Ancestors (MDKAs) listed for each member of the group. Birth location can be the county where the MDKA was born, or the town or townland (if known). Unfortunately, not everyone has provided this vital information - only 29 out of 108 participants have done so, and among these only 10 have included a town or townland.

Nevertheless, there is a clear signal that Tipperary is the most common MDKA origin among participants who have provided the information: it accounts for 12 of the 29 locations, and for 12 of the 18 locations within Ireland. Here is the breakdown of MDKA locations:

  • Tipperary ... 12
  • Limerick ... 3
  • Clare ... 1
  • Kilkenny ... 1
  • Wexford ... 1
  • USA ... 11
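The arithmetic behind the Tipperary signal is simple enough to check. This sketch just tallies the MDKA counts listed above; treating a US-born MDKA as uninformative about the Irish place of origin is an assumption of the sketch.

```python
# MDKA birth locations as listed above (29 participants in total).
MDKA_COUNTS = {
    "Tipperary": 12,
    "Limerick": 3,
    "Clare": 1,
    "Kilkenny": 1,
    "Wexford": 1,
    "USA": 11,
}

total = sum(MDKA_COUNTS.values())
# Assumption: a US-born MDKA says nothing about the Irish place of origin.
irish_total = total - MDKA_COUNTS["USA"]
tipperary_share_all = MDKA_COUNTS["Tipperary"] / total
tipperary_share_irish = MDKA_COUNTS["Tipperary"] / irish_total

print(f"{total} locations, {irish_total} in Ireland")
print(f"Tipperary: {tipperary_share_all:.0%} of all, {tipperary_share_irish:.0%} of Irish")
```

Tipperary is a plurality (about 41%) of all 29 locations but a clear majority (about two thirds) of the 18 Irish ones.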

This is consistent with Surname Distribution Maps from the mid-1800s, which clearly indicate a particular concentration of the Ryan surname in County Tipperary.

Griffith's Valuation (1848-1864) indicates the Ryan surname
was most common in Tipperary and surrounding areas
(from https://www.swilson.info/sdist.php)

So based on this data, we can be reasonably confident that Group 2 originated in or close to Tipperary.


3. Where do they sit on the Tree of Mankind?
4. Who are their nearest genetic neighbours?
5. Does this give any clues to their origin?
6. Does this fit with the history of the surname?


Several group members have taken the Big Y test. This assesses over 200,000 SNP markers on the Y-chromosome and 851 STR markers, providing considerably finer detail than the standard Y-DNA-37 test (which assesses only 37 STR markers).

The results from the Big Y tests clearly place Group 2 on the M756 branch of the Tree of Mankind. M756 is the overarching SNP marker for the Ryans of Group 2 - all Group 2 Ryans are M756+, and most M756+ men are Ryans. The few non-Ryan men in Group 2 may be due to a Surname or DNA Switch ... but then again, maybe not (this will be discussed later).

The SNP Sequence associated with M756 is detailed below. This is the sequence of SNP markers that characterises each branching point on the Tree of Mankind, starting "upstream" at the level of the Haplogroup (R in this case) and progressing all the way "downstream" (i.e. towards the present day) to the Terminal SNP. Think of this string of SNPs as a line of ancestors coming forward in time towards the present day. The SNP Sequence for the M756 branch is as follows:

  • R-M269 > L23 > L51 > P310 > L151 > P312 > ZZ11 > DF27 > ZZ12_1 > FGC78762 > ZZ19_1 > Z31644 > BY2285 > Y5072 > Y5077 > Y5058 > Y5061 > FT44669 > M756
This tells us that the Group 2 Ryans belong to Haplogroup R. All human male lines are broadly divided into 20 major Y-DNA "haplogroups" (i.e. groups of people with a broadly similar genetic signature), and Haplogroup R is the most common haplogroup in Western Europe. Furthermore, the Group 2 Ryans sit on the DF27 sub-branch, whose carriers appear to have migrated into the Iberian Peninsula some 3000-4500 years ago before moving northward into Britain & Ireland. You can read more about these ancient migrations on the Eupedia website here.

Today, DF27 is most highly concentrated in Spain and southern France
(from https://www.eupedia.com/europe/Haplogroup_R1b_Y-DNA.shtml)

Crude approximation of the migratory route of our M756 ancestors
from Africa (>200,000 years ago) to Ireland (c.1000 years ago)
with estimated locations where downstream SNP markers arose along the way
(from http://scaledinnovation.com/gg/snpTracker.html?snp=R-M756)


There are several different versions of the Tree of Mankind (Y-Haplotree), each with its pros and cons. There are also minor differences between the trees in where they place some branches in relation to each other. The versions I use most often are FTDNA's Big Y Block Tree (it has the most SNPs), the Big Tree (it has the most surnames), and the YFULL Haplotree (the only one with dates for each branch).

The Big Tree - the M756 branch is on the left 

The Big Tree diagram above shows that there are currently 19 people sitting on the M756 branch or on one of 5 branches below it. This number is likely to increase as more people do the Big Y test and upload their data to the Big Tree. In addition to the Ryans sitting on or below M756, we also have people called Smith, Whelan, Cannady, Kennedy, Ellis, Butler & Dwyer. Furthermore, the closest genetic neighbours to the Group 2 Ryans are people called O'Dwyer / Dwyer & Foley, as well as Keogh, Lynch & Sexton (see the larger diagram here). Surname Distribution Maps of these genetic neighbours indicate that the highest concentrations usually occur in Tipperary (Whelan, Kennedy, Butler, Dwyer), in the neighbouring county of Cork (Ellis, Foley, Lynch), or in nearby Wexford (Keogh).

All this evidence suggests that the M756 branch and its neighbouring branches originated in and around Tipperary (which is also consistent with the MDKA locations discussed above).

Surname Distribution Maps (mid-1800s) of genetically neighbouring surnames
(from https://www.swilson.info/sdist.php)

O Hart comments that "O'Dwyer and O'Ryan, [were] chiefs in Tipperary ... O'Ryan and O'Felan were ancient families of note in Kilkenny, as well as in Carlow, Tipperary, and Waterford". The surname Whelan is derived from Felan.

A Tipperary origin for M756 is in keeping with the history of the Ryan surname. Woulfe and MacLysaght describe several distinct origins for the Ryan surname. The two most prominent Ryan groups both arose in the province of Leinster, and both are supposedly descended from Cathaoir Mór, King of Leinster in the second century AD. The larger of the two groups (the Mulryans) migrated en masse to the Tipperary/Limerick border in the 1200s-1300s, whilst the other group (the Ryans of Idrone) remained in and around Carlow. Of note, the Mulryan surname was shortened to Ryan and the original surname has largely disappeared. The Group 2 Ryans could represent the descendants of both these groups. If so, we should see two separate subgroups, each with a common ancestor about 1000 years ago, as well as a connection between the two subgroups about 2000 years ago. This will be explored further in the answer to Question 9 below.

Furthermore, in his pedigrees for the various Ryan groups, O Hart indicates that the Mulryans and the O'Dwyers share a common ancestor in Cu Corb (no. 85), who was born about 2000 years ago. The O'Dwyers sit on a branch of the Tree of Mankind neighbouring that of the Group 2 Ryans, and (depending on which version of the Haplotree you consult) their common ancestor would have carried the SNP marker FT44669 (Big Y Block Tree) or two SNPs below that (Big Tree). Either way, we will see in the following section that this common genetic ancestor would have lived sometime around 950-1100 AD, which is much later than the common ancestor described in the ancient genealogies. So although there is some consistency between the DNA record and the traditional genealogies in terms of the related surnames, there appears to be a mismatch in the dates for when the common ancestor lived.


7. How long have they carried the surname?

Let's remind ourselves again of the SNP Sequence for the M756 branch:
  • R-M269 > L23 > L51 > P310 > L151 > P312 > ZZ11 > DF27 > ZZ12_1 > FGC78762 > ZZ19_1 > Z31644 > BY2285 > Y5072 > Y5077 > Y5058 > Y5061 > FT44669 > M756
YFULL provides the following TMRCA estimates for key SNP markers in this sequence. TMRCA stands for Time to Most Recent Common Ancestor and is expressed in years before present (ybp):
  • M269 ... 6400 years before present (ybp)
  • DF27 ... 4500 ybp
  • BY2285 ... 4400 ybp (13 samples)
  • Y5072 ... 4400 ybp (9 samples)
  • Y5058 (Y5061) ... 1150 ybp (7 samples)
  • M756 (PH5187) ... 1150 ybp (2 samples)

If we assume that M756 is the SNP marker carried by the Ryan progenitor, then (according to the YFULL age estimates) he would have been born about 870 AD (2020 - 1150), which is before surnames were introduced into Ireland roughly 1000 years ago (most Irish surnames arose between 950 and 1150 AD).
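Converting the YFULL figures from years before present into calendar years is a one-line subtraction. This sketch assumes, as the text does, that "present" means 2020; if YFULL's estimates use a different reference year, change REFERENCE_YEAR accordingly.

```python
REFERENCE_YEAR = 2020  # the "present" used in the text; adjust if YFULL's differs

# YFULL central TMRCA estimates, in years before present, from the list above.
YFULL_TMRCA_YBP = {
    "M269": 6400,
    "DF27": 4500,
    "BY2285": 4400,
    "Y5072": 4400,
    "Y5058": 1150,
    "M756": 1150,
}


def ybp_to_year(ybp, reference_year=REFERENCE_YEAR):
    """Convert years before present to a signed calendar year (<= 0 means BC)."""
    return reference_year - ybp


def format_year(year):
    """Label a signed year as AD or BC (there is no year zero, so 0 is 1 BC)."""
    return f"{year} AD" if year > 0 else f"{1 - year} BC"


for snp, ybp in YFULL_TMRCA_YBP.items():
    print(f"{snp}: about {format_year(ybp_to_year(ybp))}")
```

For M756 this reproduces the 870 AD figure; for the older branches the one-year BC offset is immaterial given the size of the uncertainty.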

The trouble is that not everyone has uploaded their Big Y data to YFULL, and YFULL's estimates become increasingly crude the further "downstream" one travels, because there are fewer and fewer people (and hence less data) as the branches become finer. In fact, the TMRCA estimate for M756 is based on data from only 2 people ... and should therefore be taken with a huge pinch of salt.

Not only do the estimates become less reliable, the range around them also gets wider and wider. Here are the same TMRCA estimates as above, with the 95% Confidence Intervals around each estimate included. Note how the percentage (at the end of each line) increases as we move downstream:
  • M269 ... 6400 ybp ... 7100 - 5700 ybp (i.e. +700, -700 = +/-11%)
  • DF27 ... 4500 ybp ... 5300 - 3700 ybp (i.e. +800, -800 = +/-18%)
  • BY2285 ... 4400 ybp ... 5300 - 3600 ybp (i.e. +900, -800 = +20%, -18%)
  • Y5072 ... 4400 ybp ... 5300 - 3600 ybp (i.e. +900, -800 = +20%, -18%)
  • Y5058 (Y5061) ... 1150 ybp ... 1900 - 700 ybp (i.e. +750, -450 = +65%, -39%)
  • M756 (PH5187) ... 1150 ybp ... 1900 - 700 ybp (i.e. +750, -450 = +65%, -39%)
When we see the ranges around each of these estimates, we can really appreciate how crude and inexact these estimates are. But this is what happens when we are dealing with small amounts of data.
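The percentages at the end of each line are simply the distance from the central estimate to each bound, divided by the estimate. This sketch reproduces them from the intervals listed above.

```python
# (central estimate, upper bound, lower bound) in years before present,
# copied from the 95% confidence intervals listed above.
YFULL_CI = {
    "M269": (6400, 7100, 5700),
    "DF27": (4500, 5300, 3700),
    "BY2285": (4400, 5300, 3600),
    "Y5072": (4400, 5300, 3600),
    "Y5058": (1150, 1900, 700),
    "M756": (1150, 1900, 700),
}


def relative_width(estimate, upper, lower):
    """Return the (+%, -%) distance from the estimate to each bound."""
    return (
        100 * (upper - estimate) / estimate,
        100 * (estimate - lower) / estimate,
    )


for snp, (estimate, upper, lower) in YFULL_CI.items():
    plus, minus = relative_width(estimate, upper, lower)
    print(f"{snp:7} +{plus:.0f}% / -{minus:.0f}%")
```

For the younger branches the interval is lopsided (+65% but -39%), so a single "+/-" figure no longer describes it.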

However, we can attempt to replicate the YFULL methodology using the more comprehensive SNP data from FTDNA's Big Y Block Tree.

The M756 branch on FTDNA's Big Y Block Tree

On the Big Y Block Tree, there are currently 26 people sitting on M756 itself or on one of 6 branches below it (remember, YFULL has only 2 people). The diagram above shows the number of SNPs in each of the SNP Blocks that characterise each of the branches, as well as the "Private Variants" (i.e. unique SNPs) that are (currently) found only in specific individuals (this too will change as we get more Big Y results). M756 marks the common ancestor of all 26 people, and the TMRCA calculations are detailed in Footnote 1.

Using this method, the TMRCA works out as about 1200 AD (using a conversion factor of 83 years per SNP) or 1300 AD (using 70 years per SNP). I still need to calculate the range around each of these estimates, but it is likely to be in the region of +/- 300 years, which would give us a time period extending from 900 AD to 1600 AD.

This is still a very wide time span. And it tells us that even with more data, the estimates will remain crude and we will never be able to pinpoint an exact date (which is what we want to do as genealogists). It is tempting to take the central estimate (midpoint estimate) and focus on that, but there is a huge risk of wishful thinking or Confirmation Bias (e.g. "that matches my expectations exactly so it must be true" ... not so).
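The basic SNP Counting arithmetic can be sketched as follows. This is a simplified illustration, not the calculation in Footnote 1: the per-tester SNP counts are invented, each tester's line is weighted equally (more careful methods average branch by branch so that heavily tested sub-branches do not dominate), and the reference year of 2020 is an assumption.

```python
from statistics import mean

REFERENCE_YEAR = 2020  # assumed year of testing

# HYPOTHETICAL: number of SNPs found on each tester's line below the M756
# block (downstream block SNPs plus private variants). Not the project's data.
SNPS_BELOW_M756 = [8, 9, 10, 11, 12, 10]


def snp_count_tmrca(snp_counts, years_per_snp, reference_year=REFERENCE_YEAR):
    """Estimate the common ancestor's birth year by simple SNP counting."""
    years_ago = mean(snp_counts) * years_per_snp
    return reference_year - years_ago


for years_per_snp in (83, 70):
    year = snp_count_tmrca(SNPS_BELOW_M756, years_per_snp)
    print(f"{years_per_snp} years/SNP -> about {round(year)} AD")
```

With these invented counts (an average of 10 SNPs per line), the two conversion factors give about 1190 AD and 1320 AD. The 13-year difference between the factors adds up over every SNP counted, here to 130 years, which is why the choice of factor matters as much as the data.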

Using this method of SNP Counting, we arrive at the following crude TMRCA dates for the ancestral branches above M756 (see Footnote 2 for the actual estimation process):
  • Y5077 ... 2000-1400 BC
  • Y5058 ... 500-750 AD (31 SNPs in block)
  • Y5061 ... 950-1100 AD (5 SNPs in block)
  • FT44669 ... 950-1100 AD (1 SNP in block)
  • M756 ... 1200-1300 AD (2 SNPs in block)

The Big Tree diagram for this portion of the Tree of Mankind tells us that most Ryans fall on or below the M756 branch ... but above that point in the Tree, we begin to see clusters of other surnames (e.g. Dwyer below FT44669, Cosgrove below Y5061). This suggests that the time when surnames arose (about 1000 years ago) falls somewhere between M756 and the branch immediately upstream (FT44669) or the one above that (Y5061). This agrees roughly with our crude TMRCA estimates for these branches, so we are probably in the right ballpark (Confirmation Bias notwithstanding).
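The check in the previous paragraph, which branch dates overlap the period of surname formation, can be made explicit. This sketch uses the crude SNP-counting ranges listed above and the 950-1150 AD window given earlier for Irish surname formation; writing BC years as negative numbers is a convention of the sketch.

```python
# Crude SNP-counting date ranges from the list above (BC as negative years).
BRANCH_DATES = {
    "Y5077": (-2000, -1400),
    "Y5058": (500, 750),
    "Y5061": (950, 1100),
    "FT44669": (950, 1100),
    "M756": (1200, 1300),
}
SURNAME_ERA = (950, 1150)  # main period of Irish surname formation


def overlaps(a, b):
    """True if the closed intervals a and b share at least one year."""
    return a[0] <= b[1] and b[0] <= a[1]


in_surname_era = [
    snp for snp, span in BRANCH_DATES.items() if overlaps(span, SURNAME_ERA)
]
print(in_surname_era)
```

Only Y5061 and FT44669 overlap the window, and M756 falls entirely after it, which is the pattern described above.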


We could refine the TMRCA estimate even further by using the SAPP Programme (developed by Dave Vance). This combines SNP markers, STR markers and known pedigrees to generate the TMRCA not only for the group overall but also for every branching point within the "genetic family tree" for Group 2. That could be a topic for a subsequent blog post.

Nevertheless, whether we use 83 years per SNP or 70 years per SNP, the most recent common ancestor of everyone in Group 2 is estimated to have lived sometime around 1200 or 1300 AD. This is about 200-300 years after the introduction of surnames ... which raises the question: What happened before that? Was the Ryan surname associated with the same DNA signature before 1200? Or was there a surname switch before then, with the DNA associated with some other surname?

At this stage, all we can say with any degree of confidence is that Group 2's Ryan surname has been associated with the same Y-DNA for the past 700-800 years approximately ... but we don't know if the two were associated before that.

A further point to consider is that the M756 branch is characterised by 2 SNP markers - M756 is one of them and the other is PH5187. As more people test, we may find that some individual is positive for one but not the other. This would have the effect of splitting the M756 branch in two, with one SNP being the parent of the other. This would push the age of the parent branch back by about 83 years (or 70 years if we use the alternative conversion factor). This would still be shortly after the time period of surname formation and so would not substantially alter the analysis conducted thus far.


8. Is there any evidence of a Surname or DNA Switch (SDS)?
9. Is there any evidence of Chance Matches?

This is more a consideration when grouping people together in the first place, prior to the analysis phase, but it is worth discussing here as it highlights some of the difficulties we face in deciding whether or not non-Ryan members belong in Group 2.

Theoretically, your STR matches can be divided into 4 main categories based on whether or not they share your surname and whether or not they are related to you within the past 1000 years or so. This is illustrated in the diagram below. Ideally, I would want to include everyone in Boxes A & B in a particular genetic group, and exclude the rest (Boxes C & D). Those in Box C would indeed be Ryans, but I would put them in a separate genetic group.

There are two phenomena that complicate our ability to classify matches into these four categories. One is the Surname or DNA Switch (SDS, or simply "surname switch") and the other is Convergence. The latter makes it particularly difficult to decide whether to allocate non-Ryan matches to Box B (surname switches) or Box D (pre-surname ancestors).
The classification of STR Matches
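The four boxes amount to a simple two-way decision, sketched below. The letter for each box is inferred from the descriptions in the text (A and B are kept in the group, B being surname switches, C being Ryans placed in a separate group, D being pre-surname matches). In practice the second input, whether someone is related within the last 1000 years, is exactly what is hard to know, so this only makes the logic explicit rather than deciding real cases.

```python
def classify_match(shares_surname: bool, related_within_1000_years: bool) -> str:
    """Place an STR match in Box A, B, C or D of the classification diagram."""
    if related_within_1000_years:
        # Related since surnames arose: same surname (A) or a surname switch (B).
        return "A" if shares_surname else "B"
    # Not related within the last 1000 years: a Ryan for a separate
    # genetic group (C), or a non-Ryan pre-surname match (D).
    return "C" if shares_surname else "D"


def belongs_in_group(shares_surname: bool, related_within_1000_years: bool) -> bool:
    """Only Boxes A and B are kept in the genetic group."""
    return classify_match(shares_surname, related_within_1000_years) in {"A", "B"}


print(classify_match(shares_surname=False, related_within_1000_years=True))
```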

An SDS or surname switch is when the surname changes from one particular surname in the father to a different surname in the son. There are many reasons for this. Apart from the obvious ones of adoption, illegitimacy & infidelity, switches in Irish surnames may often be due to an ancient switch in clan allegiance (e.g. 1000-1400 AD) or to the anglicisation of surnames (e.g. 1500-1700 AD) from the native Irish version to a (frequently corrupted) English version.

On the project's Results Page, non-Ryan surnames in the group include Foley, Stone, Ellis, Kennedy, Smith, Fuller, Perrigan, Butler, Cannady, Leonard, Leger, Harrigan, Carroll, Meehan, & Dwyer. Overall, 27 of the 108 members (a quarter) have non-Ryan surnames. But are these surname switches (Box B) or matches that share a pre-surname ancestor with other Group 2 members (Box D)?

There is clear evidence that several of these non-Ryan surnames are probably the result of a surname switch, because a switch has been documented or suspected within the family trees of the individuals concerned (e.g. the Perrigan, Reed, Leger & Harrigan participants have all listed a Ryan as their MDKA). However, many of the other non-Ryan individuals may have had a surname switch long before the timescale of surviving documentary records. In Ireland, records become very thin before 1800, which leaves some 800 years unaccounted for between the introduction of surnames (about 1000 AD) and 1800. All these people would be Ryan by DNA and would fall into Box B (top right) in the diagram above ... if it can be established that they are related to the group within the last 1000 years. But that can be easier said than done.

For some of the non-Ryan members in this group, it is difficult to know whether they share a common ancestor with the rest of the group before or after the time that surnames were introduced. If the common ancestor lived after the Ryan surname was introduced, then these are possibly surname switches and belong in Box B above (i.e. they started off as Ryans and the surname was switched to something else). If the common ancestor lived before the Ryan surname was introduced, then these men might share a pre-surname ancestor, some of whose descendants became the Group 2 Ryans whilst others formed a separate clan and took a different surname. These men just happen to match the Group 2 Ryans by chance and would belong in Box D above.

Either way, all of these men have an STR signature that is similar to that of Group 2 members - the question is: Which of them are True Matches and which of them are Chance Matches? Some men match the Group 2 STR signature by chance because of a technical phenomenon known as Convergence. This occurs when two or more STR signatures mutate in such a way that over time they grow more similar to each other (as opposed to more different). They begin to approximate each other, and this can reach the stage where they may even be declared "matches" to each other (using FTDNA's criteria for matching). In extreme cases, two people may appear to share a common ancestor within the last few hundred years but in fact be related thousands of years ago.

One practical implication of this is that people may be grouped together under the belief that they are all related less than 1000 years ago when in fact the common ancestor is much further back in time, prior to the formation of surnames, and therefore the people should not be grouped together at all. 

One way of assessing if Convergence is likely to be present is to look at the number of STR Matches for individuals within any given genetic group. Convergence is likely to be present if ...
  1. there is an excessively high number of matches on your STR Match List, and
  2. most of these matches have completely different surnames from your own.
Within Ryan Group 2, there is evidence of Convergence, but its influence varies from person to person. I chose a SNP-tested Ryan from the top, bottom and middle of the Results Page, and the table below summarises the findings. The number of matches varied with the level of comparison - in general, people tend to have excessive numbers of matches at the lower levels (12 & 25 markers), but this is not always the case. Convergence can result in chance matches even at the 111-marker level.

Thus, for example, the second participant below has 43 matches at the 111-marker level - some of these are likely to be true matches but many of them will be chance matches. He thus has a mixture of both, but the chance matches probably predominate. Further evidence of this is the number of Ryans among his matches - 18 altogether, which is 42% (18/43) of the total. His Ryan matches are more likely to be True Matches and the non-Ryans are more likely to be Chance Matches (but some of the latter may be legitimate surname switches that are Ryan by DNA but not by name).

Numbers of STR Matches for three Group 2 members

As an example of probable Convergence in a non-Ryan member of Group 2, one of the Cannady participants has 223 matches at the 37-marker level of comparison (46 of whom are Ryans = 21%), 193 matches at the 67-marker level (49 of whom are Ryans = 25%), and 54 at 111 markers (24 of whom are Ryans = 44%). The high number of matches at each level suggests a greater likelihood that his Ryan matches are simply chance matches ... but only SNP testing can answer this question definitively (as we shall see below).
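The two Convergence criteria, and the Ryan percentages quoted above, can be checked with a short sketch. The match counts are the ones given in the text; the cut-offs for "excessively high" and "most" are hypothetical, since the text does not define them, and would need tuning against real match lists.

```python
# Match counts quoted above: marker level -> (total matches, Ryan matches).
CANNADY_MATCHES = {37: (223, 46), 67: (193, 49), 111: (54, 24)}
SECOND_PARTICIPANT_111 = (43, 18)  # the second Ryan participant, 111 markers

MANY_MATCHES = 40   # HYPOTHETICAL cut-off for "excessively high"
MOSTLY_OTHER = 0.5  # HYPOTHETICAL: "most" taken to mean more than half


def percent(part, total):
    return 100 * part / total


def convergence_suspected(n_matches, n_same_surname):
    """Both criteria: many matches, and most of them with other surnames."""
    other_share = (n_matches - n_same_surname) / n_matches
    return n_matches >= MANY_MATCHES and other_share > MOSTLY_OTHER


for level, (total, ryans) in CANNADY_MATCHES.items():
    print(f"Cannady, {level} markers: {total} matches, {percent(ryans, total):.0f}% Ryan")

total, ryans = SECOND_PARTICIPANT_111
flag = "convergence suspected" if convergence_suspected(total, ryans) else "no flag"
print(f"Second participant: {percent(ryans, total):.0f}% Ryan, {flag}")
```

With these illustrative cut-offs the second participant is flagged, in line with the reading that chance matches probably predominate in his list. The Cannady figures are not run through the test because his same-surname (Cannady) match count is not given.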

So what are the practical implications of these findings? It is difficult to tell if someone is related within the last 1000 years (and therefore belongs in the group) or if they are not related within the last 1000 years (and therefore should be excluded).

The only way to distinguish between a True STR Match and a Chance STR Match is to do SNP testing (preferably with the Big Y test). Several non-Ryan members have done so but unfortunately, for most of them, the results have failed to distinguish if they are surname switches (Box B) or pre-surname matches (Box D). Below are their SNP Sequences (with the Group 2 sequence at the top for comparison) ...
  • Ryan ... R-Y5058 > Y5061 > FT44669 > M756
  • Carroll ... R-Y5058 > Y5061 > FT44669 > BY61861 > FT79210
  • Foley ... R-Y5058 > Y5061 > FT44669 > FT164983 > BY19912 > BY19125 > BY42892 > FT144147
  • Parsons ... R-Y5058 > Y5061 > FT44669 > FT164983 > BY19912 > BY19125 > BY42892 > BY67158 (his MDKA is an O'Dwyer)
  • Kennedy ... R-Y5058 > Y5061 > FGC22222
  • Leonard ... R-L21 > DF13 > DF21 > S5488 > Z16294 > BY11118 > Z16281 > Z16282 > Z16291 > Z16284 > FT14437 > FT19556
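Reading the common-ancestor SNP off these sequences is mechanical: walk down both sequences from the top and keep the last SNP they share. This sketch assumes both sequences start from the same upstream SNP, as the Ryan, Carroll, Foley, Parsons and Kennedy sequences do (all start at R-Y5058); the Leonard sequence starts at L21 and would first need extending upstream to a SNP it shares with the Ryan sequence.

```python
def parse_sequence(text):
    """Split 'R-Y5058 > Y5061 > ...' into ['Y5058', 'Y5061', ...]."""
    snps = [snp.strip() for snp in text.split(">")]
    if snps[0].startswith("R-"):
        snps[0] = snps[0][2:]  # drop the haplogroup prefix
    return snps


def last_shared_snp(seq_a, seq_b):
    """Return the most downstream SNP common to both sequences, or None."""
    shared = None
    for a, b in zip(parse_sequence(seq_a), parse_sequence(seq_b)):
        if a != b:
            break
        shared = a
    return shared


SEQUENCES = {
    "Ryan": "R-Y5058 > Y5061 > FT44669 > M756",
    "Carroll": "R-Y5058 > Y5061 > FT44669 > BY61861 > FT79210",
    "Foley": "R-Y5058 > Y5061 > FT44669 > FT164983 > BY19912 > BY19125 > BY42892 > FT144147",
    "Parsons": "R-Y5058 > Y5061 > FT44669 > FT164983 > BY19912 > BY19125 > BY42892 > BY67158",
    "Kennedy": "R-Y5058 > Y5061 > FGC22222",
}

for name in ("Carroll", "Foley", "Parsons", "Kennedy"):
    print(f"Ryan vs {name}: {last_shared_snp(SEQUENCES['Ryan'], SEQUENCES[name])}")
print("Foley vs Parsons:", last_shared_snp(SEQUENCES["Foley"], SEQUENCES["Parsons"]))
```

Foley and Parsons share a more recent SNP (BY42892) with each other than either does with the Ryans, which fits their sitting together on an adjacent branch.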

The Carroll, Foley & Parsons participants are in fact sitting on branches adjacent to the Group 2 Ryans and share a common ancestor who would have carried the SNP marker FT44669, which (as we have seen from SNP Counting) would have arisen about 950 to 1100 AD. This coincides with the period of surname introduction, so it does not help us judge whether the connection is pre- or post-surname.

The Kennedy participant is positioned one branch further upstream on branch Y5061, which again arose about 950-1100 AD, so again it does not help us judge whether the connection is pre- or post-surname.

In contrast, the Leonard participant is on an entirely different branch of the Tree of Mankind, and the common ancestor carried P312, which arose about 4800 years ago. This participant will be moved out of the group, but the others will remain until we get a better TMRCA estimate for FT44669 and Y5061 and some additional clarity on whether the connection is before or after the Ryan surname was introduced.

In situations where there is a "true match" to people with non-Ryan surnames (i.e. a common ancestor who probably lived after the introduction of surnames), we are always faced with the question: Which came first? The Ryan chicken or the non-Ryan egg? In other words, did the Ryan surname emerge from some previous surname, or was the Ryan surname the original of the species?

There is no definitive answer to this question, merely one based on the balance of probabilities ... and in this situation, because the Ryan surname is more dominant, and because the other surnames are represented elsewhere by larger and genetically distant groups bearing their own surname, the balance falls in favour of the Ryan chicken (i.e. the Ryan surname came first - there was no other surname before it).

You can see an assessment of these non-Ryan surnames in Footnote [3] below.


In addition, there are some Ryans who may not belong in Group 2 because their SNP data suggests that they may not share a common ancestor with the group within the last 1000 years. These are people called Ryan who do not test positive for M756 but do test positive for one of the ancestral SNPs upstream. They could belong in Box C of the classification diagram above (i.e. they should be placed in their own bespoke genetic group) ...
  • 2 Ryans test positive for Y5061 (2 steps upstream from M756) - this arose about 950-1100 AD. [2]
  • 2 Ryans test positive for Y5058 (immediately upstream of Y5061), both with MDKA locations (Tipperary & Waterford). This arose about 500-750 AD. [2]
  • Here is the abbreviated SNP Sequence for Group 2 as a reminder ... 
    • R-BY2285 > Y5072 > Y5077 > Y5058 > Y5061 > FT44669 > M756
I double-checked their actual data to make sure that FTDNA were reporting their Terminal SNPs correctly and this revealed that two of them had only done SNP Pack testing rather than Big Y ...
  • Y5061
    • 472126 ... 2 private SNPs ... tests negative for FT44669 and M756
    • B509695 ... 23 private SNPs ... tests negative for FT44669 and M756
  • Y5058
    • 395274 and 326943 ... these people have not done the Big Y test, just a SNP Pack. They may be positive for downstream SNPs (which would be revealed if they did the Big Y test).
The two Y5061 individuals are very interesting. The TMRCA estimate for this SNP is 950-1100 AD. [2] This is much later than the presumed common ancestor of the Mulryans and the Ryans of Idrone, Cathaoir Mór, King of Leinster, who lived in the 2nd century AD, about 1900 years ago. That being the case, if these two men descended from the Ryans of Idrone, they should share a SNP with the Group 2 Ryans somewhere within the Y5058 SNP Block (31 SNPs) and should not test positive for Y5061.

So the question of how they fit into the wider Tree of Mankind remains unanswered. They may be surname switches (i.e. their ancestors bore some other surname before they became Ryans). Hopefully the answer will become clear as more people join the project and do the Big Y test. Top priority for such Big Y testing would be the closest STR matches of these two individuals. I have moved these two individuals (and their closest matches) into a new subgroup ... Group 2a.


10. What is the branching structure within the group?
11. When was each branch formed?

The Big Y Block Tree provides us with SNP-defined branching points within the "genetic family tree" for Group 2 (based on SNP data from 26 individuals). More fine-scale detail could be achieved by using the SAPP Programme to incorporate STR data as well as SNP data from all the members of Group 2 (108 in total) to generate a more comprehensive "genetic family tree" (Mutation History Tree). This would also provide more precise TMRCA age estimations for each of the branching points within the "genetic family tree". 

Knowing the branching structure gives us a better feel for the evolution of a particular group over time. In addition, estimating the dates for each branching point can give us useful information about when one branch of the surname split away from another branch and can point to migration events either within Ireland or outside of it (to America, Australia, etc). 

These trees rely on mutations within the DNA markers to build a "best fit" model of how the tree evolved over time, and therefore they are often called Mutation History Trees (i.e. akin to Family History Trees ... except they use mutations instead of ancestors).

However, this is an exercise for another day. And we also have to ask ourselves: what additional information might this exercise reveal that we are not currently seeing in the analysis done thus far?

There are a few points to consider.
  • It could reveal the finer, more downstream branches within the "genetic family tree" for Group 2. This would help group members see to whom they are most likely to be related within the last few hundred years (say, since 1700), which could help individuals focus their own genealogical research.
  • It could also help shed light on any major divisions within the Ryan family tree, suggesting that certain branches may have been part of the great exodus to Tipperary in the 1200s-1300s.
  • It might also shed some light on the possibility that there are two groups within Group 2: the Mulryan's and the Ryan's of Idrone.

However there are also certain limitations to this approach:
  • The programme uses the available data to create a "best fit" family tree. As more data comes in, the structure of the tree is likely to change to better incorporate the new data. Thus the "best fit" tree is not necessarily the same as the actual family tree, merely a close approximation.
  • The most accurate family tree is likely to be derived from the most comprehensive data and that would be Big Y-700 data (which assesses 851 STRs and >200,000 SNPs). However, some participants have only done a 12-marker test, some 37 markers, and most have not done the Big Y test, so the available data will always be less than optimal.
Notwithstanding these reservations, the exercise is still worth doing and I will get around to it at some stage. The output may or may not add additional detail to the analysis described thus far.


Conclusions and Key Messages
  • Group 2 is a relatively close-knit group, with 98% of members within a genetic distance (GD) of 7 at 37 markers (7/37) from the modal haplotype.
    • There are several surname switches clearly evident within the group and probably a lot more that are not as obvious.
    • There is also some evidence of Convergence resulting in chance matches and some of these have been identified via Big Y testing.
  • Big Y testing has revealed that Group 2 members sit on the M756 branch of the Tree of Mankind.
    • The ancient ancestors of Group 2 probably travelled into Ireland (via Britain) from southern France or Spain.
  • Pinpointing the exact age of a particular branch on the Tree of Mankind is notoriously difficult
    • Crude estimates suggest that people on the M756 branch share a common ancestor about 1200-1300 AD
    • Similar age estimates for the branches immediately ancestral to M756 are both in the range of 950-1100 AD (FT44669 & Y5061)
  • Many members have roots in or around Tipperary which is in keeping with the history of the surname.
    • Genetically-related surnames on adjacent branches include O’Dwyer, Whelan, Kennedy, & Butler, all of which have high concentrations in or around Tipperary.
    • Group 2 most likely represents the descendants of the Mulryan clan which originated in Leinster but moved to Tipperary in the 1300s.
  • There are several group members that may in fact form a completely different genetic group (Group 2a) and these appear to share a common ancestor about 950-1100 AD. 
    • This is not consistent with the traditional genealogies for the Ryan’s of Idrone (they shared a common ancestor with the Mulryan’s about 1900 years ago). Descendants of this group (should they survive) remain to be identified.
    • Further Big Y testing is necessary to clearly identify any people who rightly belong in this new group (for now provisionally labelled Group 2a).

Maurice Gleeson
Oct 2020


Footnotes, Links & Resources

[1] The Methodology behind Age Estimations for Branching Points in the Haplotree


YFULL gives generally good age estimates for the more upstream branches of the Tree of Mankind. However, the accuracy of these estimates is subject to a lot more variability as one approaches the more downstream branches because the amount of data to work with gets less and less. So what should we use for the more downstream branches?

I use mainly two methods: SNP Counting (using YFULL's methodology) and Dave Vance's SAPP programme. Each has its pros and cons, and it is useful to compare and contrast the outputs of each. SNP Counting is easier and is fine for arriving at a ball park mid-point estimate. SAPP is much more time-consuming and should give a more “accurate” estimate, but it can need a lot of hand-holding to “point it in the right direction”. In addition, for SAPP to work best, many missing STR values need to be inferred or imputed, and this could allow inaccuracies to creep in; the more data one imputes, the greater the risk. This worries me because it also opens the door to observer bias, which could very easily skew the results. I’ve also noticed that SAPP is very sensitive to the data input. The point being: SAPP is only as accurate as the data put in … and that may be subject to error and/or bias ...

The methodology for SNP Counting is based on YFULL’s methodology (which in turn is based on that of Adamov, which has been published), but unlike YFULL, I don't use a correction factor (I don't believe this would substantially alter the results anyway - it amounts to about a 5-7% increase in average SNP count).

The average number of years per SNP is 83, as discussed briefly on the Y-DNA Warehouse statistics webpage (https://ydna-warehouse.org/statistics.html); the calculation incorporates a constant sourced from Poznik 2013.

Interestingly, the average number of “years per SNP” will decrease if a particular portion of the Haplotree is heavily sampled (for example, by an avid, well-organised project administrator). This is because coverage of the relevant SNPs in that portion of the haplotree increases, which in turn increases the number of callable SNPs. Dave Vance (inventor of the SAPP Programme) wrote an excellent description of this here.

Several researchers have reported an estimated average “years per SNP” value of 70 based on their own research. Dennis Wright did this calculation for his L226 project and you can find the link to his Big-Y spreadsheet with the relevant data on his website here … https://irishtype3dna.org/bigy.php … this reduced value for average number of years per SNP is in keeping with the comment above about “coverage”.

The value of 83 is 18.6% bigger than 70 … so whether it is 70 or 83 years per SNP, the estimate is still in the same ball park, and either way it is subject to a wide margin of error (perhaps 300-600 years).
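To make the 83 vs 70 years-per-SNP (YpS) comparison concrete, here is a minimal Python sketch (an illustration only, not part of YFULL's or SAPP's tooling) that converts the average SNP counts used in these footnotes for M756, FT44669 and Y5058 into years before present at both rates. It assumes a straight multiplication of SNP count by YpS, as in the calculations here, and uses 83 for the higher rate throughout (footnote [2] itself uses 84). It captures only the uncertainty that comes from the choice of rate, not the random scatter in how many SNPs each line accumulates.

```python
# Sketch: how much the "years per SNP" (YpS) choice moves a TMRCA estimate.
# Average SNP counts are the M756, FT44669 and Y5058 values from these footnotes;
# 83 is used here for the higher rate (footnote [2] itself uses 84).


def years_back(avg_snps: float, years_per_snp: float) -> float:
    """Convert an average SNP count into years before the testers' births."""
    return avg_snps * years_per_snp


relative_gap = 83 / 70 - 1  # about 0.186, i.e. 83 is ~18.6% bigger than 70
print(f"83 YpS is {relative_gap:.1%} bigger than 70 YpS")

for branch, avg_snps in [("M756", 8.94), ("FT44669", 12.19), ("Y5058", 17.16)]:
    high = years_back(avg_snps, 83)
    low = years_back(avg_snps, 70)
    # Same relative gap every time, but a bigger absolute gap on older branches.
    print(f"{branch}: {low:.0f} to {high:.0f} years back (gap of {high - low:.0f} years)")
```

For M756 this prints 626 to 742 years back (a gap of about 116 years), rising to 1201 to 1424 years back (a gap of about 223 years) for Y5058: the relative difference stays at 18.6%, but the absolute difference grows the further back the branch lies.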

So the key message I want to get across is: TMRCA estimates are crude (whatever methodology is used) and can be misleading, especially when confirmation bias is likely to be in play.

The TMRCA for the M756 branch is calculated thus (moving from left to right across the Big Y Block Tree diagram):
  • 1st branch (2 people) ... 9 Private Variants (on average) + 1 shared SNP (PH3275) = 10 SNPs in total
  • 2nd branch (2 people) ... 2 PVs + 9 shared SNPs (BY63617 block) + 2 shared SNPs (BY80957 block) = 13 SNPs in total
  • 3rd branch (1 person) ... 5 PVs + 2 shared SNPs (BY80957 block) = 7 SNPs in total
  • 4th branch (2 people) ... 3 PVs + 2 shared SNPs (BY75646 block) = 5 SNPs in total
  • 5th branch (2 people) ... 1 PV + 3 shared SNPs (FT195447 block) + 9 shared SNPs (FT195737 block) = 13 SNPs in total
  • 6th branch (1 person) ... 3 PVs + 9 shared SNPs (FT195737 block) = 12 SNPs in total
  • Others on M756 (16 people) ... 6 PVs = 6 SNPs in total

If we were to add these up we would get what you see below ... but it is not as simple as that! 
  • (2x10) + (2x13) + (1x7) + (2x5) + (2x13) + (1x12) + (16x6) = 20+26+7+10+26+12+96 = 197 SNPs in total among 26 people ... 
  • This gives us an average of 197/26 = 7.58 SNPs back to the common ancestor
  • Allowing 83 years per SNP, this works out as 629 years (83 x 7.58)
  • And if we assume the average year of birth of the 26 participants is about 1950, then this equates to about (1950-630=) 1320 AD for the common ancestor's year of birth
  • Rounding to the nearest 50 years gives us 1300 AD as a very crude date for the common ancestor for Group 2
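The simple per-person average in the list above can be reproduced with a few lines of Python. This is a minimal sketch, assuming (as the text does) 83 years per SNP and an average tester birth year of 1950; the branch data are copied from the branch list, and the helper `round_to` is a hypothetical name introduced here. Working without intermediate rounding gives a birth year of about 1321 rather than 1320, because the text rounds 629 years up to 630; both round to 1300 AD.

```python
# Simple per-person average of SNPs back to M756 (sketch).
# Each tuple is (people on the branch, SNPs per person back to M756),
# copied from the branch list above.
branches = [
    (2, 10),   # 1st branch
    (2, 13),   # 2nd branch
    (1, 7),    # 3rd branch
    (2, 5),    # 4th branch
    (2, 13),   # 5th branch
    (1, 12),   # 6th branch
    (16, 6),   # others on M756
]
YEARS_PER_SNP = 83
MEAN_BIRTH_YEAR = 1950  # assumed average birth year of the testers


def round_to(year, step=50):
    """Round a year to the nearest `step` years (hypothetical helper)."""
    return int(round(year / step) * step)


total_snps = sum(people * snps for people, snps in branches)
total_people = sum(people for people, _ in branches)
avg_snps = total_snps / total_people
birth_year = MEAN_BIRTH_YEAR - avg_snps * YEARS_PER_SNP

print(total_snps, total_people)  # 197 26
print(f"{avg_snps:.2f} SNPs back, common ancestor born c. {birth_year:.0f} AD")
print(round_to(birth_year))      # 1300
```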

But like I say, we don't do that. Instead, we calculate the average number of SNPs for each branch that feeds directly into the M756 branch (Branches 2 & 3 are considered together because they join up before M756, and the same applies to Branches 5 & 6). This gives us:
  • Branch 1 ... (2x10) / 2 = 10 SNPs on average for Branch 1
  • Branch 2 & 3 ... ((2x13) + (1x7)) / 3 = 11 SNPs on average for Branch 2/3
  • Branch 4 ... (2x5) /2 = 5 SNPs on average for Branch 4
  • Branch 5 & 6 ... ((2x13) + (1x12)) / 3 = 12.7 SNPs on average for Branch 5/6
  • Others ... (16x6) / 16 = 6 SNPs on average for this final "branch"
So adding up the contributions from each individual "branch" and dividing by the number of branches that feed directly into M756 (five) gives us: (10 + 11 + 5 + 12.7 + 6) / 5 = 44.7 / 5 = 8.94 SNPs on average (back to the TMRCA, namely M756)
  • Allowing 83 years per SNP, this works out as 742 years (83 x 8.94)
  • And if we assume the average year of birth of the 26 participants is about 1950, then this equates to about (1950-742=) 1208 AD for the common ancestor's year of birth
  • Rounding to the nearest 50 years gives us 1200 AD as a very crude date for the common ancestor for Group 2
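The branch-averaged method can be sketched the same way. This minimal Python sketch (hypothetical names; same 83 years-per-SNP and 1950 birth-year assumptions as the text) first averages within each lineage that feeds directly into M756 and then averages across the five lineages. Because it keeps 38/3 unrounded rather than using 12.7, it gives 8.93 SNPs and a birth year of about 1209 AD instead of 8.94 and 1208; the crude date is still 1200 AD.

```python
from statistics import mean

# Branch-averaged SNP count back to M756 (sketch).
# Each lineage feeding directly into M756 is a list of
# (people, SNPs per person) tuples copied from the branch list above.
lineages = {
    "Branch 1": [(2, 10)],
    "Branch 2/3": [(2, 13), (1, 7)],   # join up before M756
    "Branch 4": [(2, 5)],
    "Branch 5/6": [(2, 13), (1, 12)],  # join up before M756
    "Others": [(16, 6)],
}
YEARS_PER_SNP = 83
MEAN_BIRTH_YEAR = 1950  # assumed average birth year of the testers


def lineage_average(members):
    """Average SNPs per person within one lineage."""
    total_snps = sum(people * snps for people, snps in members)
    return total_snps / sum(people for people, _ in members)


per_lineage = {name: lineage_average(m) for name, m in lineages.items()}
avg_snps = mean(list(per_lineage.values()))  # each lineage counts once
birth_year = MEAN_BIRTH_YEAR - avg_snps * YEARS_PER_SNP
crude_date = round(birth_year / 50) * 50  # nearest 50 years

print({name: round(v, 2) for name, v in per_lineage.items()})
print(f"{avg_snps:.2f} SNPs back, common ancestor born c. {birth_year:.0f} AD, say {crude_date} AD")
```

The point of the design is the weighting: in the per-person average the 16 "Others" with 6 SNPs each supply 96 of the 197 SNPs and pull the mean down, whereas here they contribute a single value, the same as every other lineage.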

So there are several things to note:
  1. the TMRCA estimate using SNP counting (1200 AD) is 400 years later than the TMRCA estimate from YFULL (800 AD)
  2. it is also 200 years after the introduction of surnames in Ireland
  3. 83 years per SNP is the estimated average based on current data [1] but some researchers are finding that the average number of years per SNP is less than this (maybe 70 years per SNP). Using this alternative value gives us 70 x 8.94 = 626 years which equates to about (1950-626=) 1324 ... say 1300 AD

[2]  Crude Age Estimates for Branches Upstream of M756

Note: these estimates use 84 years per SNP rather than 83.

FT44669

  • Descendant branches
    • M756 ... ... 4 PVs + 6 public + 2 in SNP Block = 12 SNPs in total (in 26 people)
    • BY61861 ... 6 PV + 4 pub + 1 SB = 11 total (16 people) 
    • FT164983 ... 2 PV + 11 pub + 2 SB = 15 total (10 people)
    • Others ... ... 10 PV (2 people)
  • Summation ... (26x12) + (16x11) + (10x15) + (2x10) = 312 + 176 + 150 + 20 = 658 in 54 people
  • Average ... 658 / 54 = 12.19 SNPs back to the common ancestor
  • Years back ... 12.19 x 84 years per SNP = 1024 yrs ... 12.19 x 70 YpS = 853 yrs
  • Crude date ... 1950 - 1024 = 926 AD ... 1950 - 853 = 1097 AD
  • Rounded dates ... 950 to 1100 AD

Y5061

  • Descendant branches
    • BY19114 ... 4 PVs + 7 public + 2 in SNP Block = 13 SNPs in total (in 14 people)
    • FT44669 ... 5 PV + 6 pub + 1 SB = 12 total (54 people) 
    • FGC22222 ... 1 PV + 11 SB = 12 total (2 people)
    • BY104904 ... 1 PV + 10 SB = 11 total (2 people)
    • Others ... ... 10 PV (3 people)
  • Summation ... (14x13) + (54x12) + (2x12) + (2x11) + (3x10) = 182 + 648 + 24 + 22 + 30 = 906 in 75 people
  • Average ... 906 / 75 = 12.08 SNPs back to the common ancestor
  • Years back ... 12.08 x 84 years per SNP = 1015 yrs ... 12.08 x 70 YpS = 846 yrs
  • Crude date ... 1950 - 1015 = 935 AD ... 1950 - 846 = 1104 AD
  • Rounded dates ... 950 to 1100 AD

Y5058

  • Descendant branches
    • Y5061 ... 5 PVs + 7 public + 5 in SNP Block = 17 SNPs in total (in 75 people)
    • BY72962 ... 11 PV + 12 SB = 23 total (2 people) 
    • Others ... ... ?? PV (16 people) ... number of Private Variants is not reported
  • Summation ... (75x17) + (2x23) = 1275 + 46 = 1321 in 77 people (the 16 "Others" are left out because their number of Private Variants is not reported)
  • Average ... 1321 / 77 = 17.16 SNPs back to the common ancestor
  • Years back ... 17.16 x 84 years per SNP = 1441 yrs ... 17.16 x 70 YpS = 1201 yrs
  • Crude date ... 1950 - 1441 = 509 AD ... 1950 - 1201 = 749 AD
  • Rounded dates ... 500 to 750 AD
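The upstream estimates in this footnote use a people-weighted average: each descendant branch's SNP total is counted once for every person on that branch. The Python sketch below reruns the three calculations under that assumption, with 84 and 70 years per SNP and a 1950 average birth year; the function and variable names are hypothetical. As in the Y5058 calculation, a branch with an unreported private-variant count is simply left out.

```python
# People-weighted averaging for the branches upstream of M756 (sketch).
# Each tuple is (SNPs back to the common ancestor, people on that descendant branch),
# copied from the lists above. Branches with an unreported private-variant count
# (the 16 "Others" under Y5058) are omitted, as in the text.
UPSTREAM = {
    "FT44669": [(12, 26), (11, 16), (15, 10), (10, 2)],
    "Y5061": [(13, 14), (12, 54), (12, 2), (11, 2), (10, 3)],
    "Y5058": [(17, 75), (23, 2)],
}
MEAN_BIRTH_YEAR = 1950  # assumed average birth year of the testers


def weighted_average(branches):
    """Average SNPs per person, weighting each branch by its number of people."""
    total_snps = sum(snps * people for snps, people in branches)
    return total_snps / sum(people for _, people in branches)


def round_to(year, step=50):
    """Round a year to the nearest `step` years."""
    return int(round(year / step) * step)


for name, branches in UPSTREAM.items():
    avg = weighted_average(branches)
    early = MEAN_BIRTH_YEAR - avg * 84  # 84 years per SNP gives the earlier date
    late = MEAN_BIRTH_YEAR - avg * 70   # 70 years per SNP gives the later date
    print(f"{name}: {avg:.2f} SNPs, {early:.0f}-{late:.0f} AD, "
          f"rounded {round_to(early)}-{round_to(late)} AD")
```

This reproduces the averages above (12.19, 12.08 and 17.16 SNPs) and the rounded ranges of 950-1100, 950-1100 and 500-750 AD.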


[3] Which came first: the Ryan chicken or the non-Ryan egg?

There are several non-Ryan matches that arose close to the time that the Ryan surname arose - but did any of them precede the Ryan's? or are they pre-surname matches (i.e. >1000 years old)? or are they surname switches who previously were Ryan's (<1000 years old)? Let's look at what evidence there is for these various options.

1) None of the above names are reported to be related to the Ryan surname (in the Ryan pedigree detailed by O'Hart). This shifts the balance of probabilities towards them being the result of historical surname switches or pre-surname matches.
 
2) If these matches are singletons or belong to a very small genetic group (rather than a large group of same-surname matches), then it is more likely that these are surname switches (which tend to result in smaller genetic groups). If, however, they do belong to a much larger group of people with this surname (who would match Group 2 if they were actually within the Ryan project), then we may be looking at an ancient NPE whose descendants thrived and survived the plagues of the millennia to form a significant presence in the world today ... or alternatively the Group 2 Ryans are in fact Kennedy's ancestrally ... or some other explanation. So let's see what evidence exists for each surname:
    • The Carroll connection - only 1 of the 587 members of the Carroll DNA Project is FT79210+. However, several members test positive for the upstream SNP Y5058 (TMRCA 500-750 AD, see [2] in Footnotes). This may be associated with the O'Carroll's of Ossory (according to the website). However the group to which they are allocated is quite a disparate group with many different surnames within it (only 9 of the 42 are Carroll's) so it is difficult to sort things out from this hodge-podge of genetic matches, many of which may be chance matches due to Convergence. 
    • The Foley connection - the Foley DNA Project does not have a public Results page so I cannot check for the specific SNPs in the Foley SNP Sequence. However, according to Woulfe, Foley was a Waterford name (which is close to the putative origin of the Leinster Mulryan's & Ryan's), so maybe this is a surname switch?
    • The Parsons connection - this person's MDKA was an O'Dwyer. He is the only member of the 170-strong Dwyer DNA Project to belong to the Y5058 line of descent so it seems more likely that this represents a DNA switch of some sort.
    • The Kennedy connection - the Kennedy DNA Project also does not have a public Results page so I cannot check for the specific SNPs on their website.





