Patriots and Red Sox – is Hyland the curse? No, there isn’t any causation…
By Mike Hyland
Executive Director, NEPPA
One aspect of data analysis that irks me is the misunderstanding of causation versus correlation. The two terms are often mixed up in articles and reports, and thus the writer (and reader) can jump to incorrect conclusions about the relation between two or more variables. Maybe it’s the engineer in me, but everyone should be versed in the difference.
For instance – let’s take my living in New England versus the success of the Patriots and Red Sox. To keep my analysis simple, let’s look at the years from 1985 through 2023. I lived in New England from 1989 through 1996, and again since 2022. If you charted the records of both franchises against where I was living, you would see a clear correlation between the two variables. Variable 1: Hyland in New England. Variable 2: Success of the Patriots/Red Sox.
In the years when Hyland lived in New Hampshire, the Pats won a paltry 32% of their games, and the Red Sox won 49% of their games. However, if we look at the years Hyland lived in Maryland, the Patriots won 70% of their games, and the Red Sox won 55% of their games. During those same years when Hyland wasn’t in New England, the Patriots won 6 Super Bowls, went to the Super Bowl 10 times, and went to the playoffs 20 times. The Sox won 4 World Series and went to the playoffs 12 times during this same period of Hyland’s absence. One must look at the data, stop and think: ‘What was the search committee thinking when they hired Hyland at NEPPA?’ Since I’ve returned – the Red Sox and Patriots are both in the cellar.
But am I to blame? Heck no! That’s the difference between Correlation and Causation. Although you can correlate the success of the teams with my time in the Mid-Atlantic, you can’t conclude that I was the cause. The real causes could be any number of variables, such as ownership, coaching, draft choices, trades, etc. I know there might be some of you out there who would like to blame me for the woes while I’m in New England, but the truth is that correlation by itself doesn’t tell you much when you’re looking for cause and effect.
If you have some time to spare, just google ‘funniest correlation graphs’ and let the amusement begin. Some examples: ‘The number of people who drowned by falling into a pool’ correlates quite well with ‘yearly films Nicolas Cage appeared in’. Or, ‘the Global Average Temperature’ from 1820 to 2000 correlates well with the ‘Number of Known Pirates.’ How about: ‘The US Highway Fatality Rate’ correlates quite well with the ‘Total tons of Lemons imported from Mexico to the United States’ during the years 1996 to 2000.
Just sit back and think about these examples. If you mix up Causation with Correlation, you’d come to the conclusion that if Nicolas Cage stopped acting no one would fall into pools and die, that importing more lemons would decrease highway fatalities, and that the lack of pirates is causing the global rise in temperature… We know that none of these causal claims are true; they are obviously absurd to any rational thinker. But what about the not so obvious?
When we explore difficult topics such as climate change, rising water temperatures, autism birth rates, reliability statistics, vaccination benefits, electrification, accident investigations, etc., we need to pause and ask whether the graphs indicate causation or simply show correlation. Some items go together and correlate well, but might not be causal. For instance, the increase in iPhone sales correlates well with the number of people dying from falling down stairs. But did buying the iPhone actually cause one’s death? Obviously not directly. A death from falling down the stairs might have any of a variety of causes, from the distraction of having earphones in one’s ears to an increase in the elderly population. It could also be caused by staring at one’s phone while beginning to descend the stairway.
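For the engineers in the audience, here is a minimal sketch of how a shared trend can manufacture a correlation all by itself. Everything in it is made up for illustration: the two series, their names (phone_sales, stair_falls), the slopes, and the noise levels are my assumptions, not real data. Both series simply rise over time and never influence each other. The sketch compares the correlation of the raw series with the correlation of their period-to-period changes, which removes the steady trend.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def changes(series):
    """Period-to-period changes, which strip out a steady trend."""
    return [b - a for a, b in zip(series, series[1:])]


rng = random.Random(42)  # fixed seed so the sketch is repeatable
periods = range(500)

# Two hypothetical series that both rise over time but are otherwise
# independent: neither one is computed from the other.
phone_sales = [1.0 * t + rng.gauss(0, 10) for t in periods]
stair_falls = [0.5 * t + rng.gauss(0, 5) for t in periods]

r_raw = pearson(phone_sales, stair_falls)
r_changes = pearson(changes(phone_sales), changes(stair_falls))

print(f"correlation of raw series:     {r_raw:.2f}")
print(f"correlation of period changes: {r_changes:.2f}")
```

The raw series should correlate strongly simply because both go up with time, while the changes should show little or no correlation. A weak correlation in the changes doesn’t prove there is no causal link either; the point is only that a strong correlation between two trending quantities is exactly what you would expect even when they have nothing to do with each other.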
Regardless of the data presented, take note, and do not jump to any conclusion about causation before you review the facts behind the data. Many articles, reports, etc. are one-sided and want the reader to blame X on Y. More likely, Y is only one of many factors, and X is actually caused by some combination of Y, Z, A, B, C and possibly D.
Otherwise, you might go on believing that eating less margarine causes fewer divorces in the State of Maine. Why wouldn’t you – they correlate.
-Mike