Guest Post - by Michael Lieff (@virgonomic)
Every year for the last 15 years, my neighbour, who is a die-hard Fighting Irish fan, has planned a driving trip to the University of Notre Dame, near South Bend, Indiana. I went along for the first time in 2017 and again in 2018. After a travel day, the first stop on the campus tour is the bookstore. In the lobby, there is a table with one style of short- and long-sleeve t-shirts. In 2017 "the shirt" was navy, and it didn't really grab me.
In 2018, however, the shirt was kelly green, and since green is my favourite colour, it drew me in. I read the price tag and learned that "the shirt" is a student initiative and the proceeds go back into student activities and assistance. At US$18, it was a no-brainer.
Once I had my shirt, I visited the URL on the price tag. It links to a timeline that shows the shirt design from every year and, more importantly, the number of shirts sold, the team's record and the shirt manufacturer. Found data! Even more interesting, there are no sales figures for the years 1994-1996.
Analysis
The first question that came to my mind was: how many shirts did they sell from 1994 to 1996? Because of this gap, the dataset is a really nice example for exploring interpolation and extrapolation. I figured the trend would be linear and a line of best fit would give a fairly logical prediction. Once I visualized the data, though, it definitely wasn't cut-and-dried.
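For the interpolation and extrapolation questions, here is a minimal Python sketch of the line-of-best-fit approach: fit an ordinary least-squares line to the years that do have sales figures, then evaluate it at the missing years. The function names and the `sales_by_year` dict (year mapped to shirts sold, filled in by hand from the timeline) are hypothetical, for illustration only.

```python
def fit_line(years, sold):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(sold) / n
    # Sum of squared deviations in x, and cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, sold))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, year):
    """Evaluate the fitted line at a given year."""
    return slope * year + intercept


def fill_gaps(sales_by_year, target_years):
    """Estimate sales for target_years from a line fitted to the known years.

    sales_by_year: dict mapping year -> shirts sold (known years only).
    target_years: years to estimate; years inside the data range are
    interpolated, years past the last known year are extrapolated.
    """
    years = sorted(sales_by_year)
    sold = [sales_by_year[y] for y in years]
    slope, intercept = fit_line(years, sold)
    return {y: predict(slope, intercept, y) for y in target_years}
```

For example, `fill_gaps(sales_by_year, [1994, 1995, 1996, 2018])` returns estimates for the three missing years plus 2018. Given how far the data strays from a straight line, these are best treated as a baseline to compare against rather than a confident prediction.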
There are some interesting things going on here. The number of shirts sold dropped fairly significantly from 1993 to 1997. It also skyrocketed in 2002 and then plummeted in 2004. Possible reasons for these swings would make for an interesting discussion.
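One way to put numbers on "dropped", "skyrocketed" and "plummeted" is the percentage change between consecutive recorded years. Below is a small Python sketch, again assuming a hypothetical `sales_by_year` dict built from the timeline. Because the missing years are simply absent, the 1994-1996 gap shows up as a single change from 1993 to 1997.

```python
def pct_change(sales_by_year):
    """Percentage change in shirts sold between consecutive recorded years.

    Returns a dict keyed by (previous_year, current_year). Years with no
    figure are absent from the input, so a gap produces one change that
    spans the whole gap.
    """
    years = sorted(sales_by_year)
    changes = {}
    for prev, curr in zip(years, years[1:]):
        before = sales_by_year[prev]
        after = sales_by_year[curr]
        changes[(prev, curr)] = 100.0 * (after - before) / before
    return changes
```

Sorting the output by value puts the largest jumps and drops at either end, which makes it easy to check whether 2002 and 2004 really stand out from the other years.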
Drilling a bit deeper, I next asked: are more shirts sold in seasons when the team is winning?
It doesn't appear so, but I will let you 'do the math'.
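If you do want to do the math, Pearson's correlation coefficient between wins and shirts sold is a reasonable first check. Here is a minimal sketch in plain Python; it assumes you have already turned each season's record into a win count (for example, the first number in a "W-L" record), and `wins_vs_sales` is a hypothetical helper. On Python 3.10 and later, `statistics.correlation` computes the same Pearson coefficient.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Covariance and variances, all without the 1/n factor (it cancels).
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def wins_vs_sales(seasons):
    """seasons: list of (wins, shirts_sold) pairs for years with both figures."""
    wins = [w for w, _ in seasons]
    sold = [s for _, s in seasons]
    return pearson_r(wins, sold)
```

A value near 0 would be consistent with the impression that winning seasons don't sell more shirts. With only one data point per season, though, the coefficient is noisy, so it is worth reading loosely.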
For further analysis, you could ask:
- Is the trend linear or is a curve a better model?
- Can you interpolate the number of shirts sold in 1994-1996, where data is missing? Can you extrapolate the number sold in 2018 or beyond?
- What are the mean, median and mode of the number sold?
- Does the number of shirts sold correlate with the team's wins that season?
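The summary-statistics question maps directly onto Python's standard `statistics` module. A sketch, assuming `sold` is a list of the yearly sales figures with the missing years left out; `summarize` is a hypothetical helper name.

```python
import statistics


def summarize(sold):
    """Mean, median and mode(s) of the shirts-sold figures.

    multimode() is used instead of mode() because yearly sales counts may
    well all be distinct, in which case every value ties as a mode and
    mode() would just return the first one it encounters.
    """
    return {
        "mean": statistics.mean(sold),
        "median": statistics.median(sold),
        "modes": statistics.multimode(sold),
    }
```

If `modes` comes back as the whole list, the mode simply isn't a useful summary for this dataset, which is itself a nice teaching point.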