Saturday, November 10, 2018

Notre Dame University - "The Shirt"

Guest Post - by Michael Lieff (@virgonomic)


Every year for the last 15 years, my neighbour, who is a die-hard Fighting Irish fan, has planned a driving trip to Notre Dame University near South Bend, Indiana. I attended for the first time in 2017 and again in 2018. After a travel day, the first stop on the campus tour is the bookstore. In the lobby, they have a table with one style of short- and long-sleeve t-shirts. In 2017 "the shirt" was navy and it didn't really grab me.


However, in 2018 the shirt was kelly green, which drew me in, as green is my favourite colour. I read the price tag and learned that "the shirt" is a student initiative and the proceeds go back into student activities and assistance. At $18 USD it was a no-brainer.

Once I had my shirt, I visited the URL on the price tag. There is a link to a timeline that shows the shirt design from every year and, more importantly, the number of shirts sold, the team's record and the shirt manufacturer. Found data! Even more interesting is that the number sold is missing for the years 1994-1996.

Analysis

The first question that came to my mind is: how many shirts did they sell from 1994 to 1996? Because of this gap, the dataset is a really nice example for exploring interpolation and extrapolation. I figured the trend would be linear and the line of best fit would give a pretty logical prediction. Upon visualization, it definitely isn't cut-and-dried.

There are some interesting things going on here. The number of shirts sold dropped fairly significantly from 1993 to 1997. It also skyrocketed in 2002 and then plummeted in 2004. Possible reasons for this would make for an interesting discussion.

Drilling a bit deeper, the next question that came to mind is: Are more shirts sold in seasons where the team is winning?

It doesn't appear so, but I will let you 'do the math'.
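One way to 'do the math' is to compute the Pearson correlation coefficient between wins and shirts sold. The sketch below is only an illustration: the `wins` and `shirts_sold` lists are made-up placeholder values, not the figures from the timeline, and `pearson_r` is a helper name chosen for this example. Swap in the real columns from the downloaded data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Placeholder values only (NOT the real data): one entry per season.
wins = [5, 9, 10, 6, 8]
shirts_sold = [50, 44, 61, 58, 47]  # in thousands, hypothetical

r = pearson_r(wins, shirts_sold)
print(f"r = {r:.2f}")
```

A value of r near 0 would support the impression that winning seasons don't sell more shirts, while a value near 1 or -1 would suggest a strong linear relationship.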

Sample Questions

In terms of analysis, the following questions could be asked:
  • Is the trend linear or is a curve a better model?
  • Can you interpolate the number of shirts sold in 1994-1996 where there is missing data? Extrapolate the number sold in 2018 or beyond?
  • What are the mean, median and mode number sold?
  • Does the number of shirts sold correlate with the team’s wins that season?
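For the interpolation question, a least-squares line fitted to the years that do have data can be evaluated at the missing years. This is a minimal sketch under a linear assumption (which the graph suggests may not hold well): the `known` pairs are invented placeholder values, not the real sales figures, and `fit_line` is a helper name made up for this example.

```python
def fit_line(points):
    """Return (slope, intercept) of the least-squares line through (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Placeholder (year, shirts sold in thousands) pairs, NOT the real data.
# The years 1994-1996 are deliberately left out, as in the timeline.
known = [(1990, 10), (1991, 12), (1992, 14), (1993, 16), (1997, 24), (1998, 26)]

slope, intercept = fit_line(known)

# Interpolate the missing years from the fitted line.
for year in (1994, 1995, 1996):
    print(year, round(slope * year + intercept, 1))
```

The same line can be evaluated at 2018 or later to extrapolate, though predictions far outside the fitted years deserve more skepticism than ones inside the gap.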

Download the Data

Let us know if you use this dataset or have any suggestions for things to do with it beyond this.

Monday, November 5, 2018

2018 NFL Salaries

We have a local NFL player who went to high school at one of the schools I support. Luke Willson was recently on the Seattle Seahawks and is currently on our local Detroit Lions. In conversation, a coworker wondered how much his salary was. The Internet provides. Not only his salary, but the salary of every one of the almost 1800 players (who knew there were so many?).

And when you have such a large data set, I think that you should analyze it. It's not a particularly deep topic, but it's a good data set for talking about mean, median, skewing and outliers. The data itself isn't super interesting, but the context may be enough to capture the interest of some of your students for basic single-variable analysis. The data includes each player's name, salary, position, team and overall rank, and I added the team rank. There are 32 teams and a bit over 50 players per team.

Analysis


Certainly one thing you can do is create some graphs. The first types that come to mind are a dot plot, box plot and histogram. In this case the dot and box plots are provided by CODAP while the histogram comes from Google Sheets. You can see from the dot plot that the mean and median are quite separated (which we would expect from the skewing) and that there are a large number of outliers.
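The same observations can be checked numerically. The sketch below compares the mean with the median and flags outliers using the common 1.5 × IQR rule. The `salaries` list is a small set of made-up values in millions of dollars, not the real NFL data, and the quartile method used by Python's `statistics.quantiles` may differ slightly from the one CODAP uses, so the fences may not match the box plot exactly.

```python
from statistics import mean, median, quantiles

# Placeholder salaries in $ millions (NOT the real data).
salaries = [0.5, 0.6, 0.6, 0.7, 0.8, 0.9, 1.0, 1.2, 1.5, 2.0, 4.0, 20.0]

avg = mean(salaries)
mid = median(salaries)

# Quartiles and the 1.5 * IQR fences used to flag outliers.
q1, q2, q3 = quantiles(salaries, n=4)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [s for s in salaries if s < lower_fence or s > upper_fence]

print("mean:", round(avg, 2), "median:", mid)
print("outliers:", outliers)
```

A mean well above the median, together with outliers only on the high side, is the numerical signature of data skewed to the right.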

Since we were talking about Luke Willson, we could certainly ask how his salary compares to other NFL players (he's 455th), to other players on his team (he's 18th of 56) or even to other players at the same position (he's 21st of about 126 tight ends and is above the mean tight end salary).

Sample Questions

  • Determine the mean, median and standard deviation for the salaries attribute.
  • Which team has the highest mean salary? The highest median salary?
  • Choose a player. How do they compare to the league, their team and their position?
  • Besides the way it looks, what confirms that this data is skewed to the right?
  • Which team has the highest number of outliers?
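The team and player comparison questions come down to grouping and ranking. Here is a minimal sketch using only the standard library; the player names, team names and salaries are invented placeholders (in millions of dollars), and `salary_rank` is a helper name made up for this example.

```python
from collections import defaultdict
from statistics import mean, median

# Placeholder rows (name, team, salary in $ millions), NOT the real data.
players = [
    ("Player A", "Team X", 0.6),
    ("Player B", "Team X", 0.9),
    ("Player C", "Team X", 12.0),
    ("Player D", "Team Y", 2.0),
    ("Player E", "Team Y", 3.0),
    ("Player F", "Team Y", 4.0),
]

# Group salaries by team.
by_team = defaultdict(list)
for name, team, salary in players:
    by_team[team].append(salary)

team_means = {team: mean(vals) for team, vals in by_team.items()}
team_medians = {team: median(vals) for team, vals in by_team.items()}

top_mean_team = max(team_means, key=team_means.get)
top_median_team = max(team_medians, key=team_medians.get)

def salary_rank(salary, pool):
    """Rank within pool: 1 + the number of salaries strictly higher."""
    return 1 + sum(1 for s in pool if s > salary)

league = [salary for _, _, salary in players]
print("highest mean:", top_mean_team, "highest median:", top_median_team)
print("Player E league rank:", salary_rank(3.0, league))
print("Player E team rank:", salary_rank(3.0, by_team["Team Y"]))
```

In this made-up sample one large contract gives Team X the highest mean while Team Y has the highest median, which is a useful talking point about how outliers affect each measure.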

Download the Data

Let me know if you used this data set or if you have suggestions of what to do with it beyond this.