Friday, January 4, 2019

Highest Grossing Concert Tours

Concerts are now a multibillion-dollar industry, so why not use some concert data to do some statistical analysis? This data comes from the Wikipedia page on the same subject. On the page, the data is broken up into the top 20 highest-grossing concert tours of all time (ordered by gross that is not adjusted for inflation), followed by the top-grossing tours for each decade from the 80s until the present. There is data on the decade rank, gross and inflation-adjusted gross, the number of shows, attendance and other attributes.


You can start with some categorical analysis by just looking at who made the list each decade. This data runs for four decades, so kids might not be into who was big in the 80s. But if you highlight the biggest acts of the last decade, you can still see that more than half of them were artists who were around in the 80s (with U2 being #1), and U2, Guns N' Roses and The Rolling Stones (twice) were in the top 5 of all time (inflation adjusted).

For more numerical analysis, you could pick any of the data sets to do some single-variable analysis, whether it be central tendency, distributions, or histograms. There are many choices.
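
If you want to check the central tendency numbers with a few lines of code, here is a minimal Python sketch. The values in gross_millions are placeholders I made up for illustration, not figures from the Wikipedia tables, so swap in whichever column you are working with.

```python
import statistics

# Placeholder gross values in millions of dollars.
# These are made up for illustration, NOT taken from the Wikipedia tables.
gross_millions = [120, 135, 150, 160, 175, 190, 210, 250, 300, 730]

mean_gross = statistics.mean(gross_millions)      # pulled upward by the 730
median_gross = statistics.median(gross_millions)  # middle of the sorted values

print("mean:", mean_gross)
print("median:", median_gross)

# A mean well above the median hints at a right-skewed distribution:
# a few very large tours stretch the upper tail.
print("mean > median:", mean_gross > median_gross)
```

With these placeholder numbers the mean sits well above the median, which is the kind of gap that can start a conversation about skew.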

When you create some box plots, you will find that some of the data sets have outliers. In particular, I think it's interesting that the outliers when dealing with the money are different from the outliers when dealing with the number of shows. This might lead you to explore things like the Average Gross and compare it to the money and number of shows.
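
The outlier check that a box plot does visually can also be sketched in code. This assumes the common 1.5 × IQR rule, and it uses Python's default quartile method, which may not match the method your graphing tool uses. The tour labels and numbers are hypothetical placeholders, chosen so that the money outlier and the shows outlier are different tours.

```python
import statistics

def iqr_outliers(values_by_label):
    """Return the labels whose values fall outside the 1.5 * IQR fences."""
    values = list(values_by_label.values())
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [label for label, value in values_by_label.items()
            if value < lower_fence or value > upper_fence]

# Hypothetical tours with made-up numbers (NOT from the Wikipedia tables).
gross_millions = {"Tour A": 120, "Tour B": 135, "Tour C": 150, "Tour D": 160,
                  "Tour E": 175, "Tour F": 190, "Tour G": 210, "Tour H": 250,
                  "Tour I": 300, "Tour J": 730}
number_of_shows = {"Tour A": 110, "Tour B": 320, "Tour C": 95, "Tour D": 100,
                   "Tour E": 120, "Tour F": 90, "Tour G": 105, "Tour H": 115,
                   "Tour I": 98, "Tour J": 102}

print("gross outliers:", iqr_outliers(gross_millions))
print("shows outliers:", iqr_outliers(number_of_shows))
```

In this made-up example the tour that is an outlier for gross is not the one that is an outlier for number of shows, which mirrors the pattern described above.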

This might lead you to do some two-variable analysis. Though there aren't any strong relationships, you could use this to talk about relationships with weak correlations. Technically there is one strong relationship: the one between the Gross and the Inflation adjusted gross. This would be expected, as one relates directly to the other. One thing that I like about this, however, is that it's not a perfect relationship. That is, whoever adjusted for inflation did so using different rates for each year (to make it more realistic, presumably).
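
To see why different inflation rates keep the relationship from being perfect, here is a small sketch that computes the Pearson correlation coefficient by hand. The gross values and the per-tour inflation multipliers are invented for illustration (imagining older tours getting larger multipliers), not taken from the Wikipedia page.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Made-up gross values (millions) and adjusted values (NOT real data).
# Each adjusted value uses a different multiplier: 2.3, 1.9, 1.5, 1.3, 1.1.
gross = [100, 150, 200, 250, 300]
adjusted = [230, 285, 300, 325, 330]

# If every tour used the same multiplier, the relationship would be perfect.
same_rate = [2 * g for g in gross]

r_varied = pearson_r(gross, adjusted)
r_same = pearson_r(gross, same_rate)

print("different rates:", round(r_varied, 3))
print("single rate:", round(r_same, 3))
```

With a single multiplier the points fall exactly on a line, while varying the multiplier still gives a strong correlation that falls short of 1.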

Sample Questions

  • Which artist made the most (overall or per concert)?
  • Which decade made the most money (adjusted for inflation)?
  • Which artists are outliers the most often?
  • Calculate the mean and median for each of the numeric attributes. What do these values suggest about the distributions?


Let me know if you used this data set or if you have suggestions for what else to do with it.