Sunday, May 16, 2021

Star Wars Data via Kaggle

Another repository of freely available data is called Kaggle. "Inside Kaggle you’ll find all the code & data you need to do your data science work. Use over 50,000 public datasets and 400,000 public notebooks to conquer any analysis in no time." I like this repository because it seems easy to search and there are a lot of data sets, so you should be able to find one on a topic that interests your students without too much trouble.

And to showcase a data set, I'm choosing one suggested to me by @virgonomic: data from the Star Wars franchise. It's actually several data sets.

Analysis 

There are five CSV files, one each on characters, species, planets, starships and vehicles. You're not going to be doing any groundbreaking statistical work here, since the context of these data sets is pretty niche and mostly of interest to die-hard Star Wars fans. Like, I'm not sure who will care that the Bantha-II cargo skiff has a one-day supply of consumables. Nonetheless, these are good data sets for basic stats (finding the mean, standard deviation, correlation, etc.). You can definitely find many categorical attributes as well. One thing I did notice is that most of the sets had one or two values that could be used to talk about outliers, like Jabba the Hutt in the characters data set or the rotational period of planets in the planets data set.
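For the basic stats mentioned above, here is a minimal Python sketch (standard library only, Python 3.8+ for `statistics.quantiles`) that pulls one numeric column out of a CSV, reports its mean and sample standard deviation, and flags outliers. The outlier test is Tukey's 1.5 × IQR rule, a common classroom convention I'm choosing here, not something built into the data set. The file name `characters.csv` and the column name `mass` in the example comment are assumptions; check the headers in your own download.

```python
import csv
import statistics


def numeric_column(csv_lines, column):
    """Return the values in `column` that parse as numbers, skipping blanks and text."""
    values = []
    for row in csv.DictReader(csv_lines):
        raw = (row.get(column) or "").replace(",", "")  # drop thousands separators
        try:
            values.append(float(raw))
        except ValueError:
            continue  # e.g. an empty cell or a word such as "unknown"
    return values


def describe(values):
    """Return the mean and the sample standard deviation of a list of numbers."""
    return statistics.mean(values), statistics.stdev(values)


def iqr_outliers(values, k=1.5):
    """Flag values outside Tukey's fences: [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Example (file and column names are assumptions):
# with open("characters.csv", newline="", encoding="utf-8") as f:
#     masses = numeric_column(f, "mass")
# mean, sd = describe(masses)
# print(f"mean = {mean:.1f}, sd = {sd:.1f}, outliers = {iqr_outliers(masses)}")
```

Passing the open file straight to `numeric_column` works because `csv.DictReader` accepts any iterable of lines, which also makes it easy to try the functions on a few hand-typed rows before using the real file.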


Sample Questions

  • When you consider the length of a vehicle compared to the number of crew it holds, are there any outliers?
  • What is the standard deviation of the _______ attribute in the _______ data set?
  • Find your favourite character. Pick an attribute and describe how your character compares to the others.

BONUS data: Though it's not from this data set, Star Wars Day was recently, and someone posted this infographic comparing the number of lines each character spoke and the words they spoke most in the original trilogy.


Downloads

Let me know if you used this data set or if you have suggestions of what to do with it beyond this.
