Monday, May 17, 2021

Introductory Statistics Data Cards

I love this set of data cards created by @DavidButlerUoA (be sure to check out the comments on the post for more info from him):

These are ideal for when you are just starting out talking about stats. Each card is a data point with ten attributes (name, age, height, heart rate, temp, mood, arms, headgear, pet, bike). I would give these cards out to students with the instruction to sort them in any way they see fit and then see what happens. I wouldn't even tell them which attributes there are; just let them come to their own discoveries. This is a really great way for students to ease into the idea of analyzing data in a painless and approachable way. You can see some of the results that @DavidButlerUoA got here, here and here.

Analysis

Once students have interacted with these cards informally, you can continue to refer to them as you talk about the difference between categorical and numeric data, do some single variable measurements, look at two variable correlation and more. All the while you can keep referring to the cards in a more human context, as each of them represents one "person" (though the data is made up, some of the relationships were taken from health studies). So although you will not solve any statistical mysteries with this data set, it is quite rich and diverse and can be used to demonstrate many different statistical concepts.

Sample Questions

  • Sort these cards into any arrangement you wish. What patterns do you see? Be sure to justify your arrangement(s).
  • What is the probability that if a person is happy, they are dancing?
  • Could riding a bike make you healthier?
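
The second question above is a conditional probability: of the people who are happy, what fraction are dancing? Here is a minimal Python sketch of that calculation. The records, the field names (`mood`, `activity`) and the values are invented for illustration and are not taken from the actual cards, so swap in the real columns from the CSV download.

```python
# Hypothetical records: field names and values are invented for
# illustration and are NOT taken from the actual data cards.
cards = [
    {"name": "A", "mood": "happy", "activity": "dancing"},
    {"name": "B", "mood": "happy", "activity": "standing"},
    {"name": "C", "mood": "sad", "activity": "dancing"},
    {"name": "D", "mood": "happy", "activity": "dancing"},
    {"name": "E", "mood": "sad", "activity": "standing"},
]


def conditional_probability(rows, given, event):
    """Return P(event | given): the share of rows matching `given`
    that also match `event`. Returns None if no row matches `given`."""
    matching = [row for row in rows if given(row)]
    if not matching:
        return None  # the condition never occurs, so P is undefined
    return sum(1 for row in matching if event(row)) / len(matching)


p_dancing_given_happy = conditional_probability(
    cards,
    given=lambda row: row["mood"] == "happy",
    event=lambda row: row["activity"] == "dancing",
)
print(p_dancing_given_happy)
```

Students can do the same thing by hand with the physical cards: pull out the "happy" pile first, then count the dancers within that pile only.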

Downloads

Original Cards as PDF (ideally printed on card stock, cut, and laminated)
Data (CSV, Google Docs, CODAP)

Be sure to check out David's other math-related teaching materials on his Making Your Own Sense blog.

Let me know if you used this data set or if you have suggestions of what to do with it beyond this.

 

Sunday, May 16, 2021

Star Wars Data via Kaggle

Another repository of freely available data is Kaggle. "Inside Kaggle you’ll find all the code & data you need to do your data science work. Use over 50,000 public datasets and 400,000 public notebooks to conquer any analysis in no time." I like this repository because it seems to be easily searchable, and there are a lot of data sets, so you should be able to find one on a topic that interests your students without too much trouble.

To showcase a data set, I'm choosing one on the Star Wars franchise that was suggested to me by @virgonomic. Actually, it's several data sets.

Analysis 

There are five CSV files, one each on characters, species, planets, starships and vehicles. You are not going to be doing any groundbreaking statistical work here, as the context of these data sets is pretty niche and aimed at die-hard Star Wars fans. Like, I'm not sure who will care that the Bantha-II cargo skiff has a one day supply of consumables. Nonetheless, these are good data sets for basic stats (finding mean, standard deviation, correlation etc). You can definitely find many attributes that are categorical as well. One thing I did notice is that in most of the sets there were always one or two things that could be used to talk about outliers, like Jabba the Hutt in the characters data set or the rotational period of planets in the planets data set.


Sample Questions

  • When you consider the length of a vehicle compared to the number of crew it holds, are there any outliers?
  • What is the standard deviation of the _______ attribute in the _______ data set?
  • Find your favourite character. Pick an attribute and describe how your character compares to the others.
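
For the standard deviation and outlier questions, here is a small Python sketch using only the standard library. The list of values is invented for illustration (it is not copied from the Kaggle files), and the 1.5 × IQR rule used to flag outliers is just one common convention, not the only way to define an outlier.

```python
import statistics

# Invented values for illustration only (NOT from the Kaggle files):
# most are close together and one is far larger than the rest.
masses = [75, 77, 80, 84, 79, 82, 1300]


def iqr_outliers(values):
    """Return the values lying more than 1.5 * IQR outside the quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


outliers = iqr_outliers(masses)
print("sample standard deviation:", statistics.stdev(masses))
print("flagged as outliers:", outliers)

# Recompute without the flagged values to see how much one point matters.
trimmed = [v for v in masses if v not in outliers]
print("standard deviation without outliers:", statistics.stdev(trimmed))
```

Comparing the standard deviation with and without the flagged value is a nice way to show students how much a single extreme point can pull a statistic around.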

BONUS data: Though this is not from this data set, it was recently Star Wars Day and someone posted this infographic comparing the number of lines each character spoke and which words they spoke the most in the original trilogy.


Downloads

Let me know if you used this data set or if you have suggestions of what to do with it beyond this.

Saturday, May 15, 2021

The Big Bang Theory Ratings & Viewership via Data.World

Data.World is a great site for data sets, and they all seem to be freely downloadable once you create an account. The site is a paid site, but the paid plans seem to be for people who use data commercially. Members upload all kinds of data sets and you can search through them.

To show that, I've taken a sample data set about the Big Bang Theory TV show. It was a great show, and it doesn't matter if you didn't watch it when it first aired because you can probably find an episode of the Big Bang Theory on TV at just about any time of the day. For this data set, two databases (Wikipedia and IMDB) were scraped to get information like ratings, viewership, plot line and more, and the result is housed at data.world.

Analysis

There are several attributes in this data set (including episode descriptions and titles) but you probably want to stick to the numerical ones. You can do single variable analysis of the number of viewers, the votes and the ratings, and some two variable analysis as well. I like the single variable analysis because you can split the data by season and analyze each season separately.

Sample Questions

  • Which season had the highest average viewership?
  • Is there a connection between the rating and number of votes?
  • Which season(s) had the most popular episodes?
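
The first two questions can be sketched in a few lines of Python: group the episodes by season to compare average viewership, and compute a Pearson correlation between rating and votes. The rows below are invented placeholders, and the field names (`season`, `viewers`, `rating`, `votes`) are my assumption about how you might label the columns, so check them against the actual download.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean

# Invented placeholder rows (NOT real episode data); the field names
# are assumptions and may not match the real column headers.
episodes = [
    {"season": 1, "viewers": 8.0, "rating": 8.0, "votes": 3000},
    {"season": 1, "viewers": 9.0, "rating": 8.5, "votes": 3500},
    {"season": 2, "viewers": 10.0, "rating": 8.2, "votes": 2800},
    {"season": 2, "viewers": 12.0, "rating": 8.8, "votes": 3900},
]


def mean_by_group(rows, group_key, value_key):
    """Average `value_key` separately for each distinct `group_key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {group: mean(values) for group, values in groups.items()}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists.
    Assumes neither list is constant (otherwise this divides by zero)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


season_means = mean_by_group(episodes, "season", "viewers")
best_season = max(season_means, key=season_means.get)
r = pearson_r(
    [e["rating"] for e in episodes],
    [e["votes"] for e in episodes],
)
print(season_means, best_season, r)
```

A value of `r` near 1 or -1 suggests a strong linear connection and a value near 0 suggests little or none, but it is worth having students look at a scatter plot too before they trust the number.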

Downloads 


Let me know if you used this data set or if you have suggestions of what to do with it beyond this.