Monday, June 6, 2016

Electric Car Rebates

So this article came across my Facebook feed a while back, and I thought it was a great potential source of data for discussion at many levels.
It certainly captured my attention as an Ontario resident, but a closer look showed that there was potentially a lot of data to be analyzed. The data is about the Ontario Electric Vehicle Incentive program. The above article was inspired by this news release, but the article's authors were able to get more specific data about the number of vehicles of each model (which was not publicly released).

Analysis

Students are encouraged to look critically at the original article and perhaps talk about how the title and some of the information given is used to incite a reaction.
For example, even though they gave the overall numbers of almost 4,800 people getting around $39 million in rebates, they focused on just the rebates for the most expensive cars, which account for only about 2% of the people and of the rebate value. And although they do mention it, they don't highlight that about 25% of all the rebates went to one vehicle, the Chevrolet Volt.
But if you look at the ministry website, you can see a nice data set about which cars get which rebates (as well as information about how the program changed once it was pointed out that super expensive luxury cars were getting rebates).
I was able to get this table out, clean it up and add the approximate price of each car to the list (it's approximate because I had to search for each one on the web, so I might have been a bit lazy when it came to options). Now it is good for some simple analysis.
On the "low hanging fruit" end, you can create a bar graph of the number of models for each company. Personally, I wouldn't have guessed GM to be at the top. You can also create a histogram of the actual rebates to look at the distribution (or perhaps look at a box plot or dot plot). Lastly, you could look at whether there is a connection between the price of the car and the size of the rebate.
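The three analyses above (models per company, the most common rebate, and price versus rebate) can be sketched in a few lines of Python. This is a minimal sketch, assuming the cleaned table has been exported as a CSV with hypothetical column headers "Manufacturer", "Rebate" and "Price"; rename them to match the actual sheet. Dollar signs and thousands separators are stripped before converting to numbers.

```python
import csv
from collections import Counter


def to_number(text):
    """Turn a spreadsheet cell such as "$1,234" into a float."""
    return float(str(text).replace("$", "").replace(",", "").strip())


def load_rows(path):
    """Read the rebate table after exporting it as a CSV file."""
    # Column names "Manufacturer", "Rebate", "Price" are assumptions.
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def models_per_manufacturer(rows):
    """Number of eligible models per manufacturer (bar heights for the bar graph)."""
    return Counter(row["Manufacturer"] for row in rows)


def most_common_rebate(rows):
    """Modal rebate value and how many models received it."""
    counts = Counter(to_number(row["Rebate"]) for row in rows)
    return counts.most_common(1)[0]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def price_rebate_correlation(rows):
    """Does the rebate tend to grow with the (approximate) price of the car?"""
    prices = [to_number(row["Price"]) for row in rows]
    rebates = [to_number(row["Rebate"]) for row in rows]
    return pearson_r(prices, rebates)
```

Typical use would be `rows = load_rows("rebates.csv")` (a hypothetical file name) followed by the three functions. A correlation near zero, or a scatterplot with a few expensive cars pulling the pattern around, is a good prompt for the "does the rebate get bigger as the price increases?" question.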

Sample Questions

  • Which manufacturer has the most electric models?
  • What is the most common rebate value?
  • Does the rebate get bigger (in general) as the price of the car increases?
  • If you were going to purchase an electric vehicle, which one would give you the most/least benefit from the rebate program?

Download the Data

Let me know if you used this data set or if you have suggestions of what to do with it beyond this.

Tuesday, May 24, 2016

Gas Prices in Ontario

A friend, Michael Lieff, pointed out this nice set of data. It is the price of gas in several Ontario cities going as far back as 1990. This is an interesting data set because the price of gas generally increases, but you can see that that wasn't always the case (only a few of the cities are shown below).

Analysis

When you go to this website, you have several options for prices, and you can download a year of data at a time (with CSV as an option). The obvious choice is regular gasoline, but you might want to consider comparing regular gas to alternative fuels like propane. In this case, for example, you can see that propane has also generally risen in price over time, but where gasoline prices seem to fluctuate similarly regardless of the city, propane prices seem to be more volatile depending on location.
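One way to put a number on "propane is more volatile depending on location" is to measure how spread out the cities are in each period. The sketch below is one possible approach, not the only one: it uses the coefficient of variation (standard deviation across cities divided by the mean) so that fuels with different price levels can be compared fairly. It assumes you have already arranged the prices as one list per city, all covering the same periods in the same order.

```python
from statistics import mean, pstdev


def spread_across_cities(prices_by_city):
    """
    prices_by_city maps a city name to a list of prices, one per period
    (week, month or year), every list covering the same periods in the
    same order. Returns, for each period, the coefficient of variation
    across cities: population standard deviation divided by the mean.
    """
    spreads = []
    for period_prices in zip(*prices_by_city.values()):
        spreads.append(pstdev(period_prices) / mean(period_prices))
    return spreads


def typical_spread(prices_by_city):
    """Average the per-period spread into one number for comparing fuels."""
    return mean(spread_across_cities(prices_by_city))
```

Comparing `typical_spread(gasoline)` with `typical_spread(propane)` then turns the visual impression from the graphs into a single comparison students can argue about.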

Because of the sheer number of possible data points (you can get a weekly average for the last 25 years for several cities if you want), you may wish to stick to yearly values. Another option is to use some of the weekly values to talk about the dangers of extrapolation.
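If you download the weekly values but want to work with yearly ones, a small helper can collapse them. This is a minimal sketch, assuming each reading has been turned into a pair of an ISO-format date string ("YYYY-MM-DD") and a price; the actual CSV's date format may differ and would need converting first.

```python
from collections import defaultdict
from datetime import date


def yearly_averages(weekly):
    """
    weekly: iterable of (ISO date string, price) pairs.
    Returns {year: mean of that year's weekly prices}, in year order.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for day, price in weekly:
        year = date.fromisoformat(day).year
        totals[year] += price
        counts[year] += 1
    return {year: totals[year] / counts[year] for year in sorted(totals)}
```

Note that this is the mean of the weekly averages, which is close to, but not necessarily identical to, a yearly figure published by the ministry.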



Download the Data

Site http://www.energy.gov.on.ca/en/fuel-prices/
I have also taken the liberty of downloading all of the gasoline data (all 25 years of it) in weekly, monthly and yearly form, as well as the yearly propane data. You can get it on this Google Sheet (note the tabs), or just the gas prices on Fathom.

Let me know if you used this data set or if you have suggestions of what to do with it beyond this.

Friday, May 13, 2016

The Data and Story Library - DASL

DASL (pronounced "dazzle"), the Data and Story Library, is an awesome database of data sets specifically designed to help teach statistics topics. They are all real data sets, categorized by topic/subject (e.g. automotive, food, health, sports) and by mathematical method (e.g. boxplots, mean, outliers, regression, scatterplots). So, in theory, if you wanted to find a set of data to help teach a specific topic, you could search for, say, "correlation".
These are great data sets for getting through the mechanical side of statistics. The data is not very current, but it's great for practicing statistical methods.
For the longest time this set of data was not available, but it is now hosted by Data Description Inc., so we have access to it again.

Analysis

There are far too many sets to discuss the analysis of each one, but when the site was down I blogged about one of my favourite sets, on Smoking and Cancer. Take a look at that post to get a sense of the data. For any data set, click on the Datafile Name to see the actual data file.

This will show you the text file of the data, with the download link at the top of the page.
From there you can do the analysis. Each data set has a detailed description of each variable, a short story and a sample analysis.
There are many data sets on this site for every statistical topic, covering a range of subjects. One thing you might have your students do is simply explore the site and find data sets that can be used to exemplify a particular statistical concept.

Download the Data


Let me know if you used this data set or if you have suggestions of what to do with it beyond this.

Saturday, March 5, 2016

Speed Data

A few weeks ago I saw this tweet:
I used to have some data kicking around on my computer, but I did a quick Google search and found that Car & Driver is a huge source of this type of data. And I love that you can get some of the data with the original handwritten data sheets. BTW, here is @MJFenton's finished activity.
And the teacher version.

The Analysis

Let's start with the data set from the above post. You can certainly do the Desmos Need for Speed activity. The analysis in terms of determining a function is a little intense (i.e. not a standard function model). You can see some of the more exact analysis via the two links in the tweet below.
But if you don't want to go too deep, you could just use it to talk about non-linear relationships, or to talk about rates of change, since speed data comes up a lot in calculus.
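For the rates-of-change discussion, the average acceleration over each interval is just the slope of the secant line between consecutive readings. This minimal sketch assumes you have copied a data sheet into two lists, elapsed time in seconds and speed in miles per hour; it converts to metres per second using the exact definition 1 mph = 0.44704 m/s.

```python
MPH_TO_MS = 0.44704  # one mile per hour in metres per second (exact by definition)


def average_accelerations(times_s, speeds_mph):
    """
    Average acceleration in m/s^2 over each interval between consecutive
    readings: the slope of the secant line (change in speed / change in time).
    """
    readings = list(zip(times_s, speeds_mph))
    rates = []
    for (t0, v0), (t1, v1) in zip(readings, readings[1:]):
        rates.append((v1 - v0) * MPH_TO_MS / (t1 - t0))
    return rates
```

If the rates shrink as the car goes faster, that is exactly why the speed-time graph bends instead of following a straight line, which connects the non-linear discussion to the calculus one.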
I have also found more data sets from different cars and you can see how they compare to each other on this Desmos file.

Download the Data

There is actually a lot of data to be found on the Car & Driver site. Many of the cars in this link have data sheets (you really have to search around on each page to find them). I have downloaded a few of them (seen in the Desmos file above) and created a Google Sheet for each, so you can copy and paste the data wherever you want.
Porsche Spyder Data Sheet Google Sheet
Dodge Challenger Data Sheet Google Sheet
Chevy Camaro Data Sheet Google Sheet
Cadillac CTS Data Sheet Google Sheet
Chevy Malibu Data Sheet Google Sheet
Honda Fit Data Sheet Google Sheet
All Google Sheets
All data in CODAP file (with graph)

Let me know if you used this data set or if you have suggestions of what to do with it beyond this.

Tuesday, January 26, 2016

Magazines

A while back I started doing this activity with my students on the first day. For homework, I would tell them to go home, find two magazines, and record each one's price, number of pages and number of pages with ads on them. Once they brought that in, we would combine all the data into one set. I got the idea from browsing through an Oprah magazine and being shocked at how many pages I had to turn to get to a page that had actual content on it. Eventually I automated the process by using a Google Form to collect the data. And by adding another criterion (the type of magazine), this actually turns into a pretty rich data set.

The Analysis

Certainly with this data set you can do any number of calculations (average, standard deviation, correlation, etc.), but I liked to use it to create a need to move from single-variable analysis to two-variable analysis. For example, the magazine in the current set with the most ad pages is In Style, with 380 ad pages (which is definitely an outlier).
This seems outrageous, and the hope is that it will intrigue the students into asking questions. Perhaps they will also realize that it's the magazine with the largest number of total pages. That then presents a need for a different type of analysis (a two-variable scatter plot). When you do that analysis, you will see that although 380 ad pages is proportionally a little high for a magazine with 620 total pages, it is not so outrageous.
This is a good data set for looking at the basic stuff (creating bar graphs, histograms, box plots and scatterplots, measuring central tendency, determining correlations, finding least-squares lines, etc.).
You can also look at how the popularity of magazines (in your class or in this data set) breaks down by type of magazine. Breaking the data up by type gives students an opportunity to compare graphs. When students compare graphs, an important skill to have them demonstrate is making sure the graphs have similar sizes and scales. This data set can help facilitate that.
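The shift from one variable to two can be made concrete: 380 ad pages out of 620 total is 380 / 620 ≈ 0.61, so about 61% of the magazine is ads. The sketch below flags outliers with the usual 1.5 × IQR box-plot rule, which students can apply first to the raw ad-page counts and then to the ad proportions. It uses Python's `statistics.quantiles` with its default method; textbook and calculator quartile methods differ slightly, so borderline cases may not match exactly.

```python
from statistics import quantiles


def iqr_outliers(values):
    """Values beyond 1.5 * IQR outside the quartiles (the box-plot rule)."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]


def ad_proportions(ad_pages, total_pages):
    """Fraction of each magazine's pages that carry ads."""
    return [a / t for a, t in zip(ad_pages, total_pages)]
```

Running `iqr_outliers` on the ad pages and then on `ad_proportions(ad_pages, total_pages)` lets students check for themselves whether In Style still stands out once its size is taken into account.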
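To make "similar size and scale" automatic, every group's histogram can use the same bin edges. This is a minimal sketch, assuming the data has been grouped into a dict that maps each magazine type to a list of values (for example ad pages) and that you choose a bin width yourself.

```python
import math


def shared_bin_edges(groups, bin_width):
    """
    One set of histogram bin edges covering every group, so the histogram
    for each magazine type is drawn on the same horizontal scale.
    groups maps a type name to a list of values (e.g. ad pages).
    """
    all_values = [v for values in groups.values() for v in values]
    start = math.floor(min(all_values) / bin_width) * bin_width
    stop = math.ceil(max(all_values) / bin_width) * bin_width
    if stop == start:  # every value sits exactly on one edge
        stop = start + bin_width
    count = round((stop - start) / bin_width)
    return [start + i * bin_width for i in range(count + 1)]


def bin_counts(values, edges):
    """Count values in each [left, right) bin; the last bin also keeps its right edge."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    return counts
```

For the vertical axis, the same idea applies: use the largest count across all groups, `max(max(bin_counts(v, edges)) for v in groups.values())`, as the shared maximum.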

Sample Questions

  • Create histograms of each of the numerical attributes and plot the mean and median on each graph. Describe each histogram as skewed right, skewed left or symmetrical, and justify your answers.
  • Compare the graphs of total pages to ad pages
  • What proportion of magazines would be Sports & Entertainment in the average household?
  • What type of distribution would the number of ad pages be described as? Justify your answer.
  • Are there any outliers in the number of ad pages? Do the outliers change if you consider the type of magazine instead of the whole group?
  • Is the number of total pages (or ad pages) in the magazine correlated with the price of the magazine?
  • If a magazine were to have 120 pages, how many of them would you expect to have ads? Is this number different if you consider the type of magazine instead of all the magazines in the group?
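The last question is a least-squares prediction. This sketch computes the line from the usual formulas, slope = Sxy / Sxx and intercept = mean(y) - slope × mean(x), and evaluates it at a chosen x. The variable names are illustrative; fit it once on all magazines and again on a single type to compare the two predictions.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(line, x):
    """Evaluate the fitted line at x."""
    slope, intercept = line
    return slope * x + intercept
```

With total pages as x and ad pages as y, `predict(least_squares_line(total_pages, ad_pages), 120)` gives the expected number of ad pages for a 120-page magazine.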

Download the Data


Let me know if you used this data set or if you have suggestions of what to do with it beyond this.

Saturday, January 23, 2016

Trending Data

I have known about all of these trending search engines and thought they were quaint, but recently I have seen some examples of uses that make me believe they may be worth more, and worth talking about in a senior Data Management class. For example, I saw this one from @NateSilver538:
Another example is from the Science Friday podcast, which talked about tracking "hate" through Google searches.
The trending site used in both of those cases was Google Trends, which has been around for a while. Basically, you put in the search terms you wish to compare and it shows how often they were searched on Google. For example, the Superbowl is coming up in a couple of weeks, so if you search "Superbowl", it shouldn't be surprising that we get a periodic pattern:


Once you have one search term, you can add others. For example, let's see how popular Christmas is compared to the Superbowl:

Another place to look for trending terms is Twitter, and the site Hashtags.org gives analytics. Here you enter a hashtag and get the last 24 hours of Twitter traffic for that hashtag (at least in the free version). You can't compare hashtags, but you can search any hashtag you wish. However you could highlight

Another place you can get trend data is Quantcast.com. This site does analytics on website traffic in general.
 
You can get detailed analytics for free from any of the sites that are listed as directly measured.

The Analysis

Although there is not much analysis to be done with most of the trending sites, we often hear about topics "trending", so these sites can be used to bring something concrete to class. Some simple analysis can be done with the Quantcast site, though: just import the table of sites and you can work on histograms and even bar graphs.

Sample Questions 

  • Find a trending topic on Twitter or Google. Verify the data using one of the trending analytic sites. Compare to a similar topic.
  • How does the traffic of the top 10 most popular sites compare to the next 10?
  • Are there any outliers in the set of most popular sites?
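The "top 10 versus next 10" question comes down to two sums and a ratio. This minimal sketch assumes you have imported the Quantcast table and pulled out its traffic column (whatever audience measure the table reports) as a list of numbers; the function sorts them so the ranking does not depend on the import order.

```python
def compare_top_tiers(traffic, size=10):
    """
    traffic: audience numbers for the listed sites.
    Returns (total for ranks 1..size, total for ranks size+1..2*size, ratio).
    """
    ranked = sorted(traffic, reverse=True)
    top = sum(ranked[:size])
    following = sum(ranked[size:2 * size])
    return top, following, top / following
```

A ratio well above 1 is a quick way to show how heavily skewed website traffic is, which sets up the histogram and outlier questions.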

Download the Data

Website: https://www.google.ca/trends/
Website: https://www.hashtags.org/
Website: https://www.quantcast.com/top-sites
Quantcast data (Sheets, Sheets with graphs, Fathom, Fathom with Graphs, CODAP)

Let me know if you used this data set or if you have suggestions of what to do with it beyond this.

Friday, January 15, 2016

Where are the Rey Star Wars Toys?

This comes from a FiveThirtyEight post looking at the distribution of toys from the new Star Wars film. It is a simple data set that can be made into a bar graph, and it is data that students are likely to be interested in. And it seems like the scarcity of Rey toys may not have been accidental.

The Analysis


There is not much analysis for students to do here. They can create the bar graph and then answer some questions about it. The point here is that the data set itself is what is interesting to students. Students could also make a pie graph from the data, since it represents 100% of the data. One of the good things this data set can do is help show why pie graphs aren't that good for analysis: the values are so close to each other that, just looking at the pie slices without the percents showing, it is hard to tell which is bigger.

Most statisticians agree that, for the most part, pie graphs are not very informative. Yet we see them all the time. For example, look at the two representations to the right. The bar graph and pie graph show the same information, but the pie graph is only useful for specific analysis if the percentages are actually shown. Otherwise it would be hard to determine the relative sizes of the pieces of pie, and thus the relative weights of each type of toy. The problem becomes even worse with a 3D pie graph (so often used on news shows): without the percents, you cannot tell the difference in size between many of the pieces. Of course, the pie graph does look nicer.
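Students can see why close values are hard to read on a pie graph by converting counts to slice angles. Each category's angle is its share of the total times 360°, so two categories only 2 percentage points apart differ by 0.02 × 360° = 7.2°. This sketch takes a dict of category counts (fill in the counts from the post; none are assumed here).

```python
def pie_angles(counts):
    """Map each category to (percent of total, pie-slice angle in degrees)."""
    total = sum(counts.values())
    return {
        name: (100 * count / total, 360 * count / total)
        for name, count in counts.items()
    }
```

Listing the angles next to the bar heights makes the point concrete: the bars differ visibly in length while the slices differ by only a few degrees.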

Sample Questions

  • By what percentage does the number of Kylo Ren toys surpass the number of BB-8 toys?
  • Which type of graph would be better for this data, bar or circle? Justify your choice.
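For the first question, the comparison is relative to the BB-8 count. Writing K for the number of Kylo Ren toys and B for the number of BB-8 toys (symbols introduced here for illustration):

```latex
\text{percent by which Kylo Ren exceeds BB-8} = \frac{K - B}{B} \times 100\%
```

If the data is given as shares of all toys, p_K and p_B, the same formula works with the shares in place of the counts, because the total cancels. The plain difference p_K - p_B is a difference in percentage points, which is a different (and smaller) number; this is a good distinction to ask students about.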

Download the Data

Google Sheets (with graphs)
The original post
http://fivethirtyeight.com/features/wheresrey-the-star-wars-heroine-is-featured-in-fewer-toys-than-all-the-new-dudes/