Friday, December 4, 2015

Smoking and Cancer

For many years I used the Data & Story Library (DASL), which has some great data sets, but for some reason the data on the site is currently unavailable. Since they are unavailable, I thought I would share some of my favs.

The Analysis

Probably my favourite is the Smoking and Cancer story. This is a great data set for talking about correlation. The data gives the average number of cigarettes smoked in each US state and then the rates of bladder cancer, lung cancer, kidney cancer and leukaemia for each state. So at the very least you can have students create graphs of each of the afflictions vs the number of cigarettes smoked. When you do, you get the following graphs:

The thing I like most about this is that when you do, you see that bladder cancer has the strongest correlation, which is not intuitive. But in the above graphs you will notice that the scales are all different. The graph below shows the same graphs but all with the same scale. Here you see that even though bladder cancer has a strong correlation with smoking, there really isn't much of a relationship (i.e. no matter how many cigarettes are smoked, the rate of bladder cancer barely changes). And since the other two have low or no correlation, you can see that smoking has the largest connection to lung cancer.

So it's a good lesson about correlation and why it is important to scale the axes similarly when comparing data.
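If you want to show the same idea with numbers instead of graphs, here is a minimal sketch in plain Python. The values are made up for illustration and are not the DASL data; the names `pearson_r`, `slope`, `steep` and `flat` are just ones I picked. It shows that two sets of points can both have a correlation near 1 while one line is far flatter than the other, which is what the common scale makes visible.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Made-up illustrative values, NOT the real state data.
cigs = [15, 20, 25, 30, 35, 40]
steep = [12, 15, 19, 21, 26, 28]        # rate climbs a lot as cigs rises
flat = [3.0, 3.2, 3.3, 3.5, 3.7, 3.8]   # tight pattern, but barely moves

for name, rates in (("steep", steep), ("flat", flat)):
    print(name, "r =", round(pearson_r(cigs, rates), 3),
          "slope =", round(slope(cigs, rates), 3))
```

The correlation does not change if you rescale an axis, so it cannot tell you how much the rate actually moves. The slope (or a graph on a common scale) does.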

Sample Questions

  • Which pairs of data appear to have a connection to each other?
  • What does each of the numbers represent in each equation?
  • Which of the scatter plots indicate that there is a relationship between the data?
  • Use your least squares equations to predict what the death rate would be for each relationship if the Cig value was 10 or 50. How confident can you be of each prediction?
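For the last question, here is a rough sketch of how the predictions could be checked in plain Python. Again the numbers are made up and are not the DASL data, and `least_squares` and `predict` are names I chose for the example. Besides the prediction, it flags whether the Cig value falls outside the range of the data used for the fit, which is one way to get students talking about how confident they can be.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b

def predict(xs, ys, x_new):
    """Return (predicted y, True if x_new lies outside the observed x range)."""
    m, b = least_squares(xs, ys)
    is_extrapolation = not (min(xs) <= x_new <= max(xs))
    return m * x_new + b, is_extrapolation

# Made-up illustrative values, NOT the real state data.
cigs = [15, 20, 25, 30, 35, 40]
lung = [12, 15, 19, 21, 26, 28]

for cig in (10, 50):
    rate, outside = predict(cigs, lung, cig)
    note = "extrapolation, treat with caution" if outside else "within the data range"
    print(f"Cig = {cig}: predicted rate = {rate:.1f} ({note})")
```

In this made-up example both 10 and 50 sit outside the observed Cig values, so both predictions are extrapolations. With the real data, students would compare 10 and 50 against the actual smallest and largest Cig values.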

Download the Data

Fathom (Data) (Solution)
Google Spreadsheet
CODAP file

Let me know if you used this data set or if you have suggestions of what to do with it beyond this.

4 comments:



  3. What does hds mean. Doubt I'm gonna get an answer in time but the google spread sheet says hds per capita, and I can't find anything online about its meaning.
