Tuesday, March 26, 2019

Mining the Meta Data in your iTunes Library

If you (or your students) use iTunes to keep track of your music, then it turns out you have a rich source of data that might be interesting for your students to analyze. I find that students are more interested in an analysis when the data they are looking at is their own. In this case, every song in iTunes (and really, on any platform) has a pile of meta data associated with it. That meta data includes things like song name, artist name and album name, but also numerical values like song length, file size, number of plays, etc. So you could have your students get the data from their own library and analyze it.

Getting the data from iTunes is pretty easy. Once in iTunes, students who want the info for all of their music can just click on Songs, and students who want the data from a favourite playlist can click on that playlist instead. Then click on File, then Library, then Export Playlist. iTunes will then save a .TXT file to the folder of your choice. That .TXT file will need a bit of cleaning up, but not much. I suggest importing it into Excel or Google Sheets to clean it up. If you are doing the work in that spreadsheet (or uploading to Desmos) then you're all set. If you plan on importing it into CODAP then save the data as a .CSV file (I noticed that even though you should be able to import a .TXT file into CODAP, the format of this one doesn't seem to work, so you have to convert it to a .CSV).
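If you would rather skip the spreadsheet step, the conversion to .CSV could also be scripted. This is only a rough sketch in Python: it assumes the export is tab-delimited text, and the encoding is a guess on my part (the function names and the default encoding here are made up for illustration, not anything iTunes or CODAP provides).

```python
import csv
import io


def tab_text_to_csv(text: str) -> str:
    """Convert tab-delimited text (assumed format of the export) to CSV text."""
    # splitlines() copes with \r, \n and \r\n line endings alike
    rows = [line.split("\t") for line in text.splitlines() if line.strip()]
    out = io.StringIO()
    # The csv writer adds quotes around any value that contains a comma
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


def convert_file(src_path: str, dst_path: str, encoding: str = "utf-16") -> None:
    """Read an exported .TXT file and write a .CSV file next to it."""
    # The encoding is an assumption: try "utf-8" or "mac_roman" if the
    # result looks garbled
    with open(src_path, encoding=encoding, newline="") as f:
        text = f.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as f:
        f.write(tab_text_to_csv(text))
```

Calling convert_file("playlist.txt", "playlist.csv") (both file names are placeholders) should then give a file that can be checked in a spreadsheet before uploading it anywhere.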

Analysis

Though the data itself is not wildly interesting, you can certainly use it to cover topics like mean, median, standard deviation, and other single variable measures. You could also have students compare values from their playlists to those of other students. Note that the song times are in seconds, so if a histogram is created, bin widths of 30 s or 60 s are probably appropriate (let students figure this out).
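For anyone who wants to check student answers outside of a spreadsheet, here is a minimal Python sketch of those single variable measures and of the 30 s binning. The song times in the usage note are made up for illustration, and the function names are my own.

```python
import statistics
from collections import Counter


def summarize(values):
    """Return the mean, median and sample standard deviation of a list."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
    }


def bin_counts(times_s, width=30):
    """Count songs per histogram bin, keyed by the lower edge of each bin."""
    # With width=30, the key 180 covers 180 s up to (not including) 210 s
    counts = Counter((t // width) * width for t in times_s)
    return dict(sorted(counts.items()))
```

With the made-up times [185, 200, 215, 240, 260], summarize gives a mean of 220 s and a median of 215 s, and bin_counts gives two songs in the 180 s bin, one in the 210 s bin and two in the 240 s bin.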

One thing that I think is interesting is that you would expect a very strong (if not perfect) relationship between the time of a song and its file size. But as you can see, there seem to be several different relationships. This is due to the bit rate used when the file was compressed. So you might be able to have a conversation about what bit rate is and how it relates to the compression of the file. The lower the bit rate, the smaller the file size (for songs of the same length). So you could talk about why you would want a lower or higher bit rate (hint: a lower bit rate means poorer sound quality but a smaller file size, so there is a trade-off). In CODAP you can create separate graphs of the bit rate data and the scatter plot of size vs time, then highlight parts of the data to show the different relationships. You could also hide or show data based on the bit rate to do more specific analysis by isolating just the data from one bit rate.
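To put numbers on this, the relationship can be sketched in a few lines of Python. It assumes the bit rate is in kbps (1 kbps = 1000 bits per second), the time is in seconds and the size is in bytes. Real files will come out a little larger than the prediction because of things like embedded artwork, and variable bit rate files will not fit as neatly. The function names are my own.

```python
from collections import defaultdict


def expected_size_bytes(bit_rate_kbps, time_s):
    """Rough file size predicted from the bit rate and the song length."""
    # kbps -> bits per second -> bytes per second, times the song length
    return bit_rate_kbps * 1000 / 8 * time_s


def bytes_per_second_by_bit_rate(tracks):
    """Slope of size vs time for each bit rate.

    tracks: iterable of (bit_rate_kbps, time_s, size_bytes) tuples.
    """
    totals = defaultdict(lambda: [0, 0])
    for bit_rate, time_s, size in tracks:
        totals[bit_rate][0] += size
        totals[bit_rate][1] += time_s
    # Total bytes divided by total seconds gives one slope per bit rate
    return {rate: size / secs for rate, (size, secs) in totals.items()}
```

For example, a 240 s song at 256 kbps would be predicted to take about 7,680,000 bytes. Each bit rate gets its own slope (bytes per second), which is why the scatter plot splits into separate lines.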

Sample Questions

  • Choose three numerical attributes from your data and determine the mean, median and SD of each. Graph each attribute using an appropriate representation.
  • Which genre of music has the highest average song length?
  • Which song was played the most?
  • Which decade has the most songs?
  • Which song was skipped the most?
  • Determine the relationship between the size of a file and how long the song is for different bit rates. 
  • You have only 50 MB of space left on your device. How many minutes of music could you store using all of the remaining space? (Note that answers will vary based on the bit rate.)
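For the last question, here is one way to check answers, sketched in Python. It assumes 1 MB = 1,000,000 bytes and a constant bit rate in kbps, and it ignores any extra space taken up by artwork or other meta data, so treat the results as estimates. The function name is my own.

```python
def minutes_that_fit(free_mb, bit_rate_kbps):
    """Estimate how many minutes of music fit in the given free space."""
    free_bits = free_mb * 1_000_000 * 8           # megabytes -> bits
    seconds = free_bits / (bit_rate_kbps * 1000)  # bits / (bits per second)
    return seconds / 60
```

With these assumptions, 50 MB holds roughly 52 minutes of music at 128 kbps but only about 26 minutes at 256 kbps.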

Downloads

Let me know if you used this data set or if you have suggestions of what to do with it beyond this.

Friday, March 22, 2019

Hip Hop Vocabulary

The original post came out in 2014 (before this blog was created), so I hadn't thought about it for a while. Then I saw a post by Dane Ehlert on his When Math Happens blog and was not only reminded of it but also noticed that the original post had been updated, both in its look and with new data. Basically, they take a pile of hip hop artists and count how many unique words each one uses in their first 35,000 lyrics.

Analysis

When you go to the site, the visualization (above) is interactive: you can search for artists and explore where each one falls. This is neat, but on this blog we typically want to do some mathematical analysis. They have other representations, like one that looks like a histogram, but for our purposes we would like some numbers.

 
So if you look way down on the post, they do have a Google Sheet with the number of unique words for each of the over 160 artists. It's not a particularly robust data set, but we can do some simple analysis, like histograms, averages, box plots and other single variable analysis. I don't think there is anything particularly mathematically interesting about the data, but it might be interesting to students, so it could be used to practice some standard single variable analysis techniques (central tendency, standard deviation, distributions, dot plots, box plots, histograms, etc.).
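As one example of what that practice could look like, here is a small Python sketch that flags outliers using the usual 1.5 × IQR rule from box plots. The function names are my own, and the artist names and counts in the usage note are placeholders, not values from the actual Google Sheet. Different tools also calculate quartiles slightly differently, so the fences may not match a spreadsheet exactly.

```python
import statistics


def outlier_fences(values):
    """Lower and upper fences using the 1.5 x IQR rule."""
    # quantiles(n=4) returns the three quartile cut points (Q1, median, Q3)
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(counts_by_artist):
    """Return the artists whose unique word count falls outside the fences."""
    low, high = outlier_fences(list(counts_by_artist.values()))
    return {name: n for name, n in counts_by_artist.items() if n < low or n > high}
```

For instance, if eight made-up artists had counts of 3000, 3100, 3200, 3300, 3400, 3500, 3600 and 7000, only the artist with 7000 would be flagged.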

Sample Questions

  • Who are the outliers in this data set?
  • Which decade has the most verbose rappers?
  • How does your favourite rapper compare to the most/least verbose rapper?
  • Take a look at some of the questions Dane was asking in his post for some more open questions.
  • What does the data in the original post say about the number of words used in different types of music?

Downloads 


Let me know if you used this data set or if you have suggestions of what to do with it beyond this.