Tuesday, March 26, 2019

Mining the Meta Data in your iTunes Library

If you (or your students) use iTunes to keep track of your music, then you have a rich source of data that might be interesting for your students to analyze. I find that students are more interested in an analysis when they are working with their own data. In this case, every song on iTunes (and really, on any platform) has a pile of meta data associated with it. That meta data includes things like song name, artist name and album name, but also numerical values like song length, file size, number of plays, etc. So you could have your students export the data from their own library and analyze it.

Getting the data from iTunes is pretty easy. Once in iTunes, students can click on Songs to get the info for all of their music, or click on a favourite playlist to get the data for just that playlist. Then click on File, then Library, then Export Playlist. It will then save a .TXT file to the folder of your choice. That .TXT file will need a bit of cleaning up, but not much. I suggest importing it into Excel or Google Sheets to clean it up. If you are doing the work in that spreadsheet (or uploading to Desmos) then you're all set. If you plan on importing it into CODAP then save the data as a .CSV file (I found that even though you should be able to import a .TXT file into CODAP, the format of this one doesn't seem to work, so you have to convert it to a .CSV).
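If you would rather skip the spreadsheet step, the conversion from the exported .TXT to a .CSV can be sketched in a few lines of Python. This is only a minimal sketch, assuming the export is tab-delimited text; the function name `tab_text_to_csv` and the file names in the usage note are made up for illustration.

```python
import csv
import io


def tab_text_to_csv(text):
    """Convert tab-delimited text (assumed iTunes export layout) to CSV text."""
    # newline="" lets the csv module handle \r, \n or \r\n line endings itself.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter="\t")
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        if row:  # skip completely blank lines
            writer.writerow(row)
    return out.getvalue()


# Tiny made-up sample, not real library data.
sample = "Name\tArtist\tTime\r\nSong A\tBand, The\t215\r\n"
print(tab_text_to_csv(sample))
```

To use it on a real export you would read the file first, for example `open("Music.txt", encoding="utf-16", newline="").read()`, and write the result to something like `Music.csv`. The encoding is an assumption (it may be UTF-16 or UTF-8 depending on the export format chosen), so try the other one if the text comes out garbled.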

Analysis

Though the data itself is not wildly interesting, you can certainly use it to cover topics like mean, median, standard deviation and other single-variable measures. You could also have students compare values from their playlists with those of other students. Note that the song times are in seconds, so if a histogram is created, bin widths of 30 s or 60 s are probably appropriate (let students figure this out).
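For anyone who wants to check the spreadsheet results, here is a rough sketch of the same single-variable measures in Python, along with counts for histogram bins of a chosen width. The helper names `summarize` and `histogram_bins` and the list of times are made up for illustration; the standard deviation used is the sample standard deviation.

```python
import statistics
from collections import Counter


def summarize(values):
    """Return the mean, median and sample standard deviation of a list."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
    }


def histogram_bins(times_seconds, bin_width=30):
    """Count songs per bin; each key is the lower edge of a bin in seconds."""
    counts = Counter((t // bin_width) * bin_width for t in times_seconds)
    return dict(sorted(counts.items()))


# Made-up song times in seconds, standing in for the Time column.
times = [180, 200, 215, 240, 305]
print(summarize(times))
print(histogram_bins(times, bin_width=60))
```

Changing `bin_width` between 30 and 60 is a quick way to see how the shape of the histogram depends on that choice.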

One thing that I think is interesting is that you would expect a very strong (if not perfect) relationship between the time of a song and its file size. But when you graph size vs time, there seem to be several different relationships. This is due to the bit rate of the file compression. So you might be able to have a conversation about what bit rate is and how it relates to the compression of the file. The lower the bit rate, the smaller the file size (for songs of the same length). So you could talk about why you would want a lower or higher bit rate (hint: a lower bit rate means poorer sound quality but a smaller file size, so there is a trade-off). In CODAP you can create separate graphs of the bit rate data and the scatter plot of size vs time, then highlight parts of the data to show the different relationships. You could also hide or show data based on the bit rate to do more specific analysis by isolating just the data from one bit rate.
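The idea of isolating one bit rate at a time can also be sketched in Python. The sketch below groups songs by bit rate and finds the average number of bytes per second in each group, which is roughly the slope of the size vs time relationship. It assumes sizes are in bytes, times in seconds and bit rates in kbps; the function names and the data are made up. Real files will not match the expected value exactly, since they also hold meta data and artwork and may use a variable bit rate.

```python
from collections import defaultdict


def bytes_per_second_by_bit_rate(songs):
    """songs: iterable of (size_bytes, time_seconds, bit_rate_kbps) tuples."""
    totals = defaultdict(lambda: [0, 0])  # bit rate -> [total bytes, total seconds]
    for size, seconds, kbps in songs:
        totals[kbps][0] += size
        totals[kbps][1] += seconds
    return {kbps: size / seconds for kbps, (size, seconds) in sorted(totals.items())}


def expected_bytes_per_second(kbps):
    """Idealized rate: kilobits per second -> bytes per second (8 bits per byte)."""
    return kbps * 1000 / 8


# Made-up songs: (size in bytes, time in seconds, bit rate in kbps).
songs = [(3_200_000, 200, 128), (4_800_000, 300, 128), (6_400_000, 200, 256)]
for kbps, rate in bytes_per_second_by_bit_rate(songs).items():
    print(kbps, rate, expected_bytes_per_second(kbps))
```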

Sample Questions

  • Choose three numerical attributes from your data and determine the mean, median and SD of each. Graph each attribute using an appropriate representation.
  • Which genre of music has the highest average song length?
  • Which song was played the most?
  • Which decade has the most songs?
  • Which song was skipped the most?
  • Determine the relationship between the size of a file and how long the song is for different bit rates. 
  • You have only 50 MB of space left on your device. How many minutes of music could you store using all of the remaining space? (Note that answers will vary based on the bit rate.)
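As a check on the last question, here is a small sketch of the storage calculation. It assumes that 1 MB means 1,000,000 bytes, that the bit rate is constant and that the file contains nothing but audio, so real answers will come out a little lower. The function name `minutes_that_fit` is made up for illustration.

```python
def minutes_that_fit(space_megabytes, bit_rate_kbps):
    """Minutes of audio that fit in the given space at a constant bit rate."""
    bits_available = space_megabytes * 1_000_000 * 8  # bytes -> bits
    seconds = bits_available / (bit_rate_kbps * 1000)  # kbps -> bits per second
    return seconds / 60


for kbps in (128, 256):
    print(kbps, round(minutes_that_fit(50, kbps), 1))
```

Doubling the bit rate halves the number of minutes, which ties back to the trade-off between sound quality and file size.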

Downloads

Let me know if you used this data set or if you have suggestions of what to do with it beyond this.
