Since I moved here, I have been thinking about starting a series of blog posts on data visualization – design, implementation, libraries, data, etc. Now I’m finally ready to start the game.
Throughout the series, I will be working on my favourite data, statistics from NHL games. All data will be retrieved from NHL.com’s game reports like this. There are a couple of reasons why I want to use these reports: 1) it is easy to retrieve all the URLs for a given season and 2) it is easy to scrape whatever data I want out of them, since they are quite uniform. However, since the table cells are not well named, there is some manual cleaning and defining to do.
From each game report, we can retrieve information on goals, officials, 3 Stars awards, goalie performance and even the actual lengths of penalties rather than just the number of penalty minutes given.
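To give a rough idea of what scraping a uniform table with unnamed cells could look like, here is a minimal sketch using only Python's standard library. It is not the code I will end up using: the HTML fragment is made up, and the column names in `GOAL_COLUMNS` are hypothetical labels defined by hand, which is exactly the manual "defining" step needed when the cells carry no useful names.

```python
from html.parser import HTMLParser

# Hypothetical column names, defined by hand because the cells are unnamed.
GOAL_COLUMNS = ["period", "time", "team", "scorer"]

# Made-up HTML fragment standing in for a real game report.
SAMPLE_REPORT = """
<table>
  <tr><td>1</td><td>12:34</td><td>HOME</td><td>Player A</td></tr>
  <tr><td>2</td><td>05:06</td><td>AWAY</td><td>Player B</td></tr>
</table>
"""


class RowCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None       # cells of the row currently being read
        self._in_cell = False  # True while inside a <td> element

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text outside of cells (whitespace between tags) is ignored.
        if self._in_cell:
            self._row[-1] += data.strip()


def parse_goals(html):
    """Turn positional table cells into dicts keyed by the manual names."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    return [dict(zip(GOAL_COLUMNS, row)) for row in parser.rows]


goals = parse_goals(SAMPLE_REPORT)
```

Because the reports are quite uniform, a positional mapping like this should carry over from game to game, but the real reports will need their own column definitions and more cleaning than this toy fragment does.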
On the Python side, I’m not yet sure which library I’m going to use. I’ve used a forked version of Pygal that is more advanced than the original, and I have only done some exploratory data analysis with matplotlib, so the series will also be a great way to learn the strengths of those libraries.
In addition to just showing what the libraries can do, I am interested in what kind of default settings they have (plotting something without touching any styles), how easy they are to customize, whether they provide any interaction options and what kind of ideas they are based on.
These blog posts will also serve as background research for my bachelor’s thesis in case I end up writing it.
I have not yet fully planned this to the end, but I have some idea of how things are going to roll out.
- Data parsing and cleaning
- Design and theory behind data visualizations
All code created will be published under the MIT Licence on GitHub.