Visualising NHL winning and losing streaks

As the Chicago Blackhawks break the NHL record for the longest point streak from the start of a season (currently 21 games) and my favourite team, the Pittsburgh Penguins, are playing like a rollercoaster, I decided to take a look at my favourite data of them all: NHL stats.

The visualisation in its most current state can be found here. The most recent code can be found on BitBucket.

Winning and losing streaks

NHL.com’s standings page shows only each team's latest streak, but since I was interested in all streaks during the season, I did some digging and found this page, which lists all the games in chronological order in a consistent, easy-to-parse table format.

First, to parse the data I used the following code:

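from collections import defaultdict
import json

from bs4 import BeautifulSoup as bs  # assumption: BeautifulSoup 4 is the installed version

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib import urlopen  # Python 2

# baseurl and PAGENUMBER are module-level settings defined elsewhere in the script.

def readData(loadFromWeb=False):
    teams = defaultdict(list)
    if not loadFromWeb:
        # Use the cached copy instead of fetching the pages again.
        with open('nhl.json') as jsonfile:
            teams = json.load(jsonfile)
    else:
        for i in range(1, PAGENUMBER + 1):
            pageurl = "%s%s" % (baseurl, i)
            soup = bs(urlopen(pageurl), 'html.parser')
            tbody = soup.findAll('table', {'class': 'data stats'})[0].find('tbody')
            for tr in tbody.findAll('tr'):
                tds = tr.findAll('td')
                home_team = tds[1].string
                away_team = tds[3].string
                home_goals = tds[2].string
                away_goals = tds[4].string
                # Every game has a winner, so a home loss is an away win.
                home_win = int(home_goals) > int(away_goals)
                if home_win:
                    teams[home_team].append('W')
                    teams[away_team].append('L')
                else:
                    teams[home_team].append('L')
                    teams[away_team].append('W')
    # Sanity check: the highest number of games played by any team.
    print(max(len(matches) for matches in teams.values()))
    return teams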

It can read the data either from a JSON file or from the actual page, and it creates a dict with team names as keys and a list of Ws (wins) and Ls (losses) as values. After that, the list is transformed to keep only the wins and losses that are part of a streak, that is, at least two identical results in a row:

def transform(teams):
    transformed = defaultdict(list)
    for team, games in teams.items():
        last = len(games) - 1
        for i, result in enumerate(games):
            # A game is part of a streak if a neighbouring game has the same result.
            # The bounds checks keep the first game from being compared with
            # games[-1] (the last game) and also handle a team with a single game.
            same_as_prev = i > 0 and games[i - 1] == result
            same_as_next = i < last and games[i + 1] == result
            if same_as_prev or same_as_next:
                transformed[team].append(result)
            else:
                transformed[team].append('')
    return transformed
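
The same rule can also be phrased as run-length grouping: consecutive identical results form a run, and only runs of two or more count as a streak. Below is a small sketch of that alternative using itertools.groupby. It is not the code used for the visualisation, mark_streaks is a hypothetical name, and the sample sequence is made up purely to show which games survive:

```python
from itertools import groupby


def mark_streaks(games):
    """Blank out every result that is not part of a run of two or more."""
    marked = []
    for result, run in groupby(games):
        length = len(list(run))
        # A run of length 1 is an isolated game, so it is replaced with ''.
        marked.extend([result if length > 1 else ''] * length)
    return marked


sample = ['W', 'W', 'L', 'W', 'L', 'L', 'L', 'W']
print(mark_streaks(sample))
# ['W', 'W', '', '', 'L', 'L', 'L', '']
```

Because the grouping never looks at indices, there are no special cases for the first or last game of the season.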

The data can then be written to either JSON or HTML. The Git repo also contains files for the HTML head and HTML tail, which I combine with the script-written HTML to create the website. The visualisation can be found on my website.
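
The JSON side of this can be quite small. Here is a minimal sketch, assuming Python 3 and the same dict shape as above (team name mapped to a list of results). The helpers save_teams and load_teams are hypothetical names rather than functions from the repo, and the two placeholder teams and their results are made up:

```python
import json
import os
import tempfile


def save_teams(teams, path):
    """Write the team -> results mapping to a JSON file."""
    with open(path, 'w') as jsonfile:
        json.dump(teams, jsonfile, indent=2, sort_keys=True)


def load_teams(path):
    """Read the mapping back; JSON objects come back as plain dicts."""
    with open(path) as jsonfile:
        return json.load(jsonfile)


# Made-up results for two placeholder teams.
teams = {'Team A': ['W', 'W', 'L'], 'Team B': ['L', 'W', 'W']}

# A temporary directory keeps the example from overwriting a real nhl.json.
path = os.path.join(tempfile.mkdtemp(), 'nhl.json')
save_teams(teams, path)
restored = load_teams(path)
print(restored == teams)
```

Caching the parsed results this way means the pages only need to be fetched again when new games have been played.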

Originally I was going to do the visualisation with D3.js or ggplot2, but after prototyping it with HTML/CSS it actually looked quite good, so I decided to leave it like that for now, as a personal reminder that you can do quite a lot with just background-colored table cells.
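
To give an idea of how little is needed for the table-cell approach, here is a rough sketch in Python 3. It is not the script from the repo: render_rows and build_page are hypothetical names, and the CSS class names, colours, and the inline head and tail strings are assumptions standing in for the real head and tail files:

```python
from html import escape

# Hypothetical CSS class names; the real stylesheet may use different ones.
CSS_CLASS = {'W': 'win', 'L': 'loss', '': 'none'}


def render_rows(transformed):
    """Return one <tr> per team, with one class-tagged <td> per game."""
    rows = []
    for team in sorted(transformed):
        cells = ''.join(
            '<td class="%s"></td>' % CSS_CLASS[result]
            for result in transformed[team]
        )
        rows.append('<tr><th>%s</th>%s</tr>' % (escape(team), cells))
    return '\n'.join(rows)


def build_page(head, rows, tail):
    """Glue the static head and tail around the generated rows."""
    return head + rows + tail


# Stand-ins for the separate head and tail files.
head = ('<html><head><style>'
        '.win{background:green}.loss{background:red}'
        '</style></head><body><table>\n')
tail = '\n</table></body></html>'

# Made-up, already transformed results for one placeholder team.
page = build_page(head, render_rows({'Team A': ['W', 'W', '', 'L', 'L']}), tail)
print(page)
```

All the colouring lives in the stylesheet, so the script only has to decide which class each cell gets.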

The whole code can be found here.
