Hockey Analytics for Beginners

Brett Marshall
19 min read · Aug 24, 2022

The game of hockey has been changing rapidly over the last 10 years and it’s time that the way we watch and interpret what we see starts to change as well.

Thankfully, many smart minds in the hockey world have introduced the public to advanced analytics. Simply put, advanced analytics are statistics that go beyond the box score. They dive deeper into what’s actually happening when a player is on the ice and how he contributes to the outcome of the game.

Why Should I Care About Analytics?

The answer is pretty simple — they will help you become a more educated hockey fan. Whether you watch 10 games a week or just your favorite team, understanding analytics helps to change the way you watch the game.

Another thing analytics can help with is independent thinking. Casual fans lean heavily on what they hear from television broadcasters, local and national journalists and insiders. Not to say these people don’t know hockey, but they can tend to pump players’ tires more (or less) than they deserve, sometimes without consequence.

Analytics can help you identify players’ strengths and weaknesses that the “eye test” may not otherwise notice. They can show the value in players who aren’t always the flashiest or don’t produce a ton, and show why other players maybe aren’t quite as good as some claim they are.

So, without further ado, let’s begin our dive into the world of hockey analytics.

Most Common Advanced Analytics

These next few paragraphs will introduce the most common advanced analytics used today and a brief explanation of what they measure and how to interpret them. Explaining them isn’t always easy, but I’ll do my best. Here’s what we’ll be diving deeper into throughout this series: Shot Attempts, Corsi, Scoring Chances, High Danger Chances, Goals For and Goals Against, Expected Goals, and Goals Saved Above Expected.

There are also comprehensive analytics out there like Evolving Hockey’s Wins Above Replacement (WAR), Dom Luszczyszyn’s Game Score Value Added (GSVA) and several others that create a single player value based on all of these stats, but that’s for another time. Today we’ll start simple with Shot Attempts, Corsi, Scoring Chances and High Danger Chances.

Shot Attempts/Corsi

Calling shot attempts an advanced stat is probably a stretch, but a lot of the stats that we’ll learn about throughout this series have shot attempts at the core of their existence.

They’re rather simple to understand. They’re any shot directed at the net. They differ from shots on goal in that they account for shots that are blocked, miss the net, hit the post and ones that do find their way to the net or into the net. If you want to get fancy, you can just call shot attempts “Corsi,” because they’re essentially the same thing.

Corsi simply measures shot attempt differential, blocked attempts included, at any given point in a shift, period or game. There are a few different ways it’s expressed: Corsi For (CF), Corsi Against (CA), Individual Corsi For (iCF) and Corsi For Percentage (CF% or C%).

I’m a Wild fan, so we’re going to use them as an example to explain Corsi in a little more detail. Naturally, we’ll use Kirill Kaprizov as our player.

Let’s say Kaprizov has a 45 second shift against the Vegas Golden Knights. The shift starts in the defensive zone with a VGK faceoff win. The puck goes back to Shea Theodore and he attempts a shot from about the blue line. Kaprizov’s teammate, Jared Spurgeon, blocks the shot before it gets to the net. After the block, the puck squirts to VGK forward Mark Stone who then corrals the puck and fires a shot from just above the top of the circle off the post. The rebound comes back to Stone who then shoots again, this time from right between the hash marks, but the save is made by Kaprizov’s goalie, Marc-Andre Fleury.

In this scenario, even though there was only one shot on goal (Stone’s shot from the slot that was stopped by Fleury), there were three shot attempts. Because Kaprizov was on the ice for these three attempts, he’d receive a “Corsi Against (CA),” of 3.

Now, back to our game. After the Fleury save, Spurgeon gets the rebound and moves the puck up ice with a pass to Mats Zuccarello. He enters the zone and takes a shot from just inside the top of the circles that sails over the net. Kaprizov’s other linemate, Ryan Hartman, races to the corner to recover the puck and shoots one quickly from just above the goal line to try and catch VGK goalie Robin Lehner off-guard. Lehner makes the save and the rebound comes to Kaprizov who shoots from right on top of the crease, but is stopped again by the pad of Lehner. Zuccarello gets this rebound and tries a wraparound, but Lehner answers again and freezes the puck and the Kaprizov line heads to the bench for a line change.

In this scenario, there were 4 shot attempts — Zuccarello’s shot over the net, Hartman’s shot from the corner, Kaprizov’s rebound and Zuccarello’s wraparound. For that shift, Kaprizov had a Corsi For (CF) of 4. We know from the defensive zone that he had a Corsi Against of 3. So in total for the shift, he had a CF 4 and CA 3. Kaprizov in this situation would have an Individual Corsi For (iCF) of 1 as he alone accounted for a single shot attempt.

We can then put these numbers into a ratio to find a Corsi For Percentage (CF%). It’s a simple formula. We take CF and divide it by the sum of CF and CA and multiply it by 100 to get a percent: CF/(CF+CA)*100=CF%.

On this shift, Kaprizov had a CF% of 4/(4+3)*100. Basic math says to do the parentheses first: 4/(7)*100. Then to do multiplication and division left to right, which leaves us with 57.14%. So for that 45 second shift, Kaprizov had a CF% of 57.14%. Easy, right? Over an entire game, a player will accumulate CF and CA to get their CF% for that game. If Kaprizov ended the game with CF of 20 and CA of 14, he’d have a CF% of 58.82%.
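For readers who like to see the arithmetic spelled out, here is a minimal Python sketch of the CF% formula above. The function name `for_percentage` is made up for this illustration; it is not taken from any stats site or library.

```python
def for_percentage(events_for: float, events_against: float) -> float:
    """Share of events that went in the player's favor, as a percent."""
    total = events_for + events_against
    if total == 0:
        raise ValueError("no events recorded, so the percentage is undefined")
    return events_for / total * 100


# Numbers from the made-up Kaprizov examples above.
shift_cf_pct = for_percentage(4, 3)    # the 45 second shift: CF 4, CA 3
game_cf_pct = for_percentage(20, 14)   # the full game: CF 20, CA 14

print(f"Shift CF%: {shift_cf_pct:.2f}")  # 57.14
print(f"Game CF%:  {game_cf_pct:.2f}")   # 58.82
```

The same helper works for any "For Percentage" stat, since they all share the For/(For+Against)*100 shape.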

You’ll notice that Corsi isn’t necessarily an individual stat. It’s classified as an “On-Ice” stat, meaning every Wild player that was on the ice with Kaprizov on the 45 second shift also had a 57.14% CF%. That said, we can still apply this individually because typically good players and good lines will tend to have more CF than CA because they’re both driving play and shot attempts in the offensive zone while suppressing opportunities and shots in the defensive zone.

Generally speaking, a CF% above 50% is considered good and anything less than 50% is considered bad.

Corsi does have some flaws, though, as it doesn’t take into account the quality of shot. A player in a game could have a CF of 15 and CA of 27 (a CF% of 35.71%), which looks rather poor at a glance, but 20 of those CA could be missed nets and weak wristers from the blue line that are easy saves. Whereas 13 of the 15 CF could have been posts, shots from in close, rush chances and things that very easily could’ve resulted in goals.

Thankfully, we have metrics that take these things into account: High Danger Chances and Expected Goals.

Fun Fact: Where does the term “Corsi” come from? It was named by Tim Barnes, a financial analyst from Chicago who developed the metric after hearing former Buffalo Sabres general manager Darcy Regier talk about shot differential on the radio. He considered calling it the Regier number or the Ruff number (after Sabres head coach Lindy Ruff), but didn’t like the ring to either. He settled on “Corsi” after searching for Sabres staff members and stumbling across Jim Corsi, who had an awesome mustache.

Scoring Chances & High Danger Chances

Now that you understand Corsi, let’s take that up a notch to Scoring Chances and High Danger Chances. These are probably phrases you’ve heard uttered during a broadcast before and once you understand Corsi, are pretty easy to grasp.

Originally defined by War On Ice, a scoring chance indicates shot attempts that are taken from areas of the ice where goals are more likely to be scored (attempts made from the attacking team’s neutral zone or defensive zone are excluded).

A high danger chance is a scoring chance that’s from, well, a high danger area. Where are the high danger areas? Let’s take a look at this chart from Natural Stat Trick for that information.

Ignore the numbers you see here, they’re not that important (if you really want to know more about them, click here). Instead pay attention to the shaded areas. Scoring chances are assigned a “point” value based on where the shot attempt comes from. Attempts from the yellow area get a value of 1, the purplish area attempts get a value of 2 and the aqua area attempts get a value of 3.

A shot attempt within the purplish area or closer is considered a scoring chance (any attempt with a point value greater than or equal to 2). Any shot within the aqua area is considered a high danger chance (any shot attempt with a point value greater than or equal to 3).

Shot attempts can get an additional point if they’re off the rush or are a rebound. Natural Stat Trick defines a rush as any attempt within 4 seconds of any event (turnover, pass, hit, etc.) in the neutral or defensive zone. A rebound is any attempt made within 3 seconds of another blocked, missed or saved attempt without a stoppage of play in between. If a shot is blocked, the value is decreased by 1.

Let’s apply our 45 second Kaprizov shift from earlier to High Danger Chances.

Vegas had 3 shot attempts (or CF 3). Theodore had a shot from just inside the blue line that was blocked by Spurgeon. It was from the yellow area (1 point), but blocked (so -1 point). It was neither a scoring chance nor a high danger chance. Then Stone got a shot from the top of the circle after the block. Though this was a shot from the yellow area (1 point), it was also considered a rebound so it gets an additional point, making it a 2-point shot: good enough for a scoring chance, but not a high danger chance. Finally, Stone got his own rebound and took a shot from between the hash marks that was stopped. This shot was from the purple area, good for 2 points, and was a rebound, so it gets an additional point, good for a 3-point attempt, making it a high danger scoring chance.

At the other end of the ice, Zuccarello had an attempt from inside the top of the circle off of the rush. That’s a purplish zone plus a rush shot, so a 3-point attempt, which is a high danger chance. Hartman then takes a shot from the corner just above the goal line (yellowish area) worth just 1 point, so not even a scoring chance. However, Kaprizov gets that rebound on the top of the crease, which by itself is 3 points. Add to that it was a rebound and it was a 4-point attempt, making it another high danger chance. Finally, the Zuccarello wraparound comes from a yellow area on the rebound, so that’s 2 points and good for a scoring chance, but not a high danger chance.

To summarize, we know from earlier there were 7 shot attempts on this shift: 4 for the Wild and 3 for the Golden Knights. Of VGK’s 3 attempts, 2 of them were scoring chances and 1 of them was a high danger scoring chance. Of the Wild’s 4 attempts, 3 of them were scoring chances and 2 of them were high danger chances.
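The point system described above can be sketched in a few lines of Python. This is only an illustration of the rules as explained in this article, not Natural Stat Trick’s actual code: the `Attempt` class, the color names and the `tally` helper are all invented here, and it assumes an attempt gets at most one bonus point for being a rush or rebound chance.

```python
from dataclasses import dataclass

# Point values for the shaded areas described above (color names are shorthand).
AREA_POINTS = {"yellow": 1, "purple": 2, "aqua": 3}


@dataclass
class Attempt:
    team: str
    shooter: str
    area: str
    rush: bool = False
    rebound: bool = False
    blocked: bool = False


def danger_points(attempt: Attempt) -> int:
    """Area value, plus 1 for a rush or rebound, minus 1 if blocked."""
    points = AREA_POINTS[attempt.area]
    if attempt.rush or attempt.rebound:
        points += 1
    if attempt.blocked:
        points -= 1
    return points


def tally(attempts, team):
    """Return (scoring chances, high danger chances) for one team."""
    points = [danger_points(a) for a in attempts if a.team == team]
    scoring = sum(1 for p in points if p >= 2)
    high_danger = sum(1 for p in points if p >= 3)
    return scoring, high_danger


# The made-up 45 second shift, in order.
shift = [
    Attempt("VGK", "Theodore", "yellow", blocked=True),
    Attempt("VGK", "Stone", "yellow", rebound=True),
    Attempt("VGK", "Stone", "purple", rebound=True),
    Attempt("MIN", "Zuccarello", "purple", rush=True),
    Attempt("MIN", "Hartman", "yellow"),
    Attempt("MIN", "Kaprizov", "aqua", rebound=True),
    Attempt("MIN", "Zuccarello", "yellow", rebound=True),
]

for team in ("VGK", "MIN"):
    sc, hd = tally(shift, team)
    print(f"{team}: {sc} scoring chances, {hd} high danger chances")
```

Running it reproduces the summary above: 2 scoring chances and 1 high danger chance for Vegas, 3 and 2 for the Wild.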

Scoring Chances and High Danger Chances are expressed just like Corsi: Scoring Chances For (SCF), Scoring Chances Against (SCA), Scoring Chances For Percentage (SCF%), High Danger Chances For (HDCF), High Danger Chances Against (HDCA), High Danger Chances For Percentage (HDCF%). They can also be expressed for each player’s individual contributions as Individual Scoring Chances For (iSCF) and Individual High Danger Chances For (iHDCF). They’re also “On-Ice” stats, meaning Kaprizov and his Wild teammates on the ice with him during these attempts are all given the same values.

In our made-up scenario, Kaprizov’s SCF% is 3/(3+2)*100=60%. His HDCF% is 2/(2+1)*100=66.67%. Again like Corsi, these are accumulated over an entire game.

Generally speaking once again, over 50% for both SCF% and HDCF% is considered good because it shows when a player is on the ice, he and his team are generating more scoring chances and high danger chances than they are allowing. Under 50% is considered generally bad.

Both high danger chances and scoring chances can be shown at even strength, 5v5, power play, penalty kill, etc. For consistency, I prefer to examine these stats at 5v5 as that’s where most players play the majority of the time. If you’re looking at these stats, just be sure you know the on-ice situation you’re examining.

High Danger Chances can be applied to other metrics, too, like High Danger Goals For/Against (HDGF/A) and High Danger Save Percentage (HDSv%). They are exactly what you think they are: they measure how often players are converting goals on their high danger opportunities and how many opposing high danger chances are being stopped by goaltenders.

Goals For and Goals Against

Now that you understand shot attempts and high danger chances, let’s move on to Goals For (GF) and Goals Against (GA).

Don’t overthink these ones. They look scary on paper, but when you get down to the core of GF and GA, they’re just a slightly glorified plus/minus. That’s right, GF is just how many goals were scored by a player’s team when he was on the ice and GA is how many goals were scored against a player’s team while he was on the ice. Empty Net Goals aren’t counted.

Similar to Corsi once again, we’ll often see GF and GA expressed as a ratio: GF%. It’s found the same way as we found all of the previous stats. GF/(GF+GA)*100=GF%.

Again, generally speaking over 50% is good and below 50% is bad for GF% as being over 50% would indicate that a team is scoring more goals than they are allowing when a certain player is on the ice. Typically this metric is tracked at 5v5 or even strength.

Let’s use another fake scenario with Kaprizov. We’ll assume by the end of the season, Kaprizov will have been on the ice for 98 Minnesota Wild goals for and 70 goals against. This would make his GF% 98/(98+70)*100=58.3%. In the plus/minus world, it’d be expressed as +28.

Here’s where GF can be more helpful than Plus/Minus: Let’s say Adam Beckman gets a cup of coffee this year, plays 15 games, and ends up with 9 goals for and 3 goals against. His plus/minus would be just +6, but his GF% would be 75%. Kaprizov’s +28 looks miles better than Beckman’s +6, but we know +6 isn’t taking the sample size into account. GF% helps contextualize that sample size a bit better as Beckman actually had a better GF% than Kaprizov.
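As a quick sketch, the comparison can be run side by side using the on-ice goal totals from the two made-up seasons above. The function names are invented for this example, and `goal_differential` is only the simple GF minus GA used in this article, not the NHL’s official plus/minus rules.

```python
def gf_pct(goals_for: int, goals_against: int) -> float:
    """Goals For Percentage: GF / (GF + GA) * 100."""
    return goals_for / (goals_for + goals_against) * 100


def goal_differential(goals_for: int, goals_against: int) -> int:
    """Simple on-ice goal differential (GF minus GA)."""
    return goals_for - goals_against


# (goals for, goals against) while on the ice, from the fake scenarios above.
players = {"Kaprizov": (98, 70), "Beckman": (9, 3)}

for name, (gf, ga) in players.items():
    diff = goal_differential(gf, ga)
    print(f"{name}: differential {diff:+d}, GF% {gf_pct(gf, ga):.1f}")
```

The differential favors the player with far more ice time, while GF% puts both on the same 0 to 100 scale.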

This is one of those stats that I tend to be a bit more cautious with when I look at it. There are lots of external factors that can significantly inflate, or deflate, a player’s GF% that don’t necessarily correlate to a player’s skill level.

For example, a player could have a really good GF%, something like 59%, but it could be the result of consistently playing with really good players against really bad players. Or vice versa: top defensemen on bad teams could have a bad GF% because they’re playing lots of really tough minutes with little support around them, so they’re going to give up more goals than an average defenseman on a good team.

Expected Goals

Remember earlier when we talked about Corsi and I brought up this point near the end of the conversation: “A player in a game could have a CF of 15 and CA of 27 (a CF% of 35.71%), which looks rather poor at a glance, but 20 of those CA could be missed nets and weak wristers from the blue line that are easy saves. Whereas 13 of the 15 CF could have been posts, shots from in close, rush chances and things that very easily could’ve resulted in goals.”

Enter my favorite, and what I believe to be the single best advanced metric out there, Expected Goals.

The easiest way to describe Expected Goals (xG) is the statistical chance of an unblocked shot (also known as a Fenwick* shot) becoming a goal. Based on the play-by-play data that’s currently made available to the public, xG is the most comprehensive and direct way to measure shot quality that we have available. It’s the core of the idea that not all shots are created equal and some shots have a better chance of becoming a goal than others.

*Quick Note: Earlier we talked about shot attempt differential and learned about Corsi. Fenwick is another very similar statistic, the only difference of note is that Corsi counts blocked shots, Fenwick does not.
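To make the difference concrete, here is a small Python sketch that counts the three Vegas attempts from the earlier made-up shift three ways. The outcome labels and function names are invented for this illustration.

```python
# Outcomes of the three Vegas attempts from the made-up shift:
# Theodore's blocked shot, Stone off the post, Stone stopped by Fleury.
vgk_attempts = ["blocked", "missed", "saved"]


def corsi(outcomes):
    """All shot attempts, blocked ones included."""
    return len(outcomes)


def fenwick(outcomes):
    """Unblocked shot attempts only."""
    return sum(1 for outcome in outcomes if outcome != "blocked")


def shots_on_goal(outcomes):
    """Attempts that reached the goalie or went in."""
    return sum(1 for outcome in outcomes if outcome in ("saved", "goal"))


print(corsi(vgk_attempts), fenwick(vgk_attempts), shots_on_goal(vgk_attempts))  # 3 2 1
```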

Right now there are seven widely referenced public xG models:

  1. Dawson Sprigings’ Model
  2. Corsica Hockey’s Model created by Manny Perry
  3. Evolving Hockey’s Model created by Josh and Luke Younggren
  4. HockeyViz’s Model created by Micah Blake McCurdy
  5. Natural Stat Trick’s Model created by Brad Timmins
  6. Money Puck’s Model created by Peter Tanner
  7. Top Down Hockey’s Model created by Patrick Bacon

To create these models, each creator used a similar process. They took a massive data set of about 70% of Fenwick shots dating back to 2007 (when public play-by-play data was actually made available) and extending all the way to somewhere between 2015 and now, depending on the model, and analyzed everything they could about them.

They were very descriptive with the data collected in these shot attempts, including things like the player shooting and the goalie, location, type of shot, distance from the net, angle to the net, how long players had been on the ice when the shot was taken, and the time between the shot and the last on-ice event (turnover, hit, block, rebound, etc).

They then took this information, checked whether each particular shot resulted in a goal, and compared all of these instances using complex mathematical algorithms to determine what percentage of the same types of shots resulted in goals, which produces an xG value.
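The core idea, "what percentage of similar shots became goals," can be shown with a deliberately tiny Python sketch. Everything here is an assumption made for illustration: the shot log is invented (it is not real NHL data), the distance buckets are arbitrary, and a lookup table of goal rates is a heavy simplification. The public models listed above use many more variables and far more sophisticated statistical methods.

```python
from collections import defaultdict

# Invented toy shot log: (distance in feet, shot type, was it a goal?).
TOY_SHOTS = (
    [(8, "wrist", True), (10, "wrist", False), (12, "wrist", True), (9, "wrist", False)]
    + [(30, "wrist", False), (35, "wrist", False), (28, "wrist", True), (33, "wrist", False)]
    + [(55, "slap", False)] * 9
    + [(58, "slap", True)]
)


def bucket(distance_ft, shot_type):
    """Group 'similar' shots by rough distance and shot type (arbitrary cutoffs)."""
    if distance_ft < 20:
        zone = "close"
    elif distance_ft < 45:
        zone = "mid"
    else:
        zone = "far"
    return (zone, shot_type)


def fit_goal_rates(shots):
    """For each bucket, the fraction of past shots that became goals."""
    goals = defaultdict(int)
    totals = defaultdict(int)
    for distance_ft, shot_type, is_goal in shots:
        key = bucket(distance_ft, shot_type)
        totals[key] += 1
        goals[key] += int(is_goal)
    return {key: goals[key] / totals[key] for key in totals}


def expected_goals(new_shots, rates):
    """Sum the historical goal rate of each new shot's bucket."""
    return sum(rates.get(bucket(d, t), 0.0) for d, t in new_shots)


rates = fit_goal_rates(TOY_SHOTS)
new_shots = [(11, "wrist"), (31, "wrist"), (59, "slap")]
total_xg = expected_goals(new_shots, rates)
print(f"xG for three new shots: {total_xg:.2f}")  # 0.50 + 0.25 + 0.10 = 0.85
```

Each new shot is worth the historical goal rate of shots like it, and adding those values up gives the expected goals for a shift, game or season.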

There are lots of variables at play and exact xG values can differ slightly from model to model, but here are two examples of factors that are most important to determining a shot attempt’s xG value.

Here’s what Evolving Hockey’s model emphasizes:

There are some slight differences depending on even strength, power play, penalty kill and if you’d like to examine those in more detail, follow the above link to Evolving Hockey’s explanation of their model.

Here’s what Money Puck’s Model emphasizes:

As you can tell by comparing the two, they are generally the same, but do have some differences across level of importance. So when examining xG data, just be aware of where it’s coming from. I primarily use either Evolving Hockey’s or Natural Stat Trick’s data just because I use their websites the most.

One more note: unlike High Danger Chances, Corsi or Fenwick, we can’t use our eyes to determine the exact value of a shot attempt. As seen above, there are too many variables at play, so we just have to trust these models and understand the values they churn out. That said, we can determine that shots closer to the net will tend to have the highest xG values.

The Applications of xG

Let’s move on to how to understand and interpret xG values into information that’s useful for player evaluation.

As we’ve learned, there’s a complicated mathematical algorithm that spits out xG values for all unblocked shots in a game. These values have become one of my favorite ways to evaluate players.

xG, like every other stat we’ve discussed, is primarily expressed in three ways: Expected Goals For (xGF), Expected Goals Against (xGA) and Individual Expected Goals (ixG). xGF is how many goals a player’s team was expected to score when he was on the ice. xGA is how many goals the other team was expected to score when that player was on the ice. ixG is how many expected goals a player was responsible for just by himself. We can create a ratio just like all the other stats for expected goals: xGF%. We find it the same way: xGF/(xGF+xGA)*100.

Let’s use one of the league’s most underrated players as an example here: The Minnesota Wild’s Joel Eriksson Ek (JEEK).

We’ll use his xG numbers from his game against the Los Angeles Kings on Jan. 26, 2021. This was one of JEEK’s most dominant games of the year. In that game, when JEEK was on the ice the Wild totaled 1.20 expected goals for and 0.21 expected goals against.

Obviously you can’t score 1.20 goals or allow 0.21 goals, but plugging those numbers into the formula gives an xGF% of 1.20/(1.20+0.21)*100=85.1%. It shows that JEEK and his Wild teammates had a rather high probability of scoring a goal while he was on the ice (JEEK did end up with a goal in that game) and a very low probability of allowing one (they didn’t allow one with him on the ice).

This expands as more xGF or xGA is accumulated throughout a game. If a player had an xGF of 1.89 in a game, there’s a high probability that his team should’ve scored twice with him on the ice. If another player had an xGA of 1.48, it’s quite likely the opposing team scored one goal while he was on the ice and had a decent probability of scoring a second one.

Players who are routinely producing xGF% above 50% are often some of the better players on their team. Teams that control higher xGF% in games are usually winning those games. There are scenarios where those don’t hold true, but then we can turn to things like bad luck and bad goaltending (Goals Saved Above Expected) as the culprits.

As you learned earlier, xG can be counted at all strengths. When speaking broadly, I usually prefer 5v5 as it’s when most of the play happens. It can be useful, though, to find which players deliver lots of offensive value on the Power Play and good defensive utility on the penalty kill.

Using Expected Goals in Tandem with Goals For/Against

Sticking with the Wild players we’ve been using in our examples, let’s use some data from the 2021–2022 season for Kirill Kaprizov.

Per Natural Stat Trick, in the 21–22 season Kaprizov had a 5v5 xGF of 48.99 and a GF of 74. His xGA was 41.79 and GA was 51.

As you can see, the Wild score a lot with Kaprizov on the ice at 5v5, scoring about 25 more goals than expected. This is found by subtracting the expected goals (48.99) from the actual goals (74) to get 25.01. On the flip side, he was on the ice for about 9 more goals against than expected as well (51 minus 41.79), which in this case can most likely be attributed to subpar goaltending.
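The subtraction is simple enough to do by hand, but here is a minimal Python sketch using the Kaprizov numbers quoted above. The helper name `goals_above_expected` is made up for this example.

```python
def goals_above_expected(actual_goals: float, expected_goals: float) -> float:
    """Actual goals minus expected goals; positive means more than expected."""
    return actual_goals - expected_goals


# Kaprizov's 5v5 on-ice numbers quoted above.
gf, xgf = 74, 48.99
ga, xga = 51, 41.79

scored_above = goals_above_expected(gf, xgf)
allowed_above = goals_above_expected(ga, xga)

print(f"Goals for above expected:     {scored_above:+.2f}")   # +25.01
print(f"Goals against above expected: {allowed_above:+.2f}")  # +9.21
```

A positive number on the "for" side is good news; a positive number on the "against" side is not.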

This is the main function of using xG and GF/GA together. It can be helpful to find good finishers and who has good luck (scoring more goals than expected) as well as bad finishers and who has bad luck (scoring fewer goals than expected, or allowing more than expected).

Expected Goals and Goaltending: Goals Saved Above Expected

Another area where xG can be extremely helpful is analyzing goaltending. Goals Saved Above Expected (GSAx or GSAE) is perhaps one of the most comprehensive analytics out there, measuring exactly how many goals a goalie stopped relative to what they were expected to allow.

The baseline here is easy: it’s 0.00. If a goalie has a GSAx of 0.00, it means he saved all the shots he was supposed to, nothing more, nothing less. Anything above 0.00 means he’s going above and beyond the call of duty and anything less than 0.00 means he’s hurting his team and not doing his job.

GSAx is just the differential of a team’s expected goals against and actual goals against. For example, say the Wild were expected to allow 2.54 goals in a game, but only allowed 1 actual goal. In this scenario, goaltender Marc-Andre Fleury would have a GSAx of 1.54. On the flip side if they were expected to allow 2.54 goals, but allowed 4 actual goals, Fleury’s GSAx would be -1.46.
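Here is a short Python sketch of the GSAx calculation using the two made-up Fleury games above, plus a running total to show how it accumulates. The function name and the list of games are invented for this illustration.

```python
def gsax(expected_goals_against: float, actual_goals_against: int) -> float:
    """Goals Saved Above Expected: expected goals against minus actual goals against."""
    return expected_goals_against - actual_goals_against


# (expected goals against, actual goals against) for the two made-up games.
games = [(2.54, 1), (2.54, 4)]

per_game = [gsax(xga, ga) for xga, ga in games]
running_total = sum(per_game)

for value in per_game:
    print(f"Game GSAx: {value:+.2f}")           # +1.54, then -1.46
print(f"Running GSAx: {running_total:+.2f}")    # +0.08
```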

GSAx accumulates over a season to get a final number. Last season’s best goaltender was the Rangers’ Igor Shesterkin, who racked up a whopping 37.18 GSAx in just 53 games. On the flip side, Seattle’s Philipp Grubauer was the worst, allowing a putrid 31.53 goals more than expected (-31.53 GSAx) in 55 games.

GSAx, in my opinion, is a stat that can stand by itself as a single stat and be used to accurately evaluate a goalie. Generally speaking, if a goalie’s GSAx is above 10 in a season, I’d consider him to be a good starting goalie; 0–10 is an average goalie (1B or a Backup); anything less than 0 I’d consider to be a bad goalie.

That said, it is important, like any stat, to be aware of starts and time on ice and how good or bad the team in front of a goalie is.

Why I Love Expected Goals

Expected Goals are complex, but the results they produce are incredible for player evaluation. They’re great for finding the league’s best offensive players, defensive players, two-way players, power play specialists, and penalty kill masters.

They give us a good indicator of game flow — which team was getting to those high danger areas and producing dangerous shots and which team was being kept to the perimeter and taking shots that most goalies turn away with ease.

You can look at xG rates for different line combinations and defensive pairings to see who’s doing well and who’s struggling and perhaps single out which players bring combinations down and which players are driving those lines.

They help us evaluate goalies by going beyond rudimentary stats like save percentage and goals against average to really find the goalies that are consistently making the big saves and those that are not.

They give us so much information compacted into a single statistic. That said, Expected Goals aren’t a perfect statistic and they certainly aren’t the end-all be-all.

Pushback on Expected Goals

Like with any sort of data, there’s room for error. All of these models run tests to see how accurate their data is. Remember earlier when we said they analyzed 70% of shot attempt data? They used that other 30% as test data.

They ran their algorithm created from the first 70% on the remaining 30% to see how accurately it predicted how many of the remaining shots resulted in goals. The models were about 76.7 to 79.9% accurate, meaning there still is some room for error and that’s something that needs to be considered.

These errors are mostly out of the model’s control though. They can’t be perfect and they’re based on the public play-by-play data, which has received its fair share of scrutiny as well.

After Analytics #HockeyTwitter came after Blake Wheeler for a poor start to his 2020–21 season (awful xG numbers) despite good offensive production, Winnipeg Jets head coach Paul Maurice came after public models, saying that the Jets’ in-house analytics did not match up with what the public was saying.

Though this was probably not entirely accurate, there is some truth to what Maurice said. NHL teams have access to player tracking and play-by-play data that the public doesn’t, likely resulting in their in-house analytics being even more accurate than what we have in the public.

There’s hope that as sports betting continues to grow in the US and Canada, more of that data will become public and we can improve these models even further, but until then we have to go with what we’ve got.

___________________________________________________________________

And that’s it! If you’ve made it this far, you now have a basic understanding of some key advanced hockey metrics and analytics. If you like what you learned today, please share this article, follow me on Twitter @B_Marsh92 and tune in on Thursdays to Sound the Foghorn, my weekly podcast with Zeke Boyat and Justin Bakke covering the Minnesota Wild.

Brett Marshall

Brett is best known on #mnwild Twitter for his PCS/Player Cards and analytics-related breakdowns of the team. He also co-hosts the Sound the Foghorn podcast.