The Elo rating system is a method for evaluating and comparing competitors. To date it has been applied mostly to board games, most famously chess, but also to disciplines such as draughts and go. Named after its inventor, Prof. Arpad Elo, who first published it in the US in the 1950s, the Elo system can produce a reliable score expectation for an encounter between two opposing competitors.

For those who are not familiar with chess or draughts, here is how Elo ratings work:

1) In an encounter between two competitors, A and B, assume they have ratings $R_a$ and $R_b$.

2) There is a function F that maps a player's rating and the opponent's rating to the player's expected result:

$$E_a = F(R_a, R_b), \qquad E_b = F(R_b, R_a)$$

where F is monotonic non-decreasing in the player's own rating (and non-increasing in the opponent's) and bounded between the minimum and maximum possible scores, such as 0 and 1 in chess. An example of such a function would be $\arctan(x)/\pi + 0.5$, with x taken as a suitably scaled rating difference $R_a - R_b$.

$E_a + E_b$ should be equal to the maximum possible score.

In practice, a non-analytical, table-defined function is used that depends only on the difference between $R_a$ and $R_b$, not on their actual values. The function can be reliably approximated by the following expression:

$$E_a = \frac{1}{1 + 10^{(R_b - R_a)/400}}$$

which works well with ratings in the low four-digit range and rating changes per game in the 0 to 20 range.

3) After the encounter, once the actual scores $S_a$ and $S_b$ have been registered, the ratings are adjusted:

$$R_{a1} = R_a + K (S_a - E_a)$$

$$R_{b1} = R_b + K (S_b - E_b)$$
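Here K is a volatility coefficient (commonly called the K-factor). It is usually higher for participants with a shorter history, but ideally it should be the same for both participants: since $S_a + S_b = E_a + E_b$ (both equal the maximum possible score), equal K values make the points gained by one participant exactly equal to the points lost by the other. The new ratings are then used to produce the next expected results, and so on.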
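The three steps above can be sketched in Python. This is a minimal sketch, not a reference implementation: it assumes the logistic approximation with the 400-point scale, a single shared K of 20 (an illustrative value matching the 0 to 20 per-game range mentioned above), and chess-style scores of 1 for a win and 0 for a loss. The helper names `expected_score`, `expected_score_arctan` and `update`, and the 400 scale used inside the arctan variant, are hypothetical choices for illustration.

```python
import math

K = 20  # assumed volatility coefficient, shared by both players


def expected_score(r_a: float, r_b: float) -> float:
    """Logistic approximation: E_a = 1 / (1 + 10 ** ((R_b - R_a) / 400))."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def expected_score_arctan(r_a: float, r_b: float, scale: float = 400.0) -> float:
    """Alternative bounded F: arctan(x) / pi + 0.5 with x = (R_a - R_b) / scale.

    The default scale of 400 is an illustrative assumption.
    """
    x = (r_a - r_b) / scale
    return math.atan(x) / math.pi + 0.5


def update(r_a: float, r_b: float, s_a: float, k: float = K) -> tuple[float, float]:
    """Return the new ratings (R_a1, R_b1) after A scores s_a (B scores 1 - s_a)."""
    e_a = expected_score(r_a, r_b)
    e_b = expected_score(r_b, r_a)
    s_b = 1.0 - s_a  # scores sum to the maximum possible score of 1
    return r_a + k * (s_a - e_a), r_b + k * (s_b - e_b)


# A 1600-rated player beats a 1500-rated player.
new_a, new_b = update(1600, 1500, 1.0)
print(round(new_a, 1), round(new_b, 1))  # 1607.2 1492.8
```

Because both players share the same K and the expected scores sum to 1, the roughly 7.2 points the winner gains are exactly the points the loser gives up.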

The Elo rating has several important properties:

1) It gravitates toward the center. As a participant's rating R climbs higher relative to the opposition, so does the expected result E, which becomes difficult to maintain: a win then earns only $K(1 - E)$ points while a loss costs $K E$, so a failure to maintain it results in a much bigger drop than the gain from a success.

2) It's approximately distributive. Suppose we gather N performances of player A and denote the average opponent rating as $R_{av}$, the expected average performance as $E_{av} = F(R_a, R_{av})$, and the actual average performance as $S_{av}$. Then the batch-updated rating

$$R_{aN'} = R_a + N K (S_{av} - E_{av})$$

will be relatively close to $R_{aN}$, the rating obtained by updating $R_a$ after each of the N games in turn.

3) It reflects tendencies, but overall performance still trumps them. Consider three players who start with the same rating and each play ten games against opponents of that same rating, with the following results (W - win, L - loss):

For player 1: L,L,L,L,L,W,W,W,W,W

For player 2: L,W,L,W,L,W,L,W,L,W

For player 3: W,W,W,W,W,L,L,L,L,L

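Player 1 will end up with the highest rating of the three, player 2 in the middle, and player 3 with the lowest, even though all three scored five wins and five losses; the margins, however, are small. Player 1's wins come after the losing streak has lowered the rating, when each win is worth more, while player 3's losses come after the winning streak has raised it, when each loss costs more. Only when the streaks become really long can a lower overall performance end up with a higher Elo than a higher one.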
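The three properties can be checked numerically with a small simulation. This is a sketch under stated assumptions: the logistic expected-score formula with the 400-point scale, a fixed K of 20, and opponents whose ratings stay fixed (only the tracked player's rating is updated). The starting rating of 1500, the five games used for property 2, and the helper names `sequential` and `batch` are hypothetical choices for illustration, not data from this site.

```python
K = 20.0  # assumed fixed volatility coefficient


def expected_score(r_a: float, r_b: float) -> float:
    """Logistic expected score of a player rated r_a against one rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def sequential(r_a: float, games: list[tuple[float, float]]) -> float:
    """Update r_a after each (opponent_rating, score) game in turn.

    Opponent ratings are treated as fixed.
    """
    for r_opp, s in games:
        r_a += K * (s - expected_score(r_a, r_opp))
    return r_a


def batch(r_a: float, games: list[tuple[float, float]]) -> float:
    """Single update R_a + N*K*(S_av - E_av), using the average opponent rating."""
    n = len(games)
    r_av = sum(r for r, _ in games) / n
    s_av = sum(s for _, s in games) / n
    return r_a + n * K * (s_av - expected_score(r_a, r_av))


# Property 1: a 1900 player facing 1500 opposition risks far more than it can gain.
e = expected_score(1900, 1500)
win_gain, loss_cost = K * (1 - e), K * e
print(f"win +{win_gain:.1f}, loss -{loss_cost:.1f}")  # win +1.8, loss -18.2

# Property 2: hypothetical (opponent rating, score) pairs for five games.
games = [(1450, 1.0), (1520, 0.0), (1600, 0.5), (1480, 1.0), (1550, 0.0)]
print(round(batch(1500.0, games), 1), round(sequential(1500.0, games), 1))

# Property 3: the same 5-5 record in different orders, all opponents rated 1500.
players = {
    "player 1": "LLLLLWWWWW",
    "player 2": "LWLWLWLWLW",
    "player 3": "WWWWWLLLLL",
}
final = {
    name: sequential(1500.0, [(1500.0, 1.0 if c == "W" else 0.0) for c in seq])
    for name, seq in players.items()
}
for name, rating in final.items():
    print(name, round(rating, 1))
```

With these inputs the batch and sequential ratings differ by much less than a single game's swing, and the three 5-5 records finish in the order described above, each within 10 points of the starting 1500.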

We try to apply Elo ratings to a variety of estimates in hockey, which can be divided into three groups:

1) Straightforward: events with a clear win-or-loss outcome between two competitors, such as penalty shots and faceoffs.

2) Indirect: events with a clear win-or-loss outcome, but where the competitors themselves are not clear-cut, such as coaches and teams as a whole.

3) Complex: events where a participant, i.e. the individual hockey player, simply tries to achieve the best possible result, as well as the hockey team viewed as a combination of these players.

We have not yet presented all our ideas on the website, so check back frequently to see what's new.
