I will start by quoting a scientific article whose authors mention the IRB system, first applied to rugby: “International Rugby Board or IRB system employs a predictor/corrector adjustment in which defeating a weak team provides less gain than defeating a strong team while losing to a weak team elicits a much larger negative adjustment than losing to a strong team, arguably a fair and efficient methods for rating competitors”.
But the next sentences, on the application to football, were the ones that persuaded me to develop a system of my own, based on the work of the Hungarian-born physicist Arpad Elo. They say: “FIFA have improved the previous rating systems with a new and simpler system which takes into account strength of opponents and game importance; however, all losses are treated as equal regardless of the opponents, and home advantage is ignored. An Elo based system, employing many of features of the IRB system, appears to have advantages over the FIFA system”. The rest of the work is available in Football Rating Systems for Top-Level Competition: A Critical Survey; Ray Stefani and Richard Pollard; California State University, 2007.
But an ELO-based system has its own pros and cons.
ELO based system
In a nutshell, Arpad Elo set out to develop a system to quantify the performance of chess players. In chess, unlike football or any other sport, actual performance does NOT change that much. Also, the chance of a draw is very different from that in football or basketball.
But the outcome of his work is a unique rating system with wide usage: in football, basketball, Major League Baseball, online games and much more.
The difference in the ratings between two players serves as a predictor of the outcome of a match.
Two players with the same rating who play against each other are expected to score the same number of wins. A player whose rating is 100 points greater than their opponent’s is expected to score 64%; if the difference is 200 points, then the expected score for the stronger player is 76%; if the difference is 300 points, then the expected score is 85%. A difference greater than 735 points means 100% of the points: all wins for the better player.
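The percentages above can be reproduced with the commonly used logistic form of the Elo expectation curve. This is a minimal sketch, assuming the usual scale of 400 points; the function name expected_score is only illustrative. Note that this curve only approaches 100% and never reaches it, so the 735-point cutoff mentioned above comes from a rounded lookup table rather than from this formula.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected share of points for A against B (logistic curve, scale 400).

    The scale of 400 is the conventional choice and is an assumption here.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Expected score of the stronger player for a few rating differences.
for diff in (0, 100, 200, 300):
    share = expected_score(1500 + diff, 1500)
    print(f"difference {diff:>3}: {share:.0%}")
```

Only the difference between the two ratings matters, so the base value of 1500 used in the loop is arbitrary.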
But there is a difference between expected points and results. Chess employs a simple system where a win means 1 point, a draw 1/2 point for each of the parties and a loss 0 points for the losing side. Football used to give 2 points for a win and 1 for a draw; later the win was raised to 3 points to make the game more attractive. But for mathematical purposes we stick to the chess system. Why?
One of the key principles in understanding the ELO system is the sharing of points. If a player rated 2500 plays a player rated 2200, the expectation is that the stronger player gets 85% of the points. Over 10 games, that means 8.5 points for the better player and 1.5 points for the worse one. BUT the rating DOESN’T say how many draws there will be. From the better player’s perspective there could be 8 wins, 1 draw and 1 loss, or just as well 7 wins, 3 draws and 0 losses.
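To make the point about draws concrete, here is a small sketch that lists every combination of wins, draws and losses that adds up to a given score under chess scoring (1, 1/2, 0). The helper name score_lines is hypothetical.

```python
def score_lines(games: int, points: float):
    """Return all (wins, draws, losses) triples worth `points` in `games` games."""
    half_points = round(points * 2)  # work in half-points to avoid float issues
    lines = []
    for wins in range(games + 1):
        for draws in range(games - wins + 1):
            if 2 * wins + draws == half_points:
                lines.append((wins, draws, games - wins - draws))
    return lines


print(score_lines(10, 8.5))  # [(7, 3, 0), (8, 1, 1)]
```

For 8.5 points out of 10 games there are exactly the two possibilities from the text, and the rating alone cannot tell them apart.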
Draw – the challenge of computation
Some systems estimate the share of draws from previous results as a percentage, take that percentage of the points off and “contribute them to the kitty”. Later on, those points are shared between the 2 parties. On top of that, wins and losses make the difference.
At the beginning, boundaries should be set. For instance, 1000 can be the rating of a newbie, 2000 of a better club player and 2400 of a professional.
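From this base we can define the movement of the rating as:
$$R'_A = R_A + K (S_A - E_A)$$
where Player A was expected to score $E_A$ points but actually scored $S_A$ points. Player A had a rating of $R_A$ before the game and has a rating of $R'_A$ after it. The only tricky parameter here is K.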
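The update rule can be sketched in a few lines. This is an illustration only: it assumes the logistic expectation curve with a scale of 400 and the same K for both players, and the function names are hypothetical.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected share of points for A against B (logistic curve, scale 400)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_rating(rating: float, expected: float, actual: float, k: float) -> float:
    """New rating = old rating + K * (actual score - expected score)."""
    return rating + k * (actual - expected)


def play_game(rating_a: float, rating_b: float, score_a: float, k: float = 20.0):
    """Update both ratings after one game; score_a is 1, 0.5 or 0 for player A.

    Using the same K for both players is an assumption of this sketch.
    """
    e_a = expected_score(rating_a, rating_b)
    new_a = update_rating(rating_a, e_a, score_a, k)
    new_b = update_rating(rating_b, 1.0 - e_a, 1.0 - score_a, k)
    return new_a, new_b


# An upset: the player rated 2200 beats the player rated 2500.
new_a, new_b = play_game(2200, 2500, 1.0)
print(round(new_a, 1), round(new_b, 1))
```

With K = 20 the underdog gains roughly 17 points for the upset and the favourite loses the same amount, so with equal K values the total number of rating points stays constant.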
For instance, the chess federation FIDE uses the following K-factor values (“FIDE Online. FIDE Rating Regulations effective from 1 July 2014”. Fide.com.):
- K = 40, for a player new to the rating list until the completion of events with a total of 30 games and for all players until their 18th birthday, as long as their rating remains under 2300.
- K = 20, for players with a rating always under 2400.
- K = 10, for players with any published rating of at least 2400 and at least 30 games played in previous events. Thereafter it remains permanently at 10.
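The three rules above can be written down as a small lookup function. This is a sketch of one possible reading, not an official implementation: the parameter names are hypothetical, and the order in which the rules are checked (permanent K = 10 first, then the two K = 40 cases) is an assumption where the wording leaves room for interpretation.

```python
def fide_k_factor(games_played: int, age: int, rating: int,
                  ever_reached_2400: bool) -> int:
    """Pick a K-factor following the three rules quoted above.

    Assumption: the permanent K = 10 rule takes precedence over the others.
    """
    if ever_reached_2400 and games_played >= 30:
        return 10  # stays at 10 permanently once reached
    if games_played < 30:
        return 40  # new to the rating list
    if age < 18 and rating < 2300:
        return 40  # junior below 2300
    return 20      # everyone else, rating always under 2400


print(fide_k_factor(games_played=12, age=25, rating=1500, ever_reached_2400=False))
print(fide_k_factor(games_played=200, age=30, rating=2450, ever_reached_2400=True))
```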
As can be seen, the bigger K is, the more a player can gain or lose. A bigger value is also naturally advisable in sport, especially in football. That was also the direction in which I was going, and a Monte Carlo simulation confirmed that I was right.
When it comes to the probabilities of movements and outcomes, the following picture says it all for K=16 and K=32:
Figure: Differences of rating and expected results
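As a rough sketch of how K=16 and K=32 compare, the snippet below tabulates the expected score of the higher-rated player and the rating change for a win, a draw and a loss at several rating differences. It assumes the logistic expectation curve with a scale of 400; the helper names are hypothetical and the numbers are not taken from the original picture.

```python
def expected_score(diff: float) -> float:
    """Expected score of the player who is `diff` points stronger."""
    return 1.0 / (1.0 + 10 ** (-diff / 400.0))


def rating_changes(diff: float, k: float):
    """Rating change of the stronger player for a win, a draw and a loss."""
    e = expected_score(diff)
    return tuple(round(k * (score - e), 1) for score in (1.0, 0.5, 0.0))


for k in (16, 32):
    print(f"K={k}")
    for diff in (0, 100, 200, 300, 400):
        win, draw, loss = rating_changes(diff, k)
        print(f"  diff={diff:>3}  E={expected_score(diff):.2f}  "
              f"win={win:+.1f}  draw={draw:+.1f}  loss={loss:+.1f}")
```

Doubling K doubles every gain and loss, while the expected score depends only on the rating difference.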
I organized a small Facebook quiz and 30 of my friends took part in it. Some of them play chess, some know ELO from sport and some have no clue about it. The results below suggest that this rating system is largely self-explanatory and that, with proper examples given, people can easily figure out the basic facts.
Question 1 – Do you know what ELO (the ELO rating system) is?
YES (36.7%), I have heard about it (10.0%), NO (53.3%)
Question 2 – If a chess player rated 2500 faces an opponent rated 2200, what percentage of the games will he win (what % of the points will be his)?
below 50% (20.0%) = question not understood at all; maybe they assumed an inverted ranking
51 to 60% (16.7%)
61 to 70% (23.3%)
71 to 80% (10.0%)
81 to 90% (23.3%) = correct answer
91 to 100% (6.7%)
Question 3 – If a newbie is rated 1000 and a league player 2000, then the best player is rated:
2500 to 3000 (56.7%) = correct answer
3001 to 3500 (26.7%)
3501 to 4000 (16.7%)
The ratings are well adapted to sport, chess, board games and other human and non-human (robot) competitive disciplines.
They help to set initial expectations and, of course, they help with predictions. But the rating is not the only factor, especially with humans. We are fragile, influenced by the outside world, and we act differently under different circumstances. And that is the magic of sport: even the biggest underdog can win a big tournament here and there. Remember the Greek football team at Euro 2004?