--- This post is for informational purposes, explaining the method behind calculating the ratings and also how to use the ratings to compare players. Feel free to skip it. ---
The ratings are calculated with the TrueSkill player rating system. In TrueSkill each player has two numbers associated with them: the mean rating, μ (mu), and the rating deviation, σ (sigma). μ is an estimate of your real skill rating and σ is an estimate of the uncertainty in that estimate. Assuming a Gaussian skill distribution, there is a 68.3% chance that your real skill rating is within 1σ of μ, a 95.4% chance that it is within 2σ of μ and a 99.7% chance that it is within 3σ of μ. The number that I use for the ratings I post, the "trueskill", is μ - 3σ, a conservative rating estimate. It's conservative because there is about a 99.9% chance that your real skill rating is higher than the trueskill rating posted here (and conversely about a 0.1% chance that it is lower). Basically, given the entire plausible range of what your real rating could be (as defined by your μ and σ), I assume that your skill is about as low as it can plausibly be.
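For anyone who wants to check the Gaussian percentages, here is a minimal sketch using only Python's standard library. The function names are made up for illustration and are not part of any TrueSkill package.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard Gaussian: mean 0, standard deviation 1

def chance_within(k):
    """Chance that the real skill lies within k sigma of mu (two-sided)."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

def chance_above_conservative(k=3):
    """Chance that the real skill is above mu - k*sigma (one-sided)."""
    return 1.0 - std_normal.cdf(-k)

def conservative_rating(mu, sigma, k=3):
    """The posted "trueskill": mu minus k sigma."""
    return mu - k * sigma

for k in (1, 2, 3):
    print(f"within {k} sigma: {chance_within(k):.1%}")
print(f"above mu - 3 sigma: {chance_above_conservative():.2%}")
print("new player:", conservative_rating(1500, 350))
```

The one-sided chance is higher than the two-sided one because only the low tail of the distribution falls below μ - 3σ.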
I give each new player a μ of 1500 and a σ of 350 (hence trueskill = 1500 - 3*350 = 450 for a new battler). Your σ will shrink after each battle because I will be more confident about your skill estimate after each performance (and in fact as long as you don't lose every battle, your rating will probably be higher than the base 450 after just a few battles just because of the σ shrink). Your μ will go up or down after each battle depending on whether you win or lose. The amount of change in your μ and σ after every battle will depend on how big your σ is and also on how your numbers compare to your opponent's μ and σ.
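To show how μ and σ move after a battle, here is a rough sketch of the standard two-player, no-draw TrueSkill update written with only the standard library. It is not the program I actually run. The β value of 175 and the omission of the dynamics factor τ are assumptions made for this example, and rate_1vs1 here is a hand-written illustration, not a library function.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()
BETA = 175.0  # assumed performance spread (hypothetical value for this sketch)

def rate_1vs1(winner, loser, beta=BETA):
    """Two-player, no-draw TrueSkill update.

    Each player is a (mu, sigma) tuple; returns the updated (winner, loser).
    The dynamics factor tau is ignored (assumed to be 0) in this sketch.
    """
    mu_w, sigma_w = winner
    mu_l, sigma_l = loser
    # Total spread of the difference in the two players' performances.
    c = sqrt(2 * beta**2 + sigma_w**2 + sigma_l**2)
    t = (mu_w - mu_l) / c
    # v shifts the means, w shrinks the variances; both depend on how
    # surprising the result was (an upset win gives a larger v).
    v = STD_NORMAL.pdf(t) / STD_NORMAL.cdf(t)
    w = v * (v + t)
    new_winner = (mu_w + sigma_w**2 / c * v,
                  sigma_w * sqrt(1 - sigma_w**2 / c**2 * w))
    new_loser = (mu_l - sigma_l**2 / c * v,
                 sigma_l * sqrt(1 - sigma_l**2 / c**2 * w))
    return new_winner, new_loser

# Two brand-new battlers meet; the first one wins.
a, b = rate_1vs1((1500.0, 350.0), (1500.0, 350.0))
for name, (mu, sigma) in (("winner", a), ("loser", b)):
    print(f"{name}: mu={mu:.1f} sigma={sigma:.1f} trueskill={mu - 3 * sigma:.1f}")
```

Both σ values shrink by the same amount here because both players started with the same uncertainty; a player with a larger σ would see a bigger change in both numbers.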
In the very long run (after hundreds or thousands of battles), the trueskill of someone who has consistently won about half of all their battles should approach 1500 (μ ≈ 1500, σ ≈ 0). If skill across all battlers is Gaussian with the same spread as the starting σ of 350, about 0.1% of all battlers will have a μ greater than 2550 (1500 + 3*350) and about 0.1% will have one below 450 (1500 - 3*350).
A rating taken individually has no meaning; it gains meaning only in comparison to the ratings of other players. The most basic comparison is that if your rating is higher than someone else's, you are more skilled at battling than they are (or at least you were the last time the ratings were computed). Such a comparison is only valid once both players have had more than just a few battles (all new players, starting with a rating of 450, will be ranked lower than most players who have been battling for a while, just because their σ's are so high). For new players, your rank relative to the established players should begin to approach what it should be after just five battles, and after as few as ten battles your rank should be within one or two places of your appropriate rank.
Beyond being able to say that player A is more skilled than player B, I set the system such that a rating difference of exactly 175 means that the higher rated player has an 80% chance to win in a battle. Accordingly, in the chart below I calculated the probability of winning in a battle against someone given how much higher your rating is:
[Chart: rating difference vs. probability of winning a battle]
Note that using this chart is only valid if both players being compared have a low σ (let's say less than 10). After I get some real data I'll state how many battles are needed to get a σ small enough to use this chart.
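As a rough way to estimate win chances for other rating differences, the sketch below assumes a Gaussian model calibrated so that a difference of 175 gives exactly 80%, as stated above. The scale constant and the win_probability function are assumptions made for illustration, so the numbers may not match the original chart exactly, and like the chart they only make sense when both players have a low σ.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

ANCHOR_DIFF = 175.0  # rating difference stated in the post
ANCHOR_PROB = 0.80   # win chance stated in the post for that difference

# Assumed scale: chosen so that win_probability(175) comes out at 80%.
SCALE = ANCHOR_DIFF / STD_NORMAL.inv_cdf(ANCHOR_PROB)

def win_probability(rating_diff):
    """Estimated chance that the higher-rated player wins.

    rating_diff is (your rating - opponent's rating); a negative value
    gives the chance for the lower-rated player.
    """
    return STD_NORMAL.cdf(rating_diff / SCALE)

for diff in (0, 50, 100, 175, 250, 350):
    print(f"{diff:>4}  {win_probability(diff):.0%}")
```

Equal ratings give 50% under this model, and the win chance climbs toward 100% as the difference grows.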
The TrueSkill system was invented by Ralf Herbrich, Tom Minka and Thore Graepel of Microsoft Research. Here's a link to their original paper, although it leaves out a lot of important details if you want to do calculations: http://research.microsoft.com/apps/pubs/default.aspx?id=67956
Here's a friendlier description of the system on the Microsoft Research site: http://research.microsoft.com/en-us/projects/trueskill/details.aspx
Fortunately for us, Jeff Moser was very interested in figuring out the details of TrueSkill that Microsoft Research had not revealed. Here is a link to his blog post about it, which has further links to the more detailed math and to C# code that implements the TrueSkill algorithm: http://www.moserware.com/2010/03/computing-your-skill.html
Heungsub Lee ported the code to Python and I use his program to compute ratings. See here: http://trueskill.org/