 Psypoke Battle Tower Player Ratings - 5 Mar 2014 
Gym Leader

Joined: Sat May 12, 2007 6:28 pm
Posts: 723
Location: Toronto
Introduction

The Psypoke Battle Tower, started in Gen 6, has a system for assigning ratings to battlers in the Psypoke community. Ratings are based only on battles between Psypoke members and measure members' battling skill relative to one another. They can be a fun way to see how we stack up against each other, and they may also be used when setting match-ups in later tournaments or in similar applications.

All new players start with a rating of 450. After that, a rating will generally increase by some amount after every win and decrease after every loss.

The system used to calculate ratings is the TrueSkill player rating system (see next post for details).

What battles affect a player's rating?

When starting a tournament, the host will decide whether it is a "Rating Tournament". If it is, then the results of all battles conducted in the tournament will affect the participants' ratings. The format of the tournament doesn't matter; only the results of individual battles are important. Anybody participating in a Rating Tournament automatically opts into the rating system.

No other battles will affect a player's rating.

When are ratings updated?

Ratings are updated roughly once per week.

The Ratings (updated 5 March 2014)
Code:
Rank    Username          Rating    Number of Battles
1       ChillBill         1339      5
2       MasonTheChef      1197      12
3       Dare234           1188      12
4       Cherrygrove       989       3
5       Kiga              967       7
6       NastyNati311      883       4
7       MSbold            770       1
8       azul              698       5
9       UberPorpoise      663       5
10      twistedturtwig    624       6
11      sumo12345         487       2
12      EvilPenguin       450       0


Current Rating Tournaments

None

Past Rating Tournaments

Pokemon XY Sky Battle Tournament (Dare234)
Chef's XY Doubles Tournament (MasonTheChef)
X/Y Tournament (ChillBill)

_________________
And what it all comes down to
is that everything's gonna be quite alright


Thu Nov 14, 2013 9:55 pm
--- This post is for informational purposes, explaining the method behind calculating the ratings and also how to use the ratings to compare players. Feel free to skip it. ---

The system used to calculate ratings is the TrueSkill player rating system. In TrueSkill, each player has two numbers associated with them: the mean rating, μ (mu), and the rating deviation, σ (sigma). μ is an estimate of your real skill rating and σ is an estimate of the uncertainty in that estimate. Assuming a Gaussian skill distribution, there is a 68.3% chance that your real skill rating is within 1σ of μ, a 95.4% chance that it is within 2σ of μ and a 99.7% chance that it is within 3σ of μ.

The number that I use for the ratings I post, the "trueskill", is μ - 3σ, a conservative rating estimate. It's conservative because there is about a 99.9% chance that your real skill rating is higher than the trueskill rating posted here (and conversely only about a 0.1% chance that your real skill is lower than the rating posted here). Basically, given the entire plausible range of what your real rating could be (as defined by your μ and σ), I assume that your skill is at the very bottom of that range.
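For anyone who wants to check the Gaussian percentages and the μ - 3σ formula themselves, here is a minimal Python sketch. The function names are my own and purely illustrative; the example values (μ = 1500, σ = 350) are the starting values for a new player described in this post.

```python
import math

def conservative_rating(mu, sigma, k=3.0):
    """The posted "trueskill": mu minus k standard deviations (k = 3 here)."""
    return mu - k * sigma

def prob_within(k):
    """Chance that a Gaussian value lies within k sigma of its mean."""
    return math.erf(k / math.sqrt(2.0))

def prob_above_lower_bound(k):
    """Chance that a Gaussian value lies above mu - k*sigma (one-sided)."""
    return 0.5 * (1.0 + math.erf(k / math.sqrt(2.0)))

if __name__ == "__main__":
    print(conservative_rating(1500, 350))  # new player: 1500 - 3*350
    for k in (1, 2, 3):
        print(f"within {k} sigma: {prob_within(k):.1%}")
    print(f"real skill above mu - 3 sigma: {prob_above_lower_bound(3):.2%}")
```

The one-sided figure is the one that matters for the posted rating, since only the chance of the real skill falling below μ - 3σ is of interest.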

I give each new player a μ of 1500 and a σ of 350 (hence trueskill = 1500 - 3*350 = 450 for a new battler). Your σ will shrink after each battle because I will be more confident about your skill estimate after each performance (and in fact as long as you don't lose every battle, your rating will probably be higher than the base 450 after just a few battles just because of the σ shrink). Your μ will go up or down after each battle depending on whether you win or lose. The amount of change in your μ and σ after every battle will depend on how big your σ is and also on how your numbers compare to your opponent's μ and σ.
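To make the update step concrete, here is a rough Python sketch of the standard two-player, no-draw TrueSkill update as described in the write-ups linked at the end of this post. It is not the program actually used to compute the posted ratings. It leaves out the draw margin and the dynamics factor, and the value of `BETA` (the performance variability) is an assumption chosen for illustration, not a setting stated in this post.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Gaussian (mean 0, deviation 1)
BETA = 147.0               # ASSUMED performance variability, for illustration only

def update_1v1(winner, loser, beta=BETA):
    """Return updated (mu, sigma) pairs for the winner and loser of one battle.

    Simplified update: no draws and no dynamics factor.
    """
    mu_w, sigma_w = winner
    mu_l, sigma_l = loser
    # Total uncertainty about the outcome of this particular battle.
    c = math.sqrt(2 * beta ** 2 + sigma_w ** 2 + sigma_l ** 2)
    t = (mu_w - mu_l) / c
    # v grows when the result was a surprise; w controls the sigma shrink.
    v = STD_NORMAL.pdf(t) / STD_NORMAL.cdf(t)
    w = v * (v + t)
    new_winner = (mu_w + sigma_w ** 2 / c * v,
                  sigma_w * math.sqrt(1 - sigma_w ** 2 / c ** 2 * w))
    new_loser = (mu_l - sigma_l ** 2 / c * v,
                 sigma_l * math.sqrt(1 - sigma_l ** 2 / c ** 2 * w))
    return new_winner, new_loser

if __name__ == "__main__":
    # Two brand-new players (mu = 1500, sigma = 350) battle each other.
    won, lost = update_1v1((1500, 350), (1500, 350))
    for label, (mu, sigma) in (("winner", won), ("loser", lost)):
        print(f"{label}: mu={mu:.0f} sigma={sigma:.0f} rating={mu - 3 * sigma:.0f}")
```

Under these assumptions both players' σ shrinks after the battle, the winner's μ rises and the loser's μ falls by the same amount when the two σ values are equal.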

In the very long run (after hundreds or thousands of battles), the trueskill of someone who has consistently won about half of all their battles should approach 1500 (μ ≈ 1500, σ ≈ 0). About 0.1% of all battlers will have a μ greater than 2550 and about 0.1% will have one below 450.

A rating taken individually has no meaning; it gains meaning only in comparison to the ratings of other players. The most basic comparison, of course, is that if your rating is higher than someone else's, you are estimated to be more skilled at battling than they are (or at least you were the last time the ratings were computed). Such a comparison is only valid if both players have had more than just a few battles (all new players, starting with a rating of 450, will be ranked lower than most players who have been battling for a while, just because their σ's are so high). For new players, your rank relative to the established players should begin to approach what it should be after just five battles, and after as few as ten battles your rank will be within one or two places of your appropriate rank.

Beyond being able to say that player A is more skilled than player B, I set the system such that a rating difference of exactly 175 means that the higher rated player has an 80% chance to win in a battle. Accordingly, in the chart below I calculated the probability of winning in a battle against someone given how much higher your rating is:

Code:
Rating difference    Probability of winning a battle

0                    50%
10                   51.92%
20                   53.83%
30                   55.74%
40                   57.63%
50                   59.5%
60                   61.35%
70                   63.18%
80                   64.98%
90                   66.74%
100                  68.47%
125                  72.61%
150                  76.47%
175                  80%
200                  83.19%
250                  88.53%
300                  92.5%
350+                 >95%

Note that using this chart is only valid if both players being compared have a low σ (let's say less than 10). After I get some real data I'll state how many battles are needed to get a σ small enough to use this chart.
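The chart values are consistent with a simple Gaussian model in which the win probability is Φ(difference / s), with the scale s fixed by the one stated anchor point (a difference of 175 gives 80%). The Python sketch below works under that assumption; it is not necessarily the exact formula used to build the chart, and like the chart it ignores σ, so it only applies when both players have a low σ.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Gaussian (mean 0, deviation 1)

# ASSUMED scale, calibrated so that a 175-point difference gives an 80% win chance.
SCALE = 175 / STD_NORMAL.inv_cdf(0.80)

def win_probability(rating_diff):
    """Chance that the higher-rated player wins, given the rating difference."""
    return STD_NORMAL.cdf(rating_diff / SCALE)

if __name__ == "__main__":
    for diff in (0, 50, 100, 175, 250):
        print(f"{diff:>4}  {win_probability(diff):.2%}")
```

Passing a negative difference gives the lower-rated player's chance instead, so win_probability(-175) comes out at about 20%.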

The TrueSkill system was invented by Ralf Herbrich, Tom Minka and Thore Graepel of Microsoft Research. Here's a link to their original paper, although it leaves out a lot of details that you need if you want to do the calculations yourself: http://research.microsoft.com/apps/pubs/default.aspx?id=67956. Here's a friendlier description of the system on the Microsoft Research site: http://research.microsoft.com/en-us/projects/trueskill/details.aspx. Fortunately for us, Jeff Moser was very interested in figuring out the details of TrueSkill that Microsoft Research had not revealed. Here is a link to his blog post about it, which has further links to the more detailed math and to C# code that implements the TrueSkill algorithm: http://www.moserware.com/2010/03/computing-your-skill.html. Heungsub Lee ported the code to Python and I use his program to compute ratings. See here: http://trueskill.org/

Thu Nov 14, 2013 9:56 pm