These visitor ratings are then often used to rank the rated items. And when “rank” comes into play, it gets tricky.
Ranking using Bayesian average.
Hopefully the headline hasn’t turned you away yet – it smells of mathematical hardcore. But fear not, once you know how, implementing a robust rating and ranking system using the approach discussed here is really quite simple, very elegant, and most importantly, it works really well!
A basic example using simple + and - votes
Mostly ratings are implemented using simple + and - system. If you like an item, rate it “plus”. If you don’t like it, give it a “minus”.
The rating of an item would then be: number of positive votes divided by number of total votes. For example, 4 + votes and 1 - vote would correspond to a rating of 0.8, or 80%.
Now if you want to rank the items based on this simple equation, the following happens:
Assume you have on item with a rating of 0.93, based on 100s of votes. Now another new item is rated by a total of 2 visitors (or even just one), and they rate it +. Boom, it goes straight to #1 position in the ranking, as its rating is 100%!
A weighty issue
What we want is this:
If there is only few votes, then these votes should count less than when there are many votes and we can trust that this is how the public feels about it. In other texts this value is also referred to as “certainty” or “believability”.
This means, the more votes an item has, the higher the “weight” of these votes.
Thus, we want to calculate a corrected rating that somehow takes the weight of votes into account:
- The more votes an item has, the closer the corrected rating value would be to the uncorrected rating value.
- The less votes an item has – and this is the main trick here – the closer its rating should be to the average rating value of all items!
There you have it – this is the main algorithm of what we call Bayesian rating, or rather Bayesian ranking as it is really about the relation of the item ratings to each other, based on the number of votes of each item.
Using a magic value
We now need to apply a “magic” value that determines how strong the weighting (or dampening, as some consider it) shall be. In other words, how many votes are required until the uncorrected value approximates the corrected value?
It really depends on how many votes the items get, in average. There is no point requiring 1000 votes for the item to rank 60% when each item only gets a handful of votes in average.
Thus, we could make this “magic” value exactly that, namely the average number of votes for all rated items, and voila, our Bayesian rating system is complete. By making the magic value dynamic, it will auto adapt to your system.
Fine tuning the magic value
You could opt to create an upper limit to your magic value so that your doesn’t come to a grinding halt when there are many votes per item – an ever growing magic value would make it less and less possible to actually influence the rating of a new item because it takes so many votes before you believe the rating of a new item.
The fine tuning will depend on whether your system has a large influx of new items or not. If there are many new items added all the time, this influx will keep the average number of votes per item low. If your system has a fixed number of items, such as “rate your favourite star of The Beatles”, you may not need an upper limit. If you do add the occasional item, then an upper limit makes sense to give new items a chance to rate highly more quickly.
Bayesian rating for everyone
Now, lets summarize this all and provide a working formula for you to use in your code:
Bayesian Rating is using the Bayesian Average. This is a mathematical term that calculates a rating of an item based on the “believability” of the votes. The greater the certainty based on the number of votes, the more the Bayesian rating approximates the plain, non-weighted rating. When there are very few votes, the Bayesian rating of an item will be closer to the average rating of all items.
Use this equation:
Code: Select all
br = ( (avg_num_votes * avg_rating) + (this_num_votes * this_rating) ) / (avg_num_votes + this_num_votes)
avg_num_votes: The average number of votes of all items that have num_votes>0
avg_rating: The average rating of each item (again, of those that have num_votes>0)
this_num_votes: number of votes for this item
this_rating: the rating of this item
Note: avg_num_votes is used as the “magic” weight in this formula. The higher this value, the more votes it takes to influence the bayesian rating value.
How Bayesian Rating is used
We use it to show the highest rated product in order. We wanted to avoid that a new product with 1 vote immediately jumps to first place, as its rating would be 100%. Using Bayesian rating, its starting rating with one positive vote would be a little bit higher than the average rating of all items.
php example code:
Code: Select all
// Bayesian Rating Calc
$theItem = $_GET[’id’];
if($theItem) {
// all items votes and ratings
$result = mysql_query(”SELECT AVG(item),AVG(vote) FROM itemvotes WHERE vote>’0? GROUP BY item”) or die(mysql_error());
$row = mysql_fetch_row($result);
$avg_num_votes = $row[0];
$avg_rating = $row[1];
// this item votes and ratings
$result = mysql_query(”SELECT COUNT(item),AVG(vote) FROM itemvotes WHERE item=’$theItem’ AND vote>’0?”) or die(mysql_error());
$row2 = mysql_fetch_row($result);
$this_num_votes = $row2[0];
$this_rating = $row2[1];
if(!$row OR !$row2)
$br = “_”;
else
$br = number_format( ((($avg_num_votes * $avg_rating) + ($this_num_votes * $this_rating))/($avg_num_votes + $this_num_votes)), 1, ‘.’ );
} // end of if item selected