How to implement an accurate rating system

Saman · Post by **Saman** » Thu Mar 11, 2010 11:03 am

Many web sites allow users to provide feedback on products, services or other users. In addition to verbal reviews, rating facilities are typically present that allow visitors to rate an item from 0 to 5 (often in conjunction with stars), from 0 to 10, or simply by voting + or -, respectively.

These visitor ratings are then often used to rank the rated items. And when “rank” comes into play, it gets tricky.
Ranking using Bayesian average.

Hopefully the headline hasn’t turned you away yet – it smells of mathematical hardcore. But fear not, once you know how, implementing a robust rating and ranking system using the approach discussed here is really quite simple, very elegant, and most importantly, it works really well!

A basic example using simple + and - votes
Mostly ratings are implemented using simple + and - system. If you like an item, rate it “plus”. If you don’t like it, give it a “minus”.

The rating of an item would then be: number of positive votes divided by number of total votes. For example, 4 + votes and 1 - vote would correspond to a rating of 0.8, or 80%.

Now if you want to rank the items based on this simple equation, the following happens:

Assume you have on item with a rating of 0.93, based on 100s of votes. Now another new item is rated by a total of 2 visitors (or even just one), and they rate it +. Boom, it goes straight to #1 position in the ranking, as its rating is 100%!

A weighty issue
What we want is this:
If there is only few votes, then these votes should count less than when there are many votes and we can trust that this is how the public feels about it. In other texts this value is also referred to as “certainty” or “believability”.

This means, the more votes an item has, the higher the “weight” of these votes.

Thus, we want to calculate a corrected rating that somehow takes the weight of votes into account:

The more votes an item has, the closer the corrected rating value would be to the uncorrected rating value.
The less votes an item has – and this is the main trick here – the closer its rating should be to the average rating value of all items!

That way, new votes pull the corrected rating value away from the average rating, towards the uncorrected rating value.

There you have it – this is the main algorithm of what we call Bayesian rating, or rather Bayesian ranking as it is really about the relation of the item ratings to each other, based on the number of votes of each item.

Using a magic value
We now need to apply a “magic” value that determines how strong the weighting (or dampening, as some consider it) shall be. In other words, how many votes are required until the uncorrected value approximates the corrected value?

It really depends on how many votes the items get, in average. There is no point requiring 1000 votes for the item to rank 60% when each item only gets a handful of votes in average.

Thus, we could make this “magic” value exactly that, namely the average number of votes for all rated items, and voila, our Bayesian rating system is complete. By making the magic value dynamic, it will auto adapt to your system.

Fine tuning the magic value
You could opt to create an upper limit to your magic value so that your doesn’t come to a grinding halt when there are many votes per item – an ever growing magic value would make it less and less possible to actually influence the rating of a new item because it takes so many votes before you believe the rating of a new item.

The fine tuning will depend on whether your system has a large influx of new items or not. If there are many new items added all the time, this influx will keep the average number of votes per item low. If your system has a fixed number of items, such as “rate your favourite star of The Beatles”, you may not need an upper limit. If you do add the occasional item, then an upper limit makes sense to give new items a chance to rate highly more quickly.
Bayesian rating for everyone

Now, lets summarize this all and provide a working formula for you to use in your code:

Bayesian Rating is using the Bayesian Average. This is a mathematical term that calculates a rating of an item based on the “believability” of the votes. The greater the certainty based on the number of votes, the more the Bayesian rating approximates the plain, non-weighted rating. When there are very few votes, the Bayesian rating of an item will be closer to the average rating of all items.

Use this equation:

Code: Select all

br = ( (avg_num_votes * avg_rating) + (this_num_votes * this_rating) ) / (avg_num_votes + this_num_votes)

Legend:
avg_num_votes: The average number of votes of all items that have num_votes>0
avg_rating: The average rating of each item (again, of those that have num_votes>0)
this_num_votes: number of votes for this item
this_rating: the rating of this item

Note: avg_num_votes is used as the “magic” weight in this formula. The higher this value, the more votes it takes to influence the bayesian rating value.

How Bayesian Rating is used
We use it to show the highest rated product in order. We wanted to avoid that a new product with 1 vote immediately jumps to first place, as its rating would be 100%. Using Bayesian rating, its starting rating with one positive vote would be a little bit higher than the average rating of all items.

php example code:

Code: Select all

// Bayesian Rating Calc
$theItem = $_GET[’id’];
if($theItem) {
// all items votes and ratings
$result = mysql_query(”SELECT AVG(item),AVG(vote) FROM itemvotes WHERE vote>’0? GROUP BY item”) or die(mysql_error());
$row = mysql_fetch_row($result);
$avg_num_votes = $row[0];
$avg_rating = $row[1];

// this item votes and ratings
$result = mysql_query(”SELECT COUNT(item),AVG(vote) FROM itemvotes WHERE item=’$theItem’ AND vote>’0?”) or die(mysql_error());
$row2 = mysql_fetch_row($result);
$this_num_votes = $row2[0];
$this_rating = $row2[1];

if(!$row OR !$row2)
$br = “_”;
else
$br = number_format( ((($avg_num_votes * $avg_rating) + ($this_num_votes * $this_rating))/($avg_num_votes + $this_num_votes)), 1, ‘.’ );
} // end of if item selected

See How to use CSS to display star ratings

Asaf · Post by **Asaf** » Tue Jun 28, 2011 11:13 pm

Hi can you please show how the voting table looks like?
is it as follows?

item	vote
1	4
1	3
1	4
1	5
2	3
2	3
2	4
3	3
3	4
3	4
4	2

Thanks

Saman · Post by **Saman** » Tue Jun 28, 2011 11:33 pm

123456asaf, Welcome to ROBOT.LK!

You are correct. Table would looks as that.

Why don't you put a nice username?

This looks like a spam bot and I was about to miss this. You can change it by going to your User Control Panel (Top left link).

mayilgilli · Post by **mayilgilli** » Wed Jun 29, 2011 5:45 pm

Hi,
I have gone through your formula. I have one question if add one more item as 5 and then the rating is 5 based on your formula Item5 will get the rating as 3.92. then now I have calculated the item no 1 then this will get 3.81 still item number 5 is top when i was sorting based on avg ratting. my question is, is this correct logic. I am using your voting table my calculations are bellow please verify if i made any mistake.

For Item no5:
( (2.75 * 3.54) + (1*5) / (2.75+1)) = 3.92
For Item no1:
((2.75 * 3.54) + (4*4) / (2.75 +4)) = 3.81

let me know If i have any mistake. if there is no mistake let me know is this the logic is fine? how?

Asaf · Post by **Asaf** » Wed Jun 29, 2011 6:59 pm

Hello Saman.
Thanks for your reply.
my problem is this, if I have, let’s say 1000 items, and each got 100 votes, I end up with a table of a million rows!!
every time I would like to display the rating of a specific item, I would need to access this table with a million rows and go over all these million rows.

let’s say I decide that 10 vote are enough for me to determine that this is a “believable” rating, how would the code look like?

p.s.
I will update my pic later

Thanks

Saman · Post by **Saman** » Thu Jun 30, 2011 5:52 am

First I guess both mayilgilli and Asaf are same right?

my question is, is this correct logic. I am using your voting table my calculations are bellow please verify if i made any mistake.

For Item no5:
( (2.75 * 3.54) + (1*5) / (2.75+1)) = 3.92
For Item no1:
((2.75 * 3.54) + (4*4) / (2.75 +4)) = 3.81

let me know If i have any mistake. if there is no mistake let me know is this the logic is fine? how?

Yes, it is correct. Basically this algorithm is based on a technique called Bayesian average. On the wiki page it says,

For example, in a calculation of an average review score of a book where only two reviews are available, both giving scores of 10, a normal average score would be 10. However, as only two reviews are available, 10 may not represent the true average had more reviews been available. The review site may instead calculate a Bayesian average of this score by adding the average review score of all books in the store to the calculation. For example, by adding five scores of 7 each, the Bayesian average becomes 7.86 instead of 10, which the review site would hope that it will better represent the quality of the book.

This exactly address your confusion.

The other question...

my problem is this, if I have, let’s say 1000 items, and each got 100 votes, I end up with a table of a million rows!!
every time I would like to display the rating of a specific item, I would need to access this table with a million rows and go over all these million rows.

let’s say I decide that 10 vote are enough for me to determine that this is a “believable” rating, how would the code look like?

1000x100 = 100,000 = 1/10 Million. Though this is simply nothing to a database like mySQL, I understand your point. The logic needs following values.

avg_num_votes - The average number of votes of all items that have num_votes>0
avg_rating - The average rating of each item (again, of those that have num_votes>0)
this_num_votes - number of votes for this item
this_rating - the rating of this item

I guess you can keep some information item wise and some system wide.

To keep this_num_votes and this_rating you could use a table as follows.
ITEM | TOTAL_NUMBER_OF_VOTES | THIS_RATING (average votes for this item)

When someone is adding a vote, you need to perform following operations to the database.
1. Update THIS_RATING field as below.
THIS_RATING = (TOTAL_NUMBER_OF_VOTES x THIS_RATING + user_input) / (TOTAL_NUMBER_OF_VOTES + 1)

2. Update TOTAL_NUMBER_OF_VOTES field as below.
TOTAL_NUMBER_OF_VOTES = TOTAL_NUMBER_OF_VOTES + 1

For 4 items, you will have only 4 records. For 1000 items, you will have 1000 records. You can add these fields to your item table if you like.

Now, you need to use another table (system wide) to keep other two information (avg_num_votes, avg_rating).
NUM_OF_ITEMS | TOTAL_NUM_VOTES | AVG_NUM_VOTES | AVG_RATING

When a user enters a vote, you will have to perform following operations to the database.
1. Update AVG_NUM_VOTES field as below.
AVG_NUM_VOTES = ((AVG_NUM_VOTES x NUM_OF_ITEMS) + 1) / NUM_OF_ITEMS

2. Update AVG_RATING field as below.
AVG_RATING = ((AVG_RATING x NUM_OF_ITEMS) + user_input) / NUM_OF_ITEMS

This will always be a single record.

I think I have addressed your issue.

Asaf · Post by **Asaf** » Thu Jun 30, 2011 4:56 pm

Hi Saman,
first, you guessed wrong, I am not mayilgilli, although it does look like a strange coincidence

.
As to your solution, I will examine it later.

I asked a different thing, I want to make things much simpler, I consider 10 vote as a reliable rating, it doesn't matter to me if I have 1000 votes or 10 (per item), is there some kind of formula like: X * thisItemsRating?

Thanks x 1000.

mayilgilli · Post by **mayilgilli** » Thu Jun 30, 2011 6:07 pm

Hi,
Your logic so for so good. I am satisfy with this . if I have anything I will get back to you

Saman · Post by **Saman** » Thu Jun 30, 2011 6:54 pm

first, you guessed wrong, I am not mayilgilli

Opps...sorry

I asked a different thing, I want to make things much simpler, I consider 10 vote as a reliable rating, it doesn't matter to me if I have 1000 votes or 10 (per item), is there some kind of formula like: X * thisItemsRating?

In that case, you only need a simple rating system. But there is a difference between 2 users who are using 1 product and 100 users using another.

Table you need is as follows
ITEM | TOTAL_NUMBER_OF_VOTES | TOTAL_RATING

The operations you need to perform when user is voting for a product is given below.
1. TOTAL_NUMBER_OF_VOTES field
TOTAL_NUMBER_OF_VOTES = TOTAL_NUMBER_OF_VOTES + 1

2. TOTAL_RATING field
TOTAL_RATING = TOTAL_RATING + user_input

Here is how you need to get the reading for an item.

Code: Select all

br = TOTAL_RATING / TOTAL_NUMBER_OF_VOTES

That's the average vote for given item.

Your logic so for so good. I am satisfy with this . if I have anything I will get back to you

Great. When you say you are happy, that's the only gain for me. Live with ROBOT.LK.

Asaf · Post by **Asaf** » Thu Jun 30, 2011 11:06 pm

Hi Saman,
The "single record" solution you mentioned sounds good.

Thank you very much

How to implement an accurate rating system

How to implement an accurate rating system

Re: How to implement an accurate rating system

Re: How to implement an accurate rating system

Re: How to implement an accurate rating system

Re: How to implement an accurate rating system

Re: How to implement an accurate rating system

Re: How to implement an accurate rating system

Re: How to implement an accurate rating system

Re: How to implement an accurate rating system

Re: How to implement an accurate rating system