Product ratings can be incredibly helpful to users. During our research studies we’ve observed how the test subjects rely on ratings to gauge a product’s quality or value – especially in verticals where they lack domain knowledge or have little prior product experience.
However, for users to be able to use product ratings this way, two key pieces of information must be present: the average rating score (obviously) and the number of ratings that average is based on. Unfortunately, some sites leave out the latter, much to the detriment of their users.
In this article we’ll present both our qualitative and quantitative findings on users’ perception of product ratings. In particular, we’ll investigate how and why most users will show a bias towards slightly poorer ratings if they are based on a higher number of reviews.
During our usability studies on category pages (1), e-commerce search (2), mobile e-commerce sites (3), and most recently product lists in general (4), we have time and again observed test subjects rely heavily on the number of user reviews when evaluating product ratings.
The reason is simple: when users don’t know how many ratings an average is based on, they can’t tell if a perfectly rated product simply has a single 5-star rating, or if its rating average is actually based on hundreds of reviews.
The flipside of this is important to be mindful of: users won’t necessarily consider the product with the highest rating average the best-rated one. Indeed, during our 1:1 usability tests, the subjects often show greater disposition towards some products with 4.5-star averages than some with perfect 5-star ratings due to the number of votes these averages are based on.
For instance, most subjects would pick a sleeping bag with a 4.5-star rating average based on 50 reviews over other sleeping bags with perfect 5-star ratings that were only based on a few reviews – they simply didn’t find the latter to be trustworthy.
So when did the subjects begin finding rating averages trustworthy? During our 1:1 “think aloud” usability tests the number seemed to be around 5 reviews. However, we wanted a better idea of whether this behavior was representative of the average e-commerce customer and whether there indeed is a general tipping point for the typical user’s perception of product ratings.
Obviously 1:1 usability studies aren’t good at verifying this sort of thing because the dataset is much too small – what you want to do is take these types of qualitative findings and further test or verify them using quantitative methods. Which is exactly what we did.
We tested three different rating averages with 2,488 people to get a better idea of where the scales begin to tip with regard to the number of reviews a rating average should be based on for the typical user to find it trustworthy.
The quantitative survey results confirm the qualitative findings. For two otherwise identical products, where one product has a 5-star average based on 2 ratings, and the other has a 4.5-star average based on 12 ratings, 70% would pick the one with the higher number of ratings despite its lower average. This confirms the test observations that when a perfect average was based on only a few ratings the subjects would often pick other options with a slightly lower average but a higher number of ratings.
The survey also found this to hold just as true with higher rating counts. In the second survey, where users were asked to choose between a 5-star average based on 4 ratings and a 4.5-star average based on 57 ratings, almost the same percentage (74%) would pick the option with the higher number of ratings.
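One way to make this intuition concrete is a "shrunk" (Bayesian-style) average, which pulls an average toward a neutral prior until enough ratings have accumulated. The sketch below is purely illustrative: it is not a method used in the study, nor a claim about how respondents actually reasoned, and the function name `shrunk_average` and the values of `prior_mean` and `prior_weight` are arbitrary assumptions chosen for the example.

```python
def shrunk_average(average: float, count: int,
                   prior_mean: float = 3.5, prior_weight: float = 5.0) -> float:
    """Pull a rating average toward prior_mean; fewer ratings means a stronger pull.

    prior_mean and prior_weight are illustrative assumptions, not survey findings.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    return (prior_weight * prior_mean + count * average) / (prior_weight + count)


# The two pairs of options from the surveys: (average, number of ratings).
survey_pairs = [
    ((5.0, 2), (4.5, 12)),
    ((5.0, 4), (4.5, 57)),
]

for perfect, well_reviewed in survey_pairs:
    print(
        perfect, round(shrunk_average(*perfect), 2),
        "vs",
        well_reviewed, round(shrunk_average(*well_reviewed), 2),
    )
```

Under these assumed prior values, the 4.5-star option scores higher than the 5-star option in both pairs, which matches the direction of the majority preference in the surveys. Different prior values would shift the tipping point.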
Interestingly, there are significant differences in the bias across different demographics – more specifically age. Younger people (18 - 44) tend to place more faith in averages based on more ratings while older (45+) people show less inclination towards this bias.
In essence, depending on the typical age of a site’s audience, user perceptions of what constitutes a “highly rated product” will differ. Notably, young audiences will show a strong bias towards good-but-not-perfect product ratings that are based on numerous reviews.
Product ratings essentially function as a type of social proof for users, letting them tap into the “wisdom of the crowd”, using good ratings as a proxy for “high quality” or “value for money.” The thinking goes that if a lot of other users are happy with a product it means that it must be a bargain or of high quality – or both. (This is also why users lacking domain knowledge or experience with the product find product ratings particularly useful: the ratings allow them to rely on the domain knowledge and product experience of other customers.)
Displaying the number of ratings an average is based on also seems to be close to a “best practice” among e-commerce sites, with 68% of the 50 top grossing US e-commerce sites getting this right in their product list design. Meanwhile 14% of sites neglect to display the number of reviews next to their rating averages, and 10% don’t show ratings in their product list at all despite collecting them. (The last 8% don’t allow / collect user ratings in the first place.)
It is therefore strongly recommended to include this extra piece of information in the product list – especially considering the negligible amount of space it takes up. Without the number of ratings, users – especially young ones – lack essential information about the rating average. This leaves them unable to determine whether they find the rating trustworthy, impeding their ability to gauge product quality and value in verticals where they have little knowledge or experience.
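As a rough sketch of what this could look like in a product list template, the helper below builds a short text label that always pairs the average with the number of ratings behind it. The function name `rating_label`, the exact wording of the label, and the "No ratings yet" fallback are assumptions made for illustration, not a format tested in the study.

```python
def rating_label(average: float, count: int, scale: int = 5) -> str:
    """Return a compact label that shows the average together with its rating count."""
    if count < 0:
        raise ValueError("count must be non-negative")
    if count == 0:
        # With no ratings there is no meaningful average to display.
        return "No ratings yet"
    noun = "rating" if count == 1 else "ratings"
    return f"{average:.1f} out of {scale} ({count:,} {noun})"


print(rating_label(4.5, 57))
print(rating_label(5.0, 1))
print(rating_label(0.0, 0))
```

Keeping the count inside the same label as the average makes it harder for a template change to drop one while keeping the other.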
Great research, great article – thanks!
To add fuel to your “strongly recommended” practice of providing the number of ratings upon which the average is based, I’d point out that this isn’t just good for usability, but for search visibility.
That is, Google, at least, requires this number (in structured data parlance, a value for the “ratingCount” property) in order to show an average star rating (“aggregate ratings”) directly in the search results.
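For illustration, a minimal sketch of such structured data is shown below, generated as JSON-LD using the schema.org `Product` and `AggregateRating` types with their `ratingValue` and `ratingCount` properties. The helper name `aggregate_rating_jsonld` and the sample product are hypothetical, and search engines' exact requirements may differ or change, so their current documentation should be checked before relying on this.

```python
import json


def aggregate_rating_jsonld(product_name: str, rating_value: float, rating_count: int) -> str:
    """Serialize a product's average rating and rating count as schema.org JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": product_name,
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": rating_value,
            "ratingCount": rating_count,  # the number of ratings behind the average
        },
    }
    return json.dumps(data, indent=2)


# Hypothetical product, used only as sample input.
print(aggregate_rating_jsonld("Example sleeping bag", 4.5, 57))
```

The resulting JSON would typically be embedded in the page inside a `<script type="application/ld+json">` element.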
Interesting, I did not know this, Aaron – thanks for your addition :)
Great article as usual!
I’m curious as to whether there was another influencing factor at play here as well. A larger number of reviews certainly does imply that the average rating is more trustworthy, but it may also give another social proof signal of “popularity”.
An interesting test of this may be to compare choices made between two similarly rated products; one with a decent number of reviews (e.g. 30) and another with many more than that (e.g. 60).
In this case I would not be surprised if, rather than “trustworthiness” being a factor, “popularity” may play a more significant role in the decision.
That definitely sounds plausible, Mikel; I’m sure this also comes into play. And again, it requires the number of reviews to be visible in the list item design so the user has sufficient information to make such an evaluation.
“However, we wanted a better idea of whether this behavior was representative of the average e-commerce customer ….
Obviously 1:1 usability studies aren’t good at verifying this sort of thing because the dataset is much too small”
EXACTLY! This just proves you don’t have to know the word “dataset” to understand the principle.
There is also the possibility that a visitor who sees a large number of reviews will, in some circumstances, believe that the product has been on the market much longer and may be outdated or obsolete.
It’s very great research. I like the way you have explained it.
Wow, amazing. Thank you for sharing; these tips are really useful for everyone…
What if your shop doesn’t have many reviews on all products? Does it still make sense to add the number next to it?
Each product typically has 1-2 reviews, with one product having over 23.
My fear is that people will then think all the products with 1-2 reviews are not good products, but they are. They all have the same durability, it’s just they have different designs.
© 2021 Baymard Institute US: +1 (415) 315-9567 EU: +45 3696 9567 firstname.lastname@example.org