Normalizing Ratings

Original link: http://hopefullyintersting.blogspot.com/2025/05/normalizing-ratings.html

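The rating systems used by Uber and Lyft suffer from cultural bias and a lack of standardization. In the United States, a "good" ride usually gets 5 stars so as not to penalize the driver, even when the service was not especially outstanding. This contrasts with cultures such as Japan's, where 3 stars is considered average. The difference affects drivers' ratings and can harm their livelihoods. A GPT model even stopped using Croatian because of persistently low ratings. The core problem is that there is no standardized way to interpret a rating. Companies should normalize ratings according to each user's individual rating habits. For example, a user who always gives 5 stars should not disproportionately inflate a driver's score, and a user who always gives 1 star should not unfairly penalize drivers. Such normalization would reflect service quality more accurately and would encourage users to use the full rating range, producing more meaningful feedback.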

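The Hacker News discussion centers on the flaws of current rating systems. Users complain about "lazy" averaging, as on Yelp, where a single negative experience can skew the overall rating and overshadow the positives. Some point out that ratings are often used to complain about logistics rather than the product itself. Several alternatives were proposed, such as FourSquare's weighted ratings based on user behavior, or expectation-maximization algorithms. Others suggest simplifying the scale (for example, to three stars) or using letter grades. A key point is the need for normalization that accounts for individual users' biases. Some suggest that normalizing by region, by absolute price, or by per-user z-score could improve matters. There is concern that companies fear being accused of manipulating ratings if they implement complex algorithms. One user admits to defaulting to five stars to avoid being seen as negative. Some find that reading the specific reviews is more helpful than relying on the numeric rating.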


Original text

What rating should you give the driver who just dropped you at your door via Uber or Lyft? Well, unless they messed up in some noticeable way, you give them a 5 out of 5. That means you give the same rating to a driver who was just passable as to a driver who was outstanding, but you don't want to unfairly sink the career of the passable driver when everybody knows a 4 star rating is terrible.

In Japan things are different. 3 stars is considered normal so you can give that for a normal ride, and then you can give 4 or 5 stars for an exceptionally good driver. And then you have Eastern Europeans who are reputed to tend to give low ratings except when service is unusually good.

Normally this doesn't matter too much unless you're an American Uber driver who happens to get a Japanese rider who sinks your rating with a 3, but it can have worse consequences. In one case, a GPT large language model stopped speaking Croatian because it always got a low rating in that language.

But my question is, why don't all these companies normalize the ratings users give? If one person gives nothing but 5 star ratings, don't just assume that drivers lucky enough to get that person are better than average. If someone else gives nothing but 1 star ratings, don't punish drivers who get them. Let people give whatever rating they want, but treat the normal rating from both these people as about a 3 and only treat a rating as unusual when it is unusual for that person.
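
To make the idea concrete, here is a minimal sketch of one way it could work, not a description of what any of these companies actually do. It assumes each rating is a (user, driver, stars) tuple, re-centers every user's ratings on that user's own average, and scales by that user's own spread (a per-user z-score). The function names, the target center of 3, and the clamping to the 1 to 5 range are all my own assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_ratings(ratings, center=3.0, spread=1.0, lo=1.0, hi=5.0):
    """Re-express each rating relative to the habits of the user who gave it.

    ratings: list of (user, driver, stars) tuples (hypothetical data shape).
    Returns a list of (user, driver, adjusted_stars) tuples.
    """
    # Collect every rating each user has ever given.
    by_user = defaultdict(list)
    for user, _driver, stars in ratings:
        by_user[user].append(stars)

    # Per-user mean and population standard deviation.
    stats = {user: (mean(v), pstdev(v)) for user, v in by_user.items()}

    adjusted = []
    for user, driver, stars in ratings:
        mu, sigma = stats[user]
        # A user who always gives the same rating carries no signal,
        # so every rating from them maps to the center (about a 3).
        z = (stars - mu) / sigma if sigma > 0 else 0.0
        score = min(hi, max(lo, center + spread * z))
        adjusted.append((user, driver, score))
    return adjusted


def driver_scores(adjusted):
    """Average the adjusted ratings each driver received."""
    by_driver = defaultdict(list)
    for _user, driver, score in adjusted:
        by_driver[driver].append(score)
    return {driver: mean(v) for driver, v in by_driver.items()}


if __name__ == "__main__":
    sample = [
        ("always5", "A", 5), ("always5", "B", 5),
        ("always1", "A", 1), ("always1", "B", 1),
        ("picky", "A", 3), ("picky", "B", 5), ("picky", "C", 3),
    ]
    for driver, score in sorted(driver_scores(normalize_ratings(sample)).items()):
        print(driver, round(score, 2))
```

In the sample data, the rider who always gives 5 and the rider who always gives 1 both end up contributing a 3 to every driver, so the only thing that separates drivers A and B is the one rider whose ratings actually vary. A real system would also have to decide what to do with riders who have only given a handful of ratings, which this sketch ignores.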

I'd like to be able to give a 5 for exceptional service on Uber, but when a 4 is a punishment I can't do that in good faith. I'd also like to see the full range of responses in things like book ratings on Goodreads. This seems like an easy change, and I'm genuinely mystified why it's not applied anywhere I can see.
