Author Archive

Netflix Prize Contestants Stuck at 7.8%

August 6, 2007

The Netflix Prize turned 300 days old last week, and progress seems
to be slowing to a crawl. To win the prize, participants have to beat
the predictive accuracy of Netflix’s own Cinematch algorithm by 10%.
Tom Slee reports:

The Cinematch score was matched within a week. Within a month the leaders were half way to the winning prize with a 5% improvement. But getting further improvement progress has proved more and more difficult. It took another month to get to a 6% improvement, about 5 more months to get to 7%, and the current (July 29 2007) leader is at 7.8% improvement and has been unchanged for a month

Tom goes on to expose some curious outliers in the data and expresses
skepticism that recommendation systems can unmask the wisdom of
crowds.
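
For anyone who wants the arithmetic behind those percentages: the contest scores entries by root-mean-square error (RMSE) on held-out ratings, so an "improvement" is the relative reduction in RMSE against Cinematch. Here is a minimal sketch of that calculation. The ratings and the function names are made up for illustration; none of this is Netflix's code or data.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def improvement_pct(baseline_rmse, candidate_rmse):
    """Percent reduction in RMSE relative to the baseline."""
    return 100.0 * (baseline_rmse - candidate_rmse) / baseline_rmse


# Illustrative numbers only (hypothetical ratings and predictions).
actual = [4, 3, 5, 2, 4]
baseline = [3.5, 3.5, 4.0, 3.0, 3.5]   # stand-in for the reference system
candidate = [3.8, 3.2, 4.4, 2.6, 3.7]  # stand-in for a contestant's entry

gain = improvement_pct(rmse(baseline, actual), rmse(candidate, actual))
print(f"improvement over baseline: {gain:.1f}%")
```

Because the measure is relative, each extra point of improvement means shaving the same absolute slice off an error that is already smaller, which fits the slowdown Tom describes.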

Udi: “Implicit Kicks Explicit’s Ass”

August 1, 2007

There’s a great article over at Udi’s Spot on the superiority of collecting metadata implicitly (i.e. through natural user actions):

The massively important, and often overlooked, thing about implicit metadata is that it’s generally trustworthy. It’s like the results of a double-blind scientific study. Explicit metadata on the other hand, while often useful, is always in doubt. It’s like the results of an exit poll during an election. People lie. People are stupid. People are remarkably un-self-aware. Going the explicit route exposes you to all of these problems.

This has huge implications for how the metadata that feeds recommendation engines should be collected.

Some personal news sites seem to understand this (e.g. Google, Antlook, Findory). Others, like Reddit, Netflix, and Digg, don’t get it. These sites require that users rate content up or down (or, in the case of Netflix, on a scale of one to five stars).

Explicit metadata is not only unreliable (as Udi points out), it’s also sparse. The click tax is very high, and many users will simply not rate at all. Sites that are designed this way are throwing out a tremendous amount of implicit information about what users did (and did not do) on their site.
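
To make the idea concrete, here is a rough sketch of how a site might fold ordinary user actions into per-user interest scores without asking for a single rating. The event names, the weights, and the `implicit_scores` helper are all hypothetical choices for illustration, not how any of the sites above actually work.

```python
from collections import defaultdict

# Hypothetical weights for implicit events; the values are illustrative guesses.
EVENT_WEIGHTS = {"click": 1.0, "read_to_end": 2.0, "skip": -0.5}


def implicit_scores(events):
    """Aggregate (user, item, event) tuples into per-user item scores."""
    scores = defaultdict(lambda: defaultdict(float))
    for user, item, event in events:
        # Unknown event types contribute nothing rather than raising.
        scores[user][item] += EVENT_WEIGHTS.get(event, 0.0)
    return {user: dict(items) for user, items in scores.items()}


# A tiny made-up activity log: what users did, and what they passed over.
log = [
    ("ann", "story-1", "click"),
    ("ann", "story-1", "read_to_end"),
    ("ann", "story-2", "skip"),
    ("bob", "story-2", "click"),
]
print(implicit_scores(log))
```

Every page view adds a little signal here, including the stories a user passed over, whereas a ratings table only fills in when someone bothers to click a star.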

And We’re Off…

July 31, 2007

What I hope to accomplish in this blog is to cover the emerging space of personalized web services. No, I’m not talking about the lame portals of yesteryear that allowed you to build your very own portal page with your very own selection of news, sports, finance, and weather feeds! I’m talking about services that automatically build a profile of each user’s tastes and interests and use it to create personalized content. What began as research has been widely deployed by ecommerce sites (e.g. Amazon, Netflix) to drive product recommendations, and is increasingly being used to create personal news services (e.g. Google Personal News, Findory, Antlook) and to influence search results.