Archive for the ‘Personalization’ Category

CACM Cover Story: Privacy Enhanced Personalization

August 15, 2007

The cover story of this month’s CACM is on privacy-enhanced personalization. Alfred Kobsa (UC Irvine professor and father of the field of privacy-enhanced personalization) devotes most of the piece to surveying privacy-related usability research. Most of the findings seem like common sense:

  • Users tend to overvalue small, immediate benefits to disclosing personal information and undervalue potential future negative consequences.
  • Users fall into one of three camps: privacy fundamentalists (disclose nothing), the unconcerned (disclose everything), and privacy pragmatists (make sensible trade-offs), with the first two camps on the decline and pragmatists on the rise.
  • Users value transparency (knowing how personal data will be used) and control.
  • Users will disclose more to established web sites and sites that have a professional appearance and a privacy policy.

No surprises there.

The tail end of the piece, where Kobsa surveys privacy-enhancing technology, has more substance. The notable points there:

  • Client-side personalization is very limiting. Duh.
  • Allow pseudonymous access if you can. OK.
  • In collaborative filtering systems, perturb input data to hide users’ true values or deliberately introduce noise in the data, so that users can plausibly deny responsibility for any potentially embarrassing data. Interesting ideas.
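
To make the perturbation idea concrete, here is a minimal sketch of one way it could work. It assumes each user adds zero-mean uniform noise to their ratings before submitting them. The function names and the `noise_scale` parameter are my own illustration, not something taken from Kobsa's article.

```python
import random


def perturb_ratings(ratings, noise_scale=1.0, rng=None):
    """Return a copy of `ratings` with zero-mean uniform noise added to each value.

    `noise_scale` is a hypothetical knob: larger values hide more, but the
    aggregate needs more users before it becomes accurate again.
    """
    rng = rng or random.Random()
    return {
        item: value + rng.uniform(-noise_scale, noise_scale)
        for item, value in ratings.items()
    }


def item_mean(all_ratings, item):
    """Average the (possibly perturbed) ratings that users submitted for `item`."""
    values = [r[item] for r in all_ratings if item in r]
    return sum(values) / len(values)


if __name__ == "__main__":
    rng = random.Random(42)
    # 2000 simulated users who all truly rate the same film 4.0.
    submitted = [perturb_ratings({"film": 4.0}, 1.0, rng) for _ in range(2000)]
    print("one user's submitted value:", submitted[0]["film"])
    print("mean over all users:", item_mean(submitted, "film"))
```

Because the noise averages out, the server can still estimate the per-item mean, while any single submitted value says little about what that user really thinks.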

Intriguingly, he mentions a peer-to-peer approach to collaborative filtering that “allows users to privately maintain their own individual ratings, and a community of users to compute an aggregate of their private data…using homomorphic encryption…[and then for] personalized recommendations to be generated at the client side”. However, no citation for this work is provided.
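
Since the article gives no citation, I can't say which scheme Kobsa has in mind. Purely as an illustration of the "aggregate of private data" idea, here is a toy version of the Paillier cryptosystem, which is additively homomorphic: multiplying ciphertexts adds the underlying plaintexts. The tiny primes, the single key holder, and all function names are assumptions for this sketch. A real system would need large keys and some form of shared or threshold decryption.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Toy Paillier keys from two small primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; needs Python 3.8+
    return n, (lam, mu, n)


def encrypt(n, m, rng):
    """Encrypt integer m (0 <= m < n) with generator g = n + 1."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def add_encrypted(n, c1, c2):
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    return (c1 * c2) % (n * n)


def decrypt(private_key, c):
    lam, mu, n = private_key
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


if __name__ == "__main__":
    rng = random.Random(7)
    public_n, private_key = keygen()
    ratings = [4, 5, 3, 1, 5]  # each value stays on its owner's machine
    ciphertexts = [encrypt(public_n, r, rng) for r in ratings]
    total = ciphertexts[0]
    for c in ciphertexts[1:]:
        total = add_encrypted(public_n, total, c)
    print("decrypted sum of ratings:", decrypt(private_key, total))
```

Whoever combines the ciphertexts never sees an individual rating. Only the decrypted aggregate is revealed, and clients could then use it to compute recommendations locally.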

Udi: “Implicit Kicks Explicit’s Ass”

August 1, 2007

There’s a great article over at Udi’s Spot on the superiority of collecting metadata implicitly (i.e. through natural user actions):

The massively important, and often overlooked, thing about implicit metadata is that it’s generally trustworthy. It’s like the results of a double-blind scientific study. Explicit metadata on the other hand, while often useful, is always in doubt. It’s like the results of an exit poll during an election. People lie. People are stupid. People are remarkably un-self-aware. Going the explicit route exposes you to all of these problems.

This has huge implications for how the metadata that feeds recommendation engines should be collected.

Some personal news sites seem to understand this (e.g. Google, Antlook, Findory). Others, like Reddit, Netflix, and Digg, don’t get it. These sites require that users rate content up or down (or, in the case of Netflix, on a scale of one to five stars).

Explicit metadata is not only unreliable (as Udi points out), it’s also sparse. The click tax is very high and many users will simply not rate at all. Sites that are designed this way are throwing out a tremendous amount of implicit information about what users did (and did not do) on their site.
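
As a rough sketch of what collecting that implicit information might look like, the snippet below turns a raw event log into per-user, per-item preference scores. The action names and weights are hypothetical values I made up for illustration. None of the sites mentioned above are known to use this scheme.

```python
from collections import defaultdict

# Hypothetical weights: being shown an item and ignoring it counts slightly
# against it, clicking counts for it, sharing counts strongly for it.
ACTION_WEIGHTS = {"impression": -0.25, "click": 1.0, "share": 3.0}


def implicit_scores(events):
    """Sum action weights per (user, item) pair; unknown actions score zero."""
    scores = defaultdict(float)
    for user, item, action in events:
        scores[(user, item)] += ACTION_WEIGHTS.get(action, 0.0)
    return dict(scores)


if __name__ == "__main__":
    log = [
        ("u1", "story-a", "impression"),
        ("u1", "story-a", "click"),
        ("u1", "story-b", "impression"),  # shown but never clicked
        ("u2", "story-a", "share"),
    ]
    for pair, score in sorted(implicit_scores(log).items()):
        print(pair, score)
```

Every page view produces a signal this way, including the items a user saw and skipped, so the resulting matrix is far denser than one built only from voluntary ratings.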