Tuesday, April 01, 2008

Popularity vs. Preference

One of the things that dawns on you after a while working with music recommendations is that popularity and individual preference are two ends of a spectrum. Say for example that you have N votes to allocate among the N musical artists in the known universe. How you allocate those votes is one expression of your personal preference. If you take everybody's votes and add them all up, you get popularity, which is an expression of the preference of the population.
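
To make the voting model concrete, here is a minimal sketch in Python. The listener names, the vote counts, and the `popularity` function are all made up for illustration; the only idea taken from the paragraph above is that popularity is the sum of everybody's individual allocations.

```python
from collections import Counter

def popularity(ballots):
    """Add up every listener's vote allocation into one population-wide tally."""
    total = Counter()
    for votes in ballots.values():
        total.update(votes)  # Counter.update adds counts rather than replacing them
    return total

# Hypothetical ballots: each listener spreads 4 votes over the artists they like.
ballots = {
    "alice": {"R.E.M.": 3, "Wire": 1},
    "bob": {"R.E.M.": 1, "Pixies": 3},
    "carol": {"Pixies": 2, "Wire": 2},
}

totals = popularity(ballots)
print(totals.most_common())
```

Each inner dictionary is one person's preference; `totals` is the population's. Nothing about the tally is personalized, which is the point.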

This is essentially the model of sites like Digg and Reddit too, which percolate submissions to the top of a list based on the number of people who vote for them in a certain span of time. The model works pretty well when precautions are taken to prevent people from gaming it. On the other hand, it's not really recommending things in a personalized way. It only seems that way sometimes, because the user base of such sites is overwhelmingly biased toward libertarian uber-geeks.

In a sense, music recommendation systems work somewhere between the two ends of the spectrum, trying to find a subset of listeners similar enough that it's safe to assume they have common tastes, but not so similar that there's nothing left outside the intersection of their preferences. In practice this is sometimes done by grouping users or items or both into related sets (I like A, A is related to B, so recommend B; or I like A, she likes A and B, so recommend B).
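
The two rules in the parentheses can be sketched in a few lines of Python. This is only an illustration under simple assumptions: `related` and `likes` are hypothetical lookup tables, and a real system would weight and rank candidates rather than return a flat set.

```python
def recommend_by_item(liked, related):
    """I like A, A is related to B, so recommend B."""
    recs = set()
    for artist in liked:
        recs.update(related.get(artist, ()))
    return recs - set(liked)  # don't recommend what I already like

def recommend_by_user(me, likes):
    """I like A, she likes A and B, so recommend B."""
    mine = likes[me]
    recs = set()
    for other, theirs in likes.items():
        if other != me and mine & theirs:  # we share at least one artist
            recs |= theirs - mine
    return recs

# Hypothetical data, labeled with the letters used in the text.
related = {"A": ["B"]}
likes = {"me": {"A"}, "she": {"A", "B"}, "he": {"C"}}

print(recommend_by_item(likes["me"], related))
print(recommend_by_user("me", likes))
```

Both rules land on B here. Note that "he" contributes nothing because his tastes don't overlap with mine at all, and a listener whose tastes matched mine exactly would contribute nothing either.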

I'm not a big fan of this approach in music recommendation because it seems both self-limiting and subject to feedback loops. But it's kind of interesting in a personalized radio context. For example, if you're building a station around a particular artist you'll present tracks by that artist as well as tracks by related artists. You'd expect that any listener who has requested the initial artist would like the presented tracks about as well as any other listener. But the variation among listeners even in this narrow context is fairly drastic, especially with bands that have a large catalog. One notable example is R.E.M., a band that clearly has two sets of fans: pre-Out of Time and post-Out of Time. In the case of radio, where you're presenting a set of tracks that is supposed to be personalized to a particular type of listener, the variation of track popularity among users is rapid feedback on the quality of your recommendations.
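
One rough way to look at that variation, sketched in Python: score each track per listener and compare the average with the spread. Everything here is an assumption for illustration, including the +1/-1 scoring (say, liked versus skipped), the track labels, and the numbers themselves.

```python
from statistics import mean, pstdev

def track_spread(feedback):
    """Map each track to (mean score, standard deviation across listeners)."""
    result = {}
    for track, scores in feedback.items():
        values = list(scores.values())
        result[track] = (mean(values), pstdev(values))
    return result

# Hypothetical station feedback: +1 = liked, -1 = skipped.
feedback = {
    "early_track": {"ann": 1, "ben": 1, "cat": -1, "dan": -1},
    "late_track": {"ann": 1, "ben": 1, "cat": 1, "dan": 1},
}

spread = track_spread(feedback)
for track, (avg, dev) in spread.items():
    print(track, avg, dev)
```

A track with a middling average and a large spread is one that splits the station's listeners into camps, which is the kind of signal that says the station is really serving two audiences.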
