Wednesday, September 19, 2007

Bayesian Statistics

As i've moved from one domain to another during my computer software career, there are certain ideas that seem to follow along. The calculation of eigenvalues and eigenvectors pops up in many different places, as does singular value decomposition. Markov chains have shown up in everything from statistical mechanics to language analysis. I've used K-means clustering in drug discovery and for music personalization.

Another idea that seems to be universally applied these days is Bayesian statistics. When i first encountered the idea a decade or so ago, it was a fairly obscure concept known primarily to statistics geeks, but Paul Graham gave the Bayesian approach a boost when he suggested using it for spam filtering.

Conceptually, the idea of Bayesian statistics is fairly easy to comprehend: evidence (or belief) should factor into probabilities. An oft-cited example is the probability that the sun will rise tomorrow. Normal statistics doesn't say much about that probability, but the Bayesian approach contends that it's highly likely because the sun has risen reliably for quite some time, so there's reason to believe that it will tomorrow. In practice, i don't find this example to be very useful. Another common example that i like better is this: suppose you have to determine the probability that somebody has cancer. With no further evidence of the person's habits, you could guess the probability by simply determining the percentage of people who have cancer among the whole population. But if you know that the person is a smoker, you might expect this probability to be higher.

So Bayesian probabilities are essentially conditional probabilities (the probability of event A given event B, usually written P(A|B)). The so-called Bayes formula expresses this conditional probability like this:

P(A|B) = P(B|A) * P(A) / P(B)

This says the probability of event A given that event B has happened is equal to the probability of event B given that A has happened times the probability of event A divided by the marginal probability of event B. Which is completely non-intuitive to me.

Another way of thinking about this is that the probability of event A given event B differs from the unconditional probability of event A by the factor P(B|A)/P(B). So for example, the probability that somebody has cancer given that they're a smoker, P(cancer|smoker), differs from the probability that they have cancer, P(cancer), without the evidence of smoking by the factor P(smoker|cancer)/P(smoker). If the latter factor is >> 1, then the idea that smokers are more likely to have cancer is supported.

This is still not particularly intuitive to me, but let's look at it a bit. Take the extreme case that P(smoker) = 1, that is, everybody smokes. Then P(smoker|cancer) is also 1, the factor is exactly 1, and the knowledge that somebody is a smoker is not helpful. Similarly, if you were to poll a population of cancer patients and find that they were all smokers, then the numerator of this factor is 1; if P(smoker) is less than 1, the factor is greater than 1, and being a smoker raises the probability of cancer. In Bayesian terms the probability P(cancer|smoker), which is called the posterior probability, is greater than P(cancer), which is called the prior probability.
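
Here's that arithmetic as a tiny Python sketch. The numbers are made up purely for illustration; only the formula itself comes from the discussion above.

```python
# Hypothetical numbers, invented only to illustrate Bayes' formula.
p_cancer = 0.01               # prior: P(cancer) in the whole population
p_smoker = 0.20               # marginal: P(smoker) in the whole population
p_smoker_given_cancer = 0.60  # likelihood: P(smoker|cancer) among cancer patients

# Bayes formula: P(cancer|smoker) = P(smoker|cancer) * P(cancer) / P(smoker)
p_cancer_given_smoker = p_smoker_given_cancer * p_cancer / p_smoker

# The factor P(smoker|cancer) / P(smoker) is what scales the prior into the posterior.
factor = p_smoker_given_cancer / p_smoker

print(f"prior     P(cancer)        = {p_cancer:.3f}")               # 0.010
print(f"posterior P(cancer|smoker) = {p_cancer_given_smoker:.3f}")  # 0.030
print(f"scaling factor             = {factor:.1f}")                 # 3.0
```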

At a glance, it doesn't seem like the above equation does much for you, since the thing you're trying to find, P(A|B), has to be calculated in terms of an equally complex term, P(B|A). The trick is that P(B|A) is often easier to determine from known data. Put in terms of the scientific method, it's often easier to determine the probability of the evidence given the truth of the hypothesis than to determine the probability of the hypothesis given the evidence.

An example from recommendation technology would be something like this: suppose that you have a large collection of ratings for both albums and artists, with values of 1 to 4. You want to determine the probability that a given user would give an artist S a 4 rating, P(S=4). Without any other information, you could estimate this probability by looking at the percentage of 4 ratings for that artist among the entire set of ratings.

However, suppose that the user has not rated the artist, but has rated one of the artist's albums. The value of that album rating can be considered evidence of the user's opinion about the artist. Let's say the rating for the album is M; then we can look at P(S=4|M). Bayes formula gives us

P(S=4|M) = P(M|S=4) * P(S=4) / P(M)

Here, the P(M) term is more complicated than it looks. I glossed over the fact above that the denominator of this formula is a marginal probability. To understand the origin and concept of a marginal probability, it helps to know the relationship between conditional and joint probabilities. A joint probability is the probability of two events occurring together, usually written as P(A,B). Joint probability can be written in terms of conditional probability

P(A,B) = P(B|A) * P(A)

where P(A) is a marginal probability, meaning that it's the probability of the event A occurring regardless of event B. The reason it's called marginal is that if you consider a table listing the probabilities of all the possible joint events (assuming A and B take on discrete values) and sum up the joint probabilities across one row (the total probability for a given value of A regardless of the value of B), you'd write the sum in the margin. Get it? The marginal probability P(M) can be written like this:

P(M) = P(S=1,M) + P(S=2,M) + P(S=3,M) + P(S=4,M)

or

P(M|S=1)*P(S=1) + P(M|S=2)*P(S=2) + P(M|S=3)*P(S=3) + P(M|S=4)*P(S=4)

The formula now is composed of terms that we can derive from the ratings data. Given an actual value for the rating M for a particular user, we could then predict the most likely artist rating for that user.
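
To make this concrete, here's a minimal Python sketch of how those terms might be estimated from counts. The rating pairs are invented for illustration, and a real system would need smoothing to handle sparse or empty groups.

```python
from collections import Counter

# Invented (artist_rating, album_rating) pairs, one per user who rated
# both the artist S and one of the artist's albums; ratings run 1 to 4.
pairs = [(4, 4), (4, 3), (4, 4), (3, 3), (3, 4),
         (2, 2), (1, 1), (2, 1), (4, 4), (3, 2)]

n = len(pairs)
artist_counts = Counter(s for s, m in pairs)

# Priors P(S=s), estimated from the artist-rating frequencies.
prior = {s: artist_counts[s] / n for s in range(1, 5)}

# Likelihoods P(M=m|S=s), estimated within each artist-rating group.
def likelihood(m, s):
    group = [pm for ps, pm in pairs if ps == s]
    return sum(1 for pm in group if pm == m) / len(group) if group else 0.0

def posterior(s, m):
    # Marginal P(M=m) by summing the joint over all artist ratings:
    # P(M) = P(M|S=1)*P(S=1) + ... + P(M|S=4)*P(S=4)
    p_m = sum(likelihood(m, t) * prior[t] for t in range(1, 5))
    return likelihood(m, s) * prior[s] / p_m

# Given that this user rated the album M=4, the most likely artist rating:
print(max(range(1, 5), key=lambda s: posterior(s, 4)))  # 4
```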

This is a contrived example, and i wouldn't really recommend it as a method for doing music recommendations, but i hope it does illustrate how this concept can be applied in various domains. There are far more interesting applications of Bayesian statistics in machine learning, the physical sciences, and even search and rescue.

Surprise in Recommendations

There's a principle in user interface design called "The Principle of Least Surprise" (or astonishment), which states that any action with a potentially ambiguous interpretation should result in the least surprising consequence. Often the results of recommender systems are interpreted similarly: people evaluating the recommendations deem a recommendation to be good if it is expected. Once, long ago, i wrote here about how this doesn't seem to provide a good experience with music recommendations, because if a recommendation is not surprising, it is probably already familiar.

I've since decided that a bigger problem with music recommendations is that even great recommendations can't be appreciated unless you listen to the music (again, if you already know the recommended music, then it's not a compelling recommendation unless it's in the context of something like a personalized radio stream). However, i still think about this problem occasionally. It still seems to me that the interesting recommendations are those that you don't expect, but which still match your taste (if you eliminate the latter restriction, it's easy to make unexpected recommendations).

The main reason why unexpected recommendations are rare is (i think) that most recommendation systems are based on some measure of similarity between either items or users (user similarity is the idea behind so-called collaborative filtering systems, while content-based systems obviously seek to find item similarity). Often the set of recommended items will be chosen by comparison with similar items or by comparison with the tastes of similar users. So suppose that you like The Shins and the system discovers that other people who like The Shins often also like The Decemberists. The latter is a good recommendation by most standards (including my personal subjective standards). But it is not a surprising recommendation.
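
To make the similarity machinery concrete, here's a toy Python sketch of item-item cosine similarity over binary likes. The users and preferences are invented for the example; real systems work from far richer data.

```python
import math

# Invented preference data: which artists each user likes.
likes = {
    "alice": {"The Shins", "The Decemberists"},
    "bob":   {"The Shins", "The Decemberists", "Neutral Milk Hotel"},
    "carol": {"The Shins", "The Decemberists"},
    "dave":  {"Iron Maiden"},
}

def cosine_similarity(artist_a, artist_b):
    # Treat each artist as a binary vector over users; cosine then
    # reduces to shared fans / sqrt(fans_a * fans_b).
    fans_a = {u for u, artists in likes.items() if artist_a in artists}
    fans_b = {u for u, artists in likes.items() if artist_b in artists}
    if not fans_a or not fans_b:
        return 0.0
    return len(fans_a & fans_b) / math.sqrt(len(fans_a) * len(fans_b))

print(cosine_similarity("The Shins", "The Decemberists"))  # 1.0: identical fan bases
print(cosine_similarity("The Shins", "Iron Maiden"))       # 0.0: no shared fans
```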

Surprising in this context does not mean obscure, like, say, Neutral Milk Hotel, or outrageous, like, say, Iron Maiden. For me a surprising recommendation would be something that maybe takes a detour into a different, but adjacent, genre. For example, i've been listening to a lot of Richard Thompson recently, and i'd be surprised but pleased if there were a connection from The Shins to Thompson via the intersection of indie rock with alt. country and alt. country with folk. It's unlikely that many systems would make that connection because the two artists don't have a significant shared fan base (though i'm sure it's larger than just me).

A mathematical model of surprise was developed a while back, but it treats surprise more in the sense of jumping-out-of-the-bushes rather than one-of-these-things-is-not-like-the-other. The idea of surprise that i have in mind is more like Ted Dunning's use of the word in his paper Accurate Methods for the Statistics of Surprise and Coincidence, in which surprise is more of a rare but significant co-occurrence.
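
Dunning's statistic is straightforward to sketch in Python: it's the G² log-likelihood ratio computed over a 2x2 contingency table of co-occurrence counts. The counts below are invented for illustration.

```python
import math

def llr_2x2(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio (G^2) for a 2x2 co-occurrence table.

    k11: users who like both artists    k12: like A but not B
    k21: like B but not A               k22: like neither
    """
    def s(*counts):
        # sum of k * ln(k), with the convention 0 * ln(0) = 0
        return sum(k * math.log(k) for k in counts if k > 0)

    n = k11 + k12 + k21 + k22
    return 2 * (s(k11, k12, k21, k22)      # cells
                - s(k11 + k12, k21 + k22)  # row totals
                - s(k11 + k21, k12 + k22)  # column totals
                + s(n))                    # grand total

# Invented counts: 50 shared fans out of 100,000 users, where independence
# would predict only 20; a G^2 of about 33, far beyond the ~3.8 chi-squared
# cutoff, flags a rare but significant co-occurrence.
print(llr_2x2(50, 950, 1950, 97050))
```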

The reason why even the latter approach doesn't often produce the sorts of surprises that i want is that the information isn't in the data. For example, the connection that i claim between The Shins and Richard Thompson above is subjective. In the general population of music listeners there simply isn't enough data to establish a connection between the two artists. It seems like what is needed is something that can infer or enhance these unusual connections from a single person's listening habits. Or, perhaps, this connection could be found with content-based recommendation systems or some future refinement thereof.

I suppose that if you imagined a giant network that connects every musical artist with its closest neighbors, and so on until every pair of artists (A,B) is connected by some path, then you could over time weight the connections based on an individual user so that certain paths become "shorter". It might even be possible to structure this as a Bayesian network, where information about a user's artist preferences is used as evidence to connect previously unrelated artists for that individual. But, honestly, i'm not sure how you'd scale this across millions of users.
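
Here's a back-of-the-envelope Python sketch of that idea: Dijkstra's algorithm over a toy artist graph, where edges touching artists the user likes get shorter. The graph, the weights, and the 0.5 discount are all invented just to show the mechanics.

```python
import heapq

# Invented artist graph: edge weight = baseline "distance" between artists.
graph = {
    "The Shins":        {"The Decemberists": 1.0, "Wilco": 2.0},
    "The Decemberists": {"The Shins": 1.0, "Wilco": 1.5},
    "Wilco":            {"The Shins": 2.0, "The Decemberists": 1.5,
                         "Richard Thompson": 2.5},
    "Richard Thompson": {"Wilco": 2.5},
}

def edge_weight(a, b, weight, liked):
    # Shrink any edge that touches an artist this user likes, so the
    # user's own habits pull otherwise-distant artists closer together.
    return weight * 0.5 if (a in liked or b in liked) else weight

def shortest_path_length(start, goal, liked):
    # Plain Dijkstra over the personalized edge weights.
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, w in graph[node].items():
            nd = d + edge_weight(node, nbr, w, liked)
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return float("inf")

# The same pair of artists gets "closer" once the user's habits are applied.
print(shortest_path_length("The Shins", "Richard Thompson", liked=set()))  # 4.5
print(shortest_path_length("The Shins", "Richard Thompson",
                           liked={"Wilco", "Richard Thompson"}))           # 2.25
```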

Monday, September 17, 2007

Man Sues God

This article about Nebraska State Senator Ernie Chambers looks like an Onion article, but apparently is not.

Thursday, September 13, 2007

What Not To Sing In An Airport

You know how the last song you hear on the radio before getting out of the car sticks in your head and you end up singing it to yourself? Last weekend i flew to San Francisco for the Plum Blossom Federation Kung Fu Tournament, and as i arrived at the airport the new Beck song was playing on the radio. So as i walked into the airport, i was singing

We got a time bomb, we got a time bomb...

Fortunately, i realized where i was before i reached the security checkpoint.

Tuesday, September 11, 2007

Yo Gabba Gabba!

I didn't want to post something serious for 9/11 and sound all solemn, even if this is a sort of unofficial memorial day for us Americans. So i decided to post something about the best children's television show ever made, Yo Gabba Gabba!

Now, the interesting thing is that Yo Gabba Gabba! is shown on Nick Jr. and is clearly targeted at preschool, Sesame Street-age kids. However, it's very popular with my 13-year-old son and his friends. My son is also a big fan of the Aquabats, some of whose members are associated with the show. They've also had musical guests like the Aggrolites and The Shins.

Quite frankly, the show is brilliant. If you don't find yourself nodding along with the tunes about straightening up your room and other wonderful advice, you probably are lame in other important self-limiting ways. It's one of those shows that appeals to adults and its target audience by being quirky and fun without ever crossing over into creepy.

Watch it. It's really awesome, and it proves that the terrorists have already lost.

Tuesday, September 04, 2007

Unemployed

I finally gave up on my career as a minion of an Internet behemoth. To be fair, Yahoo! was in general a pretty classy organization, and they treated me very well. This was a rare instance where a single individual made the workplace so unpleasant that it was difficult to find any aspect of the job that compensated for it. I had many great coworkers, and the location was perfect, but there are only so many days of homicidal rage that you can endure before it starts to wear on you.

Corporate culture is strange in that it's very difficult to fight against a bad boss. I've never been shy about saying "No, i think that's the wrong way to do that". A good boss will probably forgive you for that, and will often agree with you. A bad boss will ignore you, and probably marginalize you, since even if your idea is clearly better, it's not part of his or her agenda. Bad bosses need a system where they can take credit for things that go right and assign blame for things that go wrong, which is antithetical to the environment that breeds innovation and teamwork.