Tuesday, September 16, 2008

Netflix Prize

I'm not sure whether we've covered this before (if we did, it was a while ago), but the Netflix Prize has been a popular conversation piece recently. The concept is simple: write an algorithm that predicts how much users will like or dislike movies (as 1-to-5-star ratings), based on their previous ratings, 10% more accurately than Netflix's own algorithm (Cinematch). The prize? $1,000,000. There are also $50,000 Progress Prizes along the way to keep the carrot within reach.
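For the curious: "10% better" is measured by root-mean-square error (RMSE) between predicted and actual star ratings, so an entry has to get its RMSE at least 10% below Cinematch's. Below is a minimal Python sketch of that arithmetic. The function names and the toy ratings are made up for illustration; this is not Netflix's actual scoring code or data.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual star ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def improvement_pct(baseline_rmse, new_rmse):
    """Percent reduction in RMSE relative to a baseline (e.g. Cinematch)."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse


def meets_target(baseline_rmse, new_rmse, target_pct=10.0):
    """True if new_rmse is at least target_pct percent below the baseline."""
    return new_rmse <= baseline_rmse * (1.0 - target_pct / 100.0)


# Hypothetical toy data: actual 1-5 star ratings and two sets of predictions.
actual = [4, 3, 5, 2, 1]
baseline_preds = [3.5, 3.5, 4.0, 3.0, 2.0]   # stand-in for the incumbent
contender_preds = [4.0, 3.0, 4.5, 2.5, 1.5]  # stand-in for a contest entry

base = rmse(baseline_preds, actual)
new = rmse(contender_preds, actual)
print(f"baseline RMSE {base:.4f}, contender RMSE {new:.4f}")
print(f"improvement {improvement_pct(base, new):.1f}%, "
      f"meets 10% target: {meets_target(base, new)}")
```

Note that the percentage is relative to the baseline's error, not to the rating scale. That is why each extra fraction of a percent gets harder to squeeze out as entries approach the target.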

From my reading of the leaderboard, teams are breaking the 9% mark, and what's interesting is how recent these entries are: the #2 entry was submitted at 10:00pm last night. The best programs are nearing the Progress Prize, so I'd expect a flurry of activity until someone gets there.

I don't have a sense of how quickly this will progress. Could it take another year? 18 months? 6 months? I do know that within the programming community this is a prize worth chasing, and the prestige of being the person who wrote the golden code may be worth even more than the $1 million.

Obviously, Netflix now has swarms of very talented coders working their fingers to the bone to build better software for it. Paying for that much development effort directly would easily have run into the millions of dollars, so the prize is a bargain for them. Netflix also strikes me as one of the few internet companies without the slightest cash flow problem.

Also, without knowing anything about the specifics of the code (Tom, what do you know about this?), my guess is that Cinematch is a relatively simple program that doesn't presume too much and does a pretty solid job. So building something exceedingly complex that outperforms it wouldn't be terribly difficult, but outperforming it by 10% would be quite challenging.

1 comment:

Thomas said...

At first glance, it's not clear to me how simple or complex Cinematch is. They say it's a bunch of linear estimators combined with a lot of data massaging, but that data massaging is sometimes half the battle. They also highlight that performance requirements are key in a production system (although contest entries aren't judged on performance), so clever speed tricks would also add to the complexity of the overall algorithm.