Monday, January 07, 2008
Fun vs. profits in predicting college football results
Tonight's BCS title game between LSU and Ohio State capped the 2007 bowl season, a 19-day, 32-game college football bonanza that, though unsatisfying to the playoff-wanting populace, is nonetheless enjoyable. It's also a fun testing ground.
Yahoo! Sports offers a bowl pick-em contest in which you choose the winner of each bowl game and assign a confidence level to each pick. Each game is worth a number of points of your choosing, from 1 to 32. The idea is to assign the most points to the games in which you are most confident and the fewest to the games in which you are least confident. You must have a selection at every point level, so the final rundown of bowl selections is an ordinal ranking of how confident you are in your picks for each and every bowl game.
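The scoring rule is easy to sketch in Python. This is only a minimal illustration, not Yahoo!'s actual implementation: the function name score_entry, the data layout, and the three-game slate with placeholder team names are all made up for the example.

```python
def score_entry(picks, winners):
    """Score a confidence pick-em entry.

    picks:   dict mapping game -> (picked_team, confidence_points)
    winners: dict mapping game -> actual winning team
    Returns (number_correct, total_points).
    """
    n = len(picks)
    points_used = sorted(points for _, points in picks.values())
    # Every point level from 1 to n must be used exactly once.
    if points_used != list(range(1, n + 1)):
        raise ValueError("confidence points must use each level 1..n once")

    correct = 0
    total = 0
    for game, (team, points) in picks.items():
        # Points are earned only when the picked team actually won.
        if winners[game] == team:
            correct += 1
            total += points
    return correct, total


# Hypothetical three-game slate (placeholder names, not real bowl data).
example_picks = {
    "Bowl A": ("Team 1", 3),
    "Bowl B": ("Team 4", 1),
    "Bowl C": ("Team 5", 2),
}
example_winners = {"Bowl A": "Team 1", "Bowl B": "Team 3", "Bowl C": "Team 5"}

print(score_entry(example_picks, example_winners))  # (2, 5)
```

With 32 games the same function would return the "correct picks" and "points" pair for a full entry.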
This let me test the following: Does information with money behind it predict better than information without money behind it?
It worked as follows. Yahoo! provided the percentage of the country that selected each participant in each bowl game, which can be construed as a decent measure of the aggregated non-money information (the game is free to play). The team receiving greater than 50% of the participant vote was selected as the winner (there were no 50%/50% ties), and confidence levels were derived from how heavily each team was favored. For example, the highest confidence level was for the Hawaii Bowl-- 95% of those playing liked Boise State to defeat East Carolina. (They didn't.) The lowest confidence game was the Poinsettia Bowl-- 51% of the country liked Navy to defeat Utah. (Nor did they.)
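Here is a rough sketch of how vote percentages turn into picks and confidence points. The function name picks_from_vote_shares and the data layout are my own invention for illustration. The Hawaii Bowl and Poinsettia Bowl favorite percentages are the ones quoted above (the underdog shares assume the two sides sum to 100%), and the third game is a made-up placeholder, so the point values here run from 1 to 3 rather than 1 to 32.

```python
def picks_from_vote_shares(vote_shares):
    """Turn vote percentages into picks with confidence points.

    vote_shares: dict mapping game -> dict of team -> percent of entrants
    Returns dict mapping game -> (favorite, confidence_points).
    """
    favorites = {}
    for game, shares in vote_shares.items():
        # The pick is whichever team drew the larger share of the vote.
        team = max(shares, key=shares.get)
        favorites[game] = (team, shares[team])

    # Rank games from least to most lopsided; the least lopsided gets 1 point.
    ordered = sorted(favorites, key=lambda g: favorites[g][1])
    return {g: (favorites[g][0], rank) for rank, g in enumerate(ordered, start=1)}


example_shares = {
    "Hawaii Bowl": {"Boise State": 95, "East Carolina": 5},
    "Poinsettia Bowl": {"Navy": 51, "Utah": 49},
    "Hypothetical Bowl": {"Team A": 70, "Team B": 30},  # placeholder game
}

example_result = picks_from_vote_shares(example_shares)
for game, (team, points) in example_result.items():
    print(f"{points}: {team} ({game})")
```

Ties in vote share would be broken by input order here, which is an arbitrary choice.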
The money information was a set of picks and confidence levels generated from the moneylines posted in Vegas. The pick was the moneyline favorite; the confidence came from the size of the line. It is the role of the bookie to place lines such that equal money falls on both sides-- that way, any bowl outcome yields a profit for the house. All votes are equal for non-money information; the same is not true for the money information.
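The money-based picks can be sketched the same way. One assumption to flag loudly: "size of the line" is interpreted here as the favorite's implied win probability under the standard American moneyline conversion, with no adjustment for the house's cut. The function names and the two lines below are hypothetical, not actual 2007 bowl lines.

```python
def implied_probability(moneyline):
    """Convert an American moneyline to an implied win probability.

    The bookmaker's margin is not removed, so the two sides of a game
    will generally sum to a bit more than 1.
    """
    if moneyline < 0:
        # Favorite: risk |line| to win 100.
        return -moneyline / (-moneyline + 100)
    # Underdog: risk 100 to win the line.
    return 100 / (moneyline + 100)


def picks_from_moneylines(lines):
    """lines: dict mapping game -> dict of team -> American moneyline.

    Returns dict mapping game -> (favorite, confidence_points).
    """
    favorites = {}
    for game, odds in lines.items():
        team = max(odds, key=lambda t: implied_probability(odds[t]))
        favorites[game] = (team, implied_probability(odds[team]))

    # The heaviest favorite gets the most points (assumed ranking rule).
    ordered = sorted(favorites, key=lambda g: favorites[g][1])
    return {g: (favorites[g][0], rank) for rank, g in enumerate(ordered, start=1)}


# Hypothetical lines for two placeholder games.
example_lines = {
    "Bowl A": {"Team 1": -300, "Team 2": 240},
    "Bowl B": {"Team 3": -120, "Team 4": 100},
}

money_picks = picks_from_moneylines(example_lines)
print(money_picks)
```

A -300 favorite implies a 75% chance of winning, so it outranks a -120 favorite at roughly 55%.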
Non-money: 21 of 32 correct, for 382 points
Money: 24 of 32 correct, for 381 points
(Both were around the 90th percentile of all picks submitted.)
I read this as saying the money information-- derived from moneylines-- did better at picking winners, but the non-money information did a bit better at putting its confidence on the teams that would actually win. A thin line, nonetheless.
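A bit of arithmetic on the two results makes that reading concrete. The only inputs are the figures reported above plus the fact that the point levels 1 through 32 sum to 528; the variable names are just for illustration.

```python
n_games = 32
max_points = n_games * (n_games + 1) // 2  # 1 + 2 + ... + 32

results = {
    "non-money": {"correct": 21, "points": 382},
    "money": {"correct": 24, "points": 381},
}

summary = {}
for name, r in results.items():
    summary[name] = {
        # Fraction of games picked correctly.
        "hit_rate": r["correct"] / n_games,
        # Fraction of the maximum possible score.
        "share_of_max_points": r["points"] / max_points,
        # Average confidence level attached to the correct picks.
        "points_per_correct_pick": r["points"] / r["correct"],
    }
    print(name, {k: round(v, 3) for k, v in summary[name].items()})
```

The money picks hit 75% of games against about 66% for the non-money picks, but the non-money picks earned about 18.2 points per correct pick against about 15.9, which is why the two totals land a single point apart.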
Perhaps more interestingly, there were some pretty sizable differences between the two sets of picks. Utah/Navy, for example, was given the least confidence by the non-money picks, while the money picks put that game at 26 confidence points (out of 32)-- and picked the game correctly as well. Tennessee/Wisconsin was at 16 confidence points for the non-money picks and 4 for the money picks, though both correctly picked Tennessee as the winner. The disparities go on and on. As the numbers above bear out, many of the picks are similar with regard to team, but confidence levels vary widely.
Some shortcomings: Picks were locked after the first bowl game was played, so for the championship game there were 19 days' worth of late information not incorporated into the rankings and choices. This probably has a larger impact on the money information, if the "late-money-is-smart-money" adage is true. Moneylines (and therefore confidence levels) can be adjusted by small changes in information (e.g., player X is now doubtful for bowl Y), whereas picking a different team to win might take a larger information shock, and a change in picks is how the non-money confidence levels would be affected.
Also, can the non-money information be trusted as a true reflection of public sentiment, i.e., are people playing the game seriously and not making ad hoc selections? I'm not certain this is a large concern. While it is impossible to believe that everyone who played the game played it seriously, there isn't a reason to believe this caused a bias in any direction. After all, people make bets on a whim as well.
Ah, the sheer bulk of gambling and sports data.