Saturday, October 31, 2009

Got Math? Part Two: The Consequences

In an earlier post, we explored the remarkable similarities between our RWFL rankings posted on Kenneth Massey's comparisons page, using our selected p=0.75 bias value, and Eugene Potemkin's E-Ratings. Through completely independent rationalizations, we ended up at equivalent linear algebra problems that we each solved to reach our rankings.

Well, this didn't seem to be adding much value to the comparisons, so Kenneth nicely asked me if we would do something to make sure ours were unique. So we're going to tweak our algorithm used to bring you weekly rankings, though we're going to do so in a logically consistent way. From now on, we're going to bring you the RWFL rankings as obtained by running the algorithm on the full set of 716 connected college football teams that include the FBS (that is, including all the FCS and DivII schools that play against the FBS, and all the schools who play them, etc.), and we'll report the ordered results from the FBS. This isn't actually "new" per se for us, as these are the rankings we've been using for our bowl predictions the past two years, because we think in principle they should be better. We just didn't want to spring a change without a compelling reason; needing to do something distinct from the E-Ratings is certainly a good enough reason.
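For anyone curious how a "connected set of teams that includes the FBS" can be pulled out of a schedule, here is a minimal sketch. It assumes the season is stored as pairs of team names, and the function name and toy teams are hypothetical stand-ins, not our actual data or code. It just walks outward from the FBS teams through opponents, and opponents of opponents, until nothing new turns up.

```python
from collections import defaultdict, deque

def connected_teams(games, fbs_teams):
    """Return every team linked to an FBS team through some chain of games."""
    # Build an undirected "played each other" graph from the schedule.
    neighbors = defaultdict(set)
    for team_a, team_b in games:
        neighbors[team_a].add(team_b)
        neighbors[team_b].add(team_a)

    # Breadth-first search starting from all FBS teams at once.
    seen = set(fbs_teams)
    queue = deque(fbs_teams)
    while queue:
        team = queue.popleft()
        for opponent in neighbors[team]:
            if opponent not in seen:
                seen.add(opponent)
                queue.append(opponent)
    return seen

# Toy schedule with made-up team names (an assumption for illustration only).
games = [
    ("FBS_A", "FBS_B"),
    ("FBS_B", "FCS_C"),   # an FCS school that played an FBS school
    ("FCS_C", "DII_D"),   # a DivII school that played that FCS school
    ("DII_E", "DII_F"),   # never connected to the FBS, so left out
]
component = connected_teams(games, ["FBS_A", "FBS_B"])
print(sorted(component))
```

In this toy schedule, DII_E and DII_F never touch the FBS through any chain of games, so they are dropped; every other team stays in the set the rankings would be run on.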

If you decide you liked the old RWFL run on the FBS plus a single made-up catch-all non-FBS team, don't worry: you can still see those as the E-Ratings in Massey's comparisons. Indeed, comparing and contrasting the two should be interesting, in that the difference is all because of the treatment of the non-FBS teams, emphasizing the follow-on indirect effects present in the rankings.

An interesting part about this switch has to do with the only other change we've ever made in our rankings. Back in the original days of the Random Walker rankings, when all of us involved were still at Georgia Tech, our "RW" rankings were just the linear algebra problem described in our manuscripts (which you can reach quickly from the sidebar), describing walkers with first-place votes. As noted at the end of our American Mathematical Monthly paper, there were a lot of reasons to expect improvement from combining these with a second set of walkers casting last-place votes. For years, we've simply subtracted these second vote counts from the first to give the RWFL rankings ("Random Walkers First-Last").

But on the whole connected network of 716 teams, very little total weight of those last-place votes ends up in the FBS at all, so the rankings of the FBS teams are only very slightly modified by the last-place piece. One might argue that it would be more interesting to look at ratios instead of differences between the first-place and last-place votes, but that's not something we're going to do without some mathematical and computational investigation first.
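The first-place and last-place walkers described above can be sketched in a few lines. This is a rough illustration under stated assumptions, not our production code: it assumes each game lets walkers flow from the loser to the winner at rate p and back at rate 1-p, that the last-place walkers are the same process with p and 1-p swapped, and that votes are normalized to sum to one (the ratings printed below use a different scale). The function names and the three-team schedule are hypothetical.

```python
from collections import defaultdict

def walker_votes(games, p, iters=2000):
    """Steady-state vote shares for walkers; games are (winner, loser) pairs."""
    teams = sorted({team for game in games for team in game})
    rate = defaultdict(float)   # rate[(src, dst)]: flow rate of walkers src -> dst
    n_games = defaultdict(int)
    for winner, loser in games:
        rate[(loser, winner)] += p          # walkers tend to side with the winner
        rate[(winner, loser)] += 1.0 - p    # but sometimes go with the loser
        n_games[winner] += 1
        n_games[loser] += 1

    # Turn rates into step probabilities; the +1 keeps every "stay put"
    # probability positive so the iteration settles down.
    scale = max(n_games.values()) + 1.0
    votes = {team: 1.0 / len(teams) for team in teams}
    for _ in range(iters):
        new_votes = dict(votes)
        for (src, dst), r in rate.items():
            flow = votes[src] * r / scale
            new_votes[src] -= flow
            new_votes[dst] += flow
        votes = new_votes
    return votes

def rwfl(games, p=0.75):
    """First-place votes minus last-place votes (assumed: last = swap p and 1-p)."""
    first = walker_votes(games, p)
    last = walker_votes(games, 1.0 - p)
    return {team: first[team] - last[team] for team in first}

# Hypothetical round robin: A beat B, B beat C, A beat C.
toy_games = [("A", "B"), ("B", "C"), ("A", "C")]
first_votes = walker_votes(toy_games, 0.75)
ratings = rwfl(toy_games)
for team in sorted(ratings, key=ratings.get, reverse=True):
    print(team, round(ratings[team], 4))
```

On a large connected network, the last-place vote shares of strong teams become tiny, which is one way to see why the subtraction barely moves the FBS ordering in that setting.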

Without further ado, as a means of comparison, let's back up to the beginning of the week (not just so we can relive the Carolina victory over Virginia Tech). The rankings listed below, computed on the full connected set of teams, definitely differ in some places from the old, simpler setting.

2009 Random Walker Rankings (RWFL, p=0.75)
Games through Saturday October 24th:
1. Iowa (8-0) [1.5817]
2. Florida (7-0) [1.5259]
3. Alabama (8-0) [1.5150]
4. Boise St (7-0) [1.1477]
5. TCU (7-0) [1.1042]
6. Southern Cal (6-1) [1.0839]
7. Oregon (6-1) [1.0780]
8. Texas (7-0) [1.0615]
9. LSU (6-1) [1.0351]
10. Georgia Tech (7-1) [1.0196]
11. Cincinnati (7-0) [0.9606]
12. Virginia Tech (5-2) [0.8658]
13. Arizona (5-2) [0.8354]
14. Penn State (7-1) [0.7979]
15. Miami FL (5-2) [0.7636]
16. Notre Dame (5-2) [0.7396]
17. South Carolina (6-2) [0.7216]
18. Pittsburgh (7-1) [0.7082]
19. Houston (6-1) [0.7080]
20. Oklahoma St (6-1) [0.6849]
21. Ohio State (6-2) [0.6686]
22. West Virginia (6-1) [0.6671]
23. Wisconsin (5-2) [0.6599]
24. California (5-2) [0.6102]
25. Utah (6-1) [0.5987]
26. Clemson (4-3) [0.5983]
27. Washington (3-5) [0.5971]
28. Georgia (4-3) [0.5816]
29. Kentucky (4-3) [0.5785]
30. Central Michigan (7-1) [0.5483]
31. Stanford (5-3) [0.5317]
32. Auburn (5-3) [0.5246]
33. Mississippi (5-2) [0.5227]
34. Michigan (5-3) [0.5090]
35. Oregon St (4-3) [0.5085]
36. Brigham Young (6-2) [0.4973]
37. Arkansas (3-4) [0.4728]
38. Kansas (5-2) [0.4617]
39. Navy (6-2) [0.4476]
40. Tennessee (3-4) [0.4473]
41. Boston College (5-3) [0.4472]
42. Troy (5-2) [0.4423]
43. Idaho (6-2) [0.4359]
44. Michigan St (4-4) [0.4309]
45. South Florida (5-2) [0.4255]
46. Oklahoma (4-3) [0.4157]
47. UCLA (3-4) [0.4119]
48. Arizona St (4-3) [0.4097]
49. Fresno St (4-3) [0.4068]
50. Iowa St (5-3) [0.3971]
51. Minnesota (4-4) [0.3863]
52. Nebraska (4-3) [0.3834]
53. Florida St (3-4) [0.3770]
54. Texas Tech (5-3) [0.3711]
55. Kansas St (5-3) [0.3708]
56. Marshall (5-3) [0.3688]
57. Missouri (4-3) [0.3631]
58. Virginia (3-4) [0.3422]
59. Wake Forest (4-4) [0.3391]
60. Rutgers (5-2) [0.3358]
61. Temple (5-2) [0.3357]
62. Mississippi St (3-5) [0.3356]
63. UTEP (3-4) [0.3356]
64. Nevada (4-3) [0.3286]
65. North Carolina (4-3) [0.3278]
66. Connecticut (4-3) [0.3232]
67. Purdue (3-5) [0.3231]
68. Louisiana-Monroe (4-3) [0.3146]
69. Duke (4-3) [0.3062]
70. SMU (3-4) [0.2991]
71. Texas A&M (4-3) [0.2976]
72. East Carolina (4-3) [0.2973]
73. North Carolina St (3-4) [0.2907]
74. Louisiana-Lafayette (4-3) [0.2842]
75. Colorado St (3-5) [0.2837]
76. Northern Illinois (4-3) [0.2814]
77. Southern Miss (5-3) [0.2748]
78. Wyoming (4-3) [0.2716]
79. Ohio U. (5-3) [0.2709]
80. Colorado (2-5) [0.2703]
81. Northwestern (5-3) [0.2684]
82. Air Force (4-4) [0.2677]
83. Bowling Green (3-5) [0.2592]
84. Syracuse (3-4) [0.2571]
85. Middle Tennessee St (4-3) [0.2457]
86. Toledo (4-4) [0.2452]
87. Western Michigan (4-4) [0.2379]
88. Indiana (4-4) [0.2353]
89. Tulsa (4-3) [0.2344]
90. Washington St (1-6) [0.2296]
91. Baylor (3-4) [0.2288]
92. Louisville (2-5) [0.2238]
93. Central Florida (4-3) [0.2203]
94. San Jose St (1-5) [0.2175]
95. San Diego St (3-4) [0.2165]
96. Arkansas St (2-4) [0.2080]
97. Buffalo (3-5) [0.2024]
98. Florida Atlantic (2-4) [0.2010]
99. Maryland (2-6) [0.1958]
100. Hawai`i (2-5) [0.1819]
101. Kent St (4-4) [0.1814]
102. UNLV (3-5) [0.1725]
103. Louisiana Tech (3-4) [0.1615]
104. Alabama-Birmingham (2-5) [0.1571]
105. Tulane (2-5) [0.1545]
106. Vanderbilt (2-6) [0.1516]
107. Utah St (2-5) [0.1453]
108. New Mexico St (3-5) [0.1423]
109. Memphis (2-5) [0.1413]
110. Illinois (1-6) [0.1366]
111. North Texas (1-6) [0.1264]
112. Florida Int'l (1-6) [0.1256]
113. Army (3-5) [0.1185]
114. Miami OH (0-8) [0.1078]
115. Akron (1-6) [0.1038]
116. Ball St (1-7) [0.0379]
117. Rice (0-8) [0.0368]
118. New Mexico (0-7) [0.0185]
119. Western Kentucky (0-7) [0.0139]
120. Eastern Michigan (0-7) [0.0083]
Conference Rankings (Average Per Team):
SEC 0.7010
Pac10 0.6296
Big10 0.5452
ACC 0.4894
BigEast 0.4877
Big12 0.4422
FBSInd 0.4353
MWC 0.3812
WAC 0.3519
CUSA 0.2690
SunBelt 0.2180
MAC 0.2169
Non-FBS -0.0867


Saturday, October 24, 2009

Got Math?

On a day full of exciting action, including a last-second blocked FG attempt that may turn out to have serious BCS implications, it may seem rather pedestrian to ask a math question. Then again, that's essentially what we do here. So while we're watching the rest of the games, I have a question, brought to my attention by another football ranking fan, Martien Maas.

Martien Maas' Rating System also appears on Kenneth Massey's College Football Ranking Comparison page. Perhaps in part because we ended up very close to each other in the comparisons this week, Martien noted that the RWFL rank order this week is precisely the same as that from Eugene Potemkin's E-Rating System (see also his more detailed discussion). Indeed, the two are nearly the same every week (except for some examples from last year, including here, here, and here). And there are clearly some philosophical similarities between the two rankings. But I haven't sat down to try to work out whether we're mathematically equivalent, so I'd be happy if someone could tell me if they have an expert opinion here. My gut instinct is that our p=0.75 bias value choice happens to set our rankings to the same linear algebra problem, with perhaps the small differences in the past due to details about how non-FBS teams are handled. But, like I said, I haven't looked at it sufficiently yet. Nevertheless, I thought it was worth mentioning...

----

Addition (October 25): Of course, while I tried to leave this puzzle for others, I couldn't let it go myself. I can never resist a good puzzle. It's probably a good thing that I get to solve puzzles for a living. Plus I received an email from Eugene Potemkin responding to a query I sent him directly.

Eugene and I had a wonderfully pleasant exchange of emails today, in which he shared some of the mathematical details of his E-Rating implementation for college football, including: (1) Where he uses ratios of "ratings" and "anti-ratings" to obtain scores in other sports, he uses a difference for American college football (this is the same as the "First-minus-Last" part in RWFL). (2) Like us, he usually treats the collection of all non-FBS teams as effectively one team. (3) To get around the singular nature of random walks on the fully directed graph---sorry for the lingo here but be thankful I'm not using it to launch into an entire discussion of how this relates to the original PageRank algorithm!---he doesn't treat a win as a full win; rather, he counts each win as effectively 3 wins and 1 loss. This is exactly equivalent to the "bias value" p=0.75 choice that we've espoused here (a walker sides with the winner 3 times out of 4), which is nice for a variety of reasons. So it appears that the minor differences must be small round-off or tie-breaking differences, and that RWFL(p=0.75) and the E-Ratings are mathematically identical.
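To see why "3 wins and 1 loss" lines up with p=0.75, here is a small arithmetic check. It is only a sketch under assumptions: it uses a simple flow-rate picture of the walkers (rate p toward the winner, 1-p toward the loser), it is not Eugene's implementation, and the function names and toy games are hypothetical. The point is that the two sets of rates differ by an overall factor of 4, and rescaling every rate by the same factor does not change where the walkers settle.

```python
from fractions import Fraction

def biased_rates(games, p):
    """Walker flow rates when each game counts p toward the winner, 1-p toward the loser."""
    rates = {}
    for winner, loser in games:
        rates[(loser, winner)] = rates.get((loser, winner), 0) + p
        rates[(winner, loser)] = rates.get((winner, loser), 0) + (1 - p)
    return rates

def three_one_rates(games):
    """Flow rates when each result is counted as 3 wins and 1 loss for the winner."""
    rates = {}
    for winner, loser in games:
        rates[(loser, winner)] = rates.get((loser, winner), 0) + 3
        rates[(winner, loser)] = rates.get((winner, loser), 0) + 1
    return rates

# Hypothetical results as (winner, loser), including a rematch that went the other way.
games = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "A")]
biased = biased_rates(games, Fraction(3, 4))   # exact arithmetic, no round-off
counted = three_one_rates(games)

# Every rate in the 3-and-1 picture is exactly 4 times the p=0.75 rate.
same_up_to_scale = (
    biased.keys() == counted.keys()
    and all(counted[pair] == 4 * biased[pair] for pair in biased)
)
print(same_up_to_scale)
```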

Again, a huge thanks to both Martien and Eugene. It's been nice emailing with both of them.

Going forward, we still have value to add, don't worry. For instance, we should spend a lot more time in future posts looking at the plots I post every week that show the top rankings across different choices of this infamous "bias value" p, because those plots hold a lot of utility in being a proxy for various kinds of ranking choices.
