Nothing gets me fired up like hearing the CBS March Madness intro. It’s been a mainstay on my GATA (workout) playlist for years, and whenever it comes on I think back to the Laettners and George Masons of the world while trying to forget how many times Vanderbilt has broken my heart. The seemingly unpredictable nature of the tournament is its biggest asset, but it also got me thinking: is there any way to better predict who is going to win at each stage of the tournament? Having never won a tournament pool in my life, I have a financial motivation to get to the bottom of this as well. Using the handy tool of regression analysis and a weird fondness for drinking coffee, listening to Pandora and plugging numbers into spreadsheets, I set out to answer that very question.
Data Collection
Dependent Variable: Since we are trying to predict tournament wins, I went back 5 tournaments (to the 2006-07 season) and, for all 64 teams in each field (leaving out the recent play-in games), recorded how many games each team won. Some quick math (5 × 64) will tell you there are 320 observations in total.
Independent Variables: As badly as I wanted to discover some overlooked statistical category right off the bat, starting with some basic stat lines seemed like the best way to peel back the onion. As a disciple of Dave Berri and Dean Oliver, I used offensive and defensive efficiency as a starting point. To keep things simple, I threw in RPI to level the playing field.
Sources:
1) Teamrankings.com for offensive efficiency, defensive efficiency and RPI. For these stats I went back and made sure I captured them immediately before the tournament began.
2) Wikipedia for tournament wins each year. Now that I routinely see my professors citing Wikipedia for class slides, I think we’ve removed the final hurdle to Wikipedia legitimacy.
Results/Insights:
| Variable | Coefficient | Std Error | P-value |
| --- | --- | --- | --- |
| Constant | -7.493745 | 2.090886 | 0.0004 |
| OFF_EFF | 4.949186 | 1.733313 | 0.0046 |
| DEF_EFF | -5.416243 | 1.764366 | 0.0023 |
| RPI | 14.09208 | 2.126283 | 0.0000 |
Above are the results of the regression. As you can see, all three variables are statistically significant predictors of tournament wins (each p-value is below .01). Together they account for 36% of the variation in tournament wins (adjusted R-squared).
To put these numbers in perspective, offensive efficiency over these 5 tournaments ranges from .909 to 1.17. The high end belongs to the 2007 Florida national champs and the low end to the #224 RPI-ranked Mississippi Valley State Delta Devils of 2008. With a 5 year average offensive efficiency of 1.065, the difference between tournament average and greatness is only .106. Thus, using the coefficients from the above regression, an increase of this amount results in roughly .5 additional predicted tournament victories. An analysis of defensive efficiency yields similar results.
RPI over these 5 tournaments ranges from .463 to .688. The high end belongs to the 2010 Kansas Jayhawks (who lost in the second round) and the low end, once again, to the 2008 Mississippi Valley State Delta Devils. With a 5 year average of .592, moving from an average RPI to the best results in an estimated 1.4 additional tournament wins. More on RPI later.
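The arithmetic behind those two estimates can be sketched in a few lines of Python. The coefficients come from the regression output and the best and average values from the paragraphs above; the helper name `extra_wins` is just an illustrative choice. Because the published inputs are rounded to three decimals (the offensive gap works out to .105 here rather than .106), the results land near, not exactly on, the figures quoted.

```python
# Coefficients copied from the regression output above (rounded as published).
OFF_EFF_COEF = 4.949186
RPI_COEF = 14.09208


def extra_wins(coef, best, average):
    """Predicted additional wins from moving one variable from average to best."""
    return coef * (best - average)


# Best and 5-year average values quoted in the text.
off_gain = extra_wins(OFF_EFF_COEF, best=1.17, average=1.065)
rpi_gain = extra_wins(RPI_COEF, best=0.688, average=0.592)

print(f"Offense, average -> best: {off_gain:.2f} extra predicted wins")
print(f"RPI, average -> best:     {rpi_gain:.2f} extra predicted wins")
```

These come out to about .52 and 1.35, in line with the rough .5 and 1.4 quoted above.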
Underdog/Overseeded profiles
Regression results aside, what we really want to do is pick March Madness teams. Before we do that, let’s take a look at what the analysis tells us about future Cinderellas and upset-prone teams. The table below can help shed some light on this.
| | Observations | Avg Off_Eff | Avg Def_Eff | Avg RPI |
| --- | --- | --- | --- | --- |
| First Round Upsets | | | | |
| 10-16 seed winners (underdogs who won) | 30 | 1.055 | 0.944 | 0.578 |
| 1-7 seed losers (favorites who lost) | 30 | 1.065 | 0.949 | 0.608 |
| Second Round Upsets | | | | |
| 6-16 seed winners (underdogs who won) | 18 | 1.075 | 0.951 | 0.592 |
| 1-3 seed losers (favorites who lost) | 13 | 1.095 | 0.933 | 0.636 |
| Overall Avg | 320 | 1.065 | 0.947 | 0.592 |
| Standard Deviation | | 0.046 | 0.041 | 0.041 |
| Best | | 1.17 | 0.816 | 0.688 |
| Worst | | 0.909 | 1.068 | 0.463 |
| Range | | 0.261 | 0.252 | 0.225 |
For First Round Upsets, I think it’s easier to look at favorites who lost, that is, teams seeded 1-7 that lost in the first round. Their offensive and defensive efficiencies (1.065 and 0.949) are right around the Overall Average (1.065 and 0.947), even though their average RPI (0.608) sits above it.
For Second Round Upsets, looking at the profile of the underdogs sheds some interesting light. Teams seeded 6-16 that advanced to the Sweet 16 had above-average offensive efficiencies (1.075 versus 1.065) and defensive efficiencies only slightly worse than average (0.951 versus 0.947, where lower is better).
Based on this segmentation, it might be beneficial to profile each first and second round matchup as described above in order to wisely predict upsets.
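One way to put these comparisons on a common scale is to express a group's average as a number of standard deviations from the overall average. The sketch below does that for the second-round underdogs using the figures from the table; the helper name `z_score` is mine, and measuring a group mean against the spread of individual teams is only a rough yardstick, not a significance test.

```python
# (overall average, standard deviation) for each stat, from the summary table.
OVERALL = {
    "off_eff": (1.065, 0.046),
    "def_eff": (0.947, 0.041),
    "rpi": (0.592, 0.041),
}

# Average profile of 6-16 seeds that reached the Sweet 16 (same table).
UNDERDOG_WINNERS = {"off_eff": 1.075, "def_eff": 0.951, "rpi": 0.592}


def z_score(value, mean, std):
    """How many standard deviations `value` sits from `mean`."""
    return (value - mean) / std


profile = {
    stat: z_score(UNDERDOG_WINNERS[stat], *OVERALL[stat]) for stat in OVERALL
}

for stat, z in profile.items():
    print(f"{stat}: {z:+.2f} standard deviations from the overall average")
```

Keep in mind that a positive value is good for offense but bad for defense, since the Best defensive efficiency in the table is the smallest one.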
Offense wins Championships
It appears as though offense is rewarded more in the tournament. There are a number of observations that support this.
Most importantly, in four of the five years under observation, the tournament winner had a higher-rated offense than defense. From 2007 to 2011, the winners had the number 1, 1, 2, 6 and 34 rated offenses in their tournament fields and the number 9, 3, 28, 5 and 40 rated defenses, respectively. The only exception was a well-balanced Duke in 2010, which had the #6 offense and #5 defense in that tournament.
Second, in single-variable regressions, offensive efficiency has a higher R^2 value than defensive efficiency (15% versus 10%).
Finally, as noted above, it seems that if you want to be a good Cinderella story, offensive efficiency is more important than defensive efficiency.
The 2012 Bracket
Now the fun begins. The results of my regression to predict tournament wins for the 2012 field are summarized below.
| Coefficient | Value |
| --- | --- |
| C | -7.49374487 |
| Off_Eff | 4.949185836 |
| Def_Eff | -5.416243101 |
| RPI% | 14.09207946 |

| Year | Team | Seed | Off_Eff | Def_Eff | RPI% | Pred Wins |
| --- | --- | --- | --- | --- | --- | --- |
| 2012 | Kentucky | 1 | 1.135 | 0.873 | 0.665 | 2.766433671 |
| 2012 | Syracuse | 1 | 1.103 | 0.894 | 0.667 | 2.522502778 |
| 2012 | North Carolina | 1 | 1.1 | 0.898 | 0.658 | 2.359161533 |
| 2012 | Michigan State | 1 | 1.079 | 0.884 | 0.652 | 2.246503557 |
| 2012 | Ohio State | 2 | 1.104 | 0.872 | 0.638 | 2.237939008 |
| 2012 | Kansas | 2 | 1.09 | 0.9 | 0.642 | 2.073363917 |
| 2012 | Missouri | 2 | 1.184 | 0.969 | 0.629 | 1.981669579 |
| 2012 | Duke | 2 | 1.109 | 0.979 | 0.651 | 1.866343958 |
| 2012 | Wichita State | 5 | 1.112 | 0.907 | 0.622 | 1.862490714 |
| 2012 | Memphis | 8 | 1.081 | 0.906 | 0.62 | 1.686298038 |
| 2012 | Marquette | 3 | 1.056 | 0.921 | 0.633 | 1.664521778 |
| 2012 | Baylor | 3 | 1.08 | 0.949 | 0.634 | 1.645739511 |
| 2012 | Indiana | 4 | 1.131 | 0.96 | 0.618 | 1.613096043 |
| 2012 | Wisconsin | 4 | 1.055 | 0.873 | 0.61 | 1.595434434 |
| 2012 | New Mexico | 5 | 1.067 | 0.877 | 0.607 | 1.590883453 |
| 2012 | Murray State | 6 | 1.087 | 0.913 | 0.612 | 1.565342815 |
| 2012 | Georgetown | 3 | 1.04 | 0.898 | 0.62 | 1.526711363 |
| 2012 | Gonzaga | 7 | 1.079 | 0.922 | 0.609 | 1.434726902 |
| 2012 | Florida State | 3 | 1 | 0.895 | 0.626 | 1.429545136 |
| 2012 | Saint Louis | 9 | 1.061 | 0.881 | 0.599 | 1.42678673 |
| 2012 | Saint Mary’s | 7 | 1.124 | 0.957 | 0.606 | 1.425595518 |
| 2012 | UNLV | 6 | 1.061 | 0.927 | 0.616 | 1.417204898 |
| 2012 | Louisville | 4 | 0.988 | 0.878 | 0.621 | 1.391770641 |
| 2012 | Creighton | 8 | 1.159 | 1.007 | 0.608 | 1.356189026 |
| 2012 | Vanderbilt | 5 | 1.077 | 0.959 | 0.616 | 1.323072092 |
| 2012 | Florida | 7 | 1.127 | 0.976 | 0.603 | 1.295258218 |
| 2012 | Harvard | 12 | 1.046 | 0.885 | 0.59 | 1.204055254 |
| 2012 | Michigan | 4 | 1.062 | 0.991 | 0.621 | 1.145974923 |
| 2012 | Temple | 5 | 1.079 | 0.992 | 0.615 | 1.140142362 |
| 2012 | Belmont | 14 | 1.155 | 0.958 | 0.572 | 1.094473334 |
| 2012 | California | 12 | 1.062 | 0.916 | 0.586 | 1.058970374 |
| 2012 | San Diego State | 6 | 1.023 | 0.938 | 0.607 | 1.042728447 |
| 2012 | Long Beach State | 12 | 1.071 | 0.938 | 0.589 | 1.026631937 |
| 2012 | S Dakota St | 14 | 1.125 | 0.981 | 0.585 | 1.004621201 |
| 2012 | Southern Miss | 9 | 1.052 | 0.985 | 0.611 | 0.988059728 |
| 2012 | Virginia | 10 | 1.01 | 0.86 | 0.577 | 0.978093609 |
| 2012 | VCU | 12 | 1.024 | 0.895 | 0.584 | 0.956458258 |
| 2012 | BYU | 14 | 1.056 | 0.911 | 0.578 | 0.943619839 |
| 2012 | Iona | 14 | 1.14 | 0.994 | 0.579 | 0.923895351 |
| 2012 | Alabama | 9 | 0.999 | 0.894 | 0.586 | 0.866329014 |
| 2012 | Iowa State | 8 | 1.069 | 0.974 | 0.592 | 0.864025052 |
| 2012 | Kansas State | 8 | 1.026 | 0.915 | 0.579 | 0.787571371 |
| 2012 | Cincinnati | 6 | 1.031 | 0.921 | 0.579 | 0.779819841 |
| 2012 | Connecticut | 9 | 1.036 | 0.965 | 0.594 | 0.777632266 |
| 2012 | New Mexico State | 13 | 1.072 | 0.944 | 0.573 | 0.773610392 |
| 2012 | Ohio | 13 | 1.024 | 0.914 | 0.578 | 0.768997163 |
| 2012 | Davidson | 13 | 1.094 | 0.96 | 0.57 | 0.753556353 |
| 2012 | Notre Dame | 7 | 1.04 | 0.963 | 0.586 | 0.69552486 |
| 2012 | Purdue | 10 | 1.091 | 0.998 | 0.579 | 0.659720273 |
| 2012 | Colorado State | 11 | 1.07 | 1.03 | 0.598 | 0.650217101 |
| 2012 | Montana | 13 | 1.049 | 0.916 | 0.561 | 0.642328971 |
| 2012 | Texas | 11 | 1.061 | 0.97 | 0.577 | 0.634715345 |
| 2012 | North Carolina St | 11 | 1.058 | 0.981 | 0.577 | 0.560289114 |
| 2012 | Lehigh | 15 | 1.071 | 0.926 | 0.547 | 0.499759516 |
| 2012 | Xavier | 10 | 1.008 | 0.96 | 0.582 | 0.497031324 |
| 2012 | West Virginia | 10 | 1.047 | 0.967 | 0.57 | 0.483030917 |
| 2012 | Saint Bonaventure | 14 | 1.054 | 0.969 | 0.56 | 0.365921937 |
| 2012 | South Florida | 12 | 0.961 | 0.937 | 0.578 | 0.332624864 |
| 2012 | Colorado | 11 | 0.994 | 0.953 | 0.572 | 0.32473563 |
| 2012 | Loyola Maryland | 15 | 1.016 | 0.959 | 0.557 | 0.189739068 |
| 2012 | LIU Brooklyn | 16 | 1.067 | 1.008 | 0.557 | 0.176751633 |
| 2012 | Lamar | 16 | 1.033 | 0.937 | 0.536 | 0.097098906 |
| 2012 | UNC Asheville | 16 | 1.099 | 1.012 | 0.541 | 0.087987336 |
| 2012 | Vermont | 16 | 1.034 | 0.931 | 0.516 | -0.14729604 |
| 2012 | Detroit | 15 | 1.037 | 0.994 | 0.524 | -0.36093516 |
| 2012 | Norfolk State | 15 | 0.983 | 0.943 | 0.522 | -0.38014696 |
| 2012 | Mississippi Valley St | 16 | 0.963 | 0.968 | 0.513 | -0.74136547 |
| 2012 | Western Kentucky | 16 | 0.93 | 0.972 | 0.487 | -1.29274764 |
First observation – lots of chalk. The top 4 teams are the one seeds and the next 4 are the two seeds. So, filling out a bracket based strictly on my predicted wins will yield no surprises in the Elite 8. In fact, if I only use the model’s predicted wins to determine who wins each game, I will have only 3 upsets all tournament.
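The Pred Wins column is nothing more than the fitted equation applied to each team's three inputs. A minimal sketch, with the coefficients and three sample rows copied from the tables above (the names `COEF` and `predicted_wins` are illustrative, not from the original spreadsheet):

```python
# Coefficients from the table above.
COEF = {
    "const": -7.49374487,
    "off_eff": 4.949185836,
    "def_eff": -5.416243101,
    "rpi": 14.09207946,
}


def predicted_wins(off_eff, def_eff, rpi):
    """Apply the fitted linear equation to one team's pre-tournament stats."""
    return (
        COEF["const"]
        + COEF["off_eff"] * off_eff
        + COEF["def_eff"] * def_eff
        + COEF["rpi"] * rpi
    )


# (team, Off_Eff, Def_Eff, RPI%) rows taken from the 2012 table.
teams = [
    ("Kentucky", 1.135, 0.873, 0.665),
    ("Wichita State", 1.112, 0.907, 0.622),
    ("Western Kentucky", 0.930, 0.972, 0.487),
]

predictions = {name: predicted_wins(o, d, r) for name, o, d, r in teams}

# Print from most to fewest predicted wins, like the table.
for name, wins in sorted(predictions.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {wins:.3f}")
```

A linear model is not constrained to the 0-6 range of possible wins, which is why the bottom of the table dips below zero.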
If we loosen the rules a little to account for some of the upset-prone and Cinderella profiles described above, we see the following.
First Round Upset Prone Teams:
Notre Dame
Marquette
UNLV
Michigan
San Diego State
Cinderella Stories (potential to advance to the Sweet 16):
Wichita State
Memphis
Murray State
Gonzaga
Saint Mary’s
Belmont
Long Beach State
New Mexico State
What I would advise is to look at the first- and second-round matchups involving the above teams to get a better sense of each matchup, and then pick your upsets wisely. For the record, my Final Four is Kentucky, Missouri, Ohio State and UNC. As much as I dislike him as a person, I have Calipari’s Wildcats cutting down the nets in New Orleans on April 2.
A Brief Note about RPI
The RPI has come under a lot of criticism recently, and some of the data seem to support the critics. RPI and Seed are highly correlated: RPI accounts for 81% of the variation in tournament seedings, which certainly seems to indicate that the selection committee leans heavily on RPI when seeding the bracket. However, when we add Seed to our regression model to predict tournament wins, the adjusted R^2 of the model only increases to .41 (from .36 before). Interestingly, the team with the highest RPI heading into the tournament over the last five years was the 2010 Kansas squad, which lost in the second round as a 1 seed.
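A "share of variation explained" figure like the 81% is presumably the R^2 from a one-variable regression, which equals the squared correlation between the two series. Here is a minimal sketch of that calculation, using seven 2012 rows from the table above purely as sample input. It is not the five-year data set behind the 81%, so the number it prints should not be expected to match. The helper names `correlation` and `r_squared` are illustrative.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def r_squared(xs, ys):
    """R^2 of a one-variable linear regression is the squared correlation."""
    return correlation(xs, ys) ** 2


# (Seed, RPI%) for Kentucky, Kansas, Baylor, Indiana, Memphis, Harvard, Vermont.
sample = [(1, 0.665), (2, 0.642), (3, 0.634), (4, 0.618),
          (8, 0.620), (12, 0.590), (16, 0.516)]
seeds = [seed for seed, _ in sample]
rpis = [rpi for _, rpi in sample]

r = correlation(rpis, seeds)
print(f"correlation = {r:.2f}, R^2 = {r_squared(rpis, seeds):.2f}")
```

The correlation is negative because a better team has a higher RPI but a lower seed number.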
I have included a table at the end of this post that shows predicted wins from a regression EXCLUDING RPI.
Next Steps
My March Madness analysis is just beginning. I plan to peel back the onion further to understand the critical components of offensive and defensive efficiency that help determine tournament wins. I also plan to investigate alternative rankings besides RPI that level the playing field.
But before I do all that, I’m headed to the casino tomorrow to do some of my favorite things: watch the opening of March Madness, gamble and drink free beer.
-DaveCaughman
P.S. In case you’re interested, below is the table showing the results from a regression that EXCLUDED RPI.
| Coefficient | Value |
| --- | --- |
| C | -1.094 |
| Off_Eff | 12.505 |
| Def_Eff | -11.869 |

| Year | Team | Seed | Off_Eff | Def_Eff | RPI% | Pred Wins |
| --- | --- | --- | --- | --- | --- | --- |
| 2012 | Kentucky | 1 | 1.135 | 0.873 | 0.665 | 2.737538 |
| 2012 | Ohio State | 2 | 1.104 | 0.872 | 0.638 | 2.361752 |
| 2012 | Missouri | 2 | 1.184 | 0.969 | 0.629 | 2.210859 |
| 2012 | Syracuse | 1 | 1.103 | 0.894 | 0.667 | 2.088129 |
| 2012 | Wichita State | 5 | 1.112 | 0.907 | 0.622 | 2.046377 |
| 2012 | North Carolina | 1 | 1.1 | 0.898 | 0.658 | 2.003138 |
| 2012 | Belmont | 14 | 1.155 | 0.958 | 0.572 | 1.978773 |
| 2012 | Michigan State | 1 | 1.079 | 0.884 | 0.652 | 1.906699 |
| 2012 | Kansas | 2 | 1.09 | 0.9 | 0.642 | 1.85435 |
| 2012 | New Mexico | 5 | 1.067 | 0.877 | 0.607 | 1.839722 |
| 2012 | Wisconsin | 4 | 1.055 | 0.873 | 0.61 | 1.737138 |
| 2012 | Saint Louis | 9 | 1.061 | 0.881 | 0.599 | 1.717216 |
| 2012 | Memphis | 8 | 1.081 | 0.906 | 0.62 | 1.670591 |
| 2012 | Murray State | 6 | 1.087 | 0.913 | 0.612 | 1.662538 |
| 2012 | Indiana | 4 | 1.131 | 0.96 | 0.618 | 1.654915 |
| 2012 | Saint Mary’s | 7 | 1.124 | 0.957 | 0.606 | 1.602987 |
| 2012 | Harvard | 12 | 1.046 | 0.885 | 0.59 | 1.482165 |
| 2012 | Gonzaga | 7 | 1.079 | 0.922 | 0.609 | 1.455677 |
| 2012 | Creighton | 8 | 1.159 | 1.007 | 0.608 | 1.447212 |
| 2012 | Florida | 7 | 1.127 | 0.976 | 0.603 | 1.414991 |
| 2012 | Iona | 14 | 1.14 | 0.994 | 0.579 | 1.363914 |
| 2012 | S Dakota St | 14 | 1.125 | 0.981 | 0.585 | 1.330636 |
| 2012 | Virginia | 10 | 1.01 | 0.86 | 0.577 | 1.32871 |
| 2012 | California | 12 | 1.062 | 0.916 | 0.586 | 1.314306 |
| 2012 | Lehigh | 15 | 1.071 | 0.926 | 0.547 | 1.308161 |
| 2012 | BYU | 14 | 1.056 | 0.911 | 0.578 | 1.298621 |
| 2012 | Georgetown | 3 | 1.04 | 0.898 | 0.62 | 1.252838 |
| 2012 | Davidson | 13 | 1.094 | 0.96 | 0.57 | 1.19223 |
| 2012 | Marquette | 3 | 1.056 | 0.921 | 0.633 | 1.179931 |
| 2012 | UNLV | 6 | 1.061 | 0.927 | 0.616 | 1.171242 |
| 2012 | Long Beach State | 12 | 1.071 | 0.938 | 0.589 | 1.165733 |
| 2012 | Duke | 2 | 1.109 | 0.979 | 0.651 | 1.154294 |
| 2012 | Montana | 13 | 1.049 | 0.916 | 0.561 | 1.151741 |
| 2012 | Baylor | 3 | 1.08 | 0.949 | 0.634 | 1.147719 |
| 2012 | New Mexico State | 13 | 1.072 | 0.944 | 0.573 | 1.107024 |
| 2012 | VCU | 12 | 1.024 | 0.895 | 0.584 | 1.088365 |
| 2012 | Vanderbilt | 5 | 1.077 | 0.959 | 0.616 | 0.991514 |
| 2012 | Kansas State | 8 | 1.026 | 0.915 | 0.579 | 0.875995 |
| 2012 | Cincinnati | 6 | 1.031 | 0.921 | 0.579 | 0.867306 |
| 2012 | Ohio | 13 | 1.024 | 0.914 | 0.578 | 0.862854 |
| 2012 | Louisville | 4 | 0.988 | 0.878 | 0.621 | 0.839958 |
| 2012 | Florida State | 3 | 1 | 0.895 | 0.626 | 0.788245 |
| 2012 | Alabama | 9 | 0.999 | 0.894 | 0.586 | 0.787609 |
| 2012 | Vermont | 16 | 1.034 | 0.931 | 0.516 | 0.786131 |
| 2012 | Iowa State | 8 | 1.069 | 0.974 | 0.592 | 0.713439 |
| 2012 | Purdue | 10 | 1.091 | 0.998 | 0.579 | 0.703693 |
| 2012 | Lamar | 16 | 1.033 | 0.937 | 0.536 | 0.702412 |
| 2012 | Texas | 11 | 1.061 | 0.97 | 0.577 | 0.660875 |
| 2012 | UNC Asheville | 16 | 1.099 | 1.012 | 0.541 | 0.637567 |
| 2012 | Temple | 5 | 1.079 | 0.992 | 0.615 | 0.624847 |
| 2012 | Saint Bonaventure | 14 | 1.054 | 0.969 | 0.56 | 0.585209 |
| 2012 | San Diego State | 6 | 1.023 | 0.938 | 0.607 | 0.565493 |
| 2012 | West Virginia | 10 | 1.047 | 0.967 | 0.57 | 0.521412 |
| 2012 | North Carolina St | 11 | 1.058 | 0.981 | 0.577 | 0.492801 |
| 2012 | Notre Dame | 7 | 1.04 | 0.963 | 0.586 | 0.481353 |
| 2012 | Michigan | 4 | 1.062 | 0.991 | 0.621 | 0.424131 |
| 2012 | Connecticut | 9 | 1.036 | 0.965 | 0.594 | 0.407595 |
| 2012 | Southern Miss | 9 | 1.052 | 0.985 | 0.611 | 0.370295 |
| 2012 | LIU Brooklyn | 16 | 1.067 | 1.008 | 0.557 | 0.284883 |
| 2012 | Loyola Maryland | 15 | 1.016 | 0.959 | 0.557 | 0.228709 |
| 2012 | Xavier | 10 | 1.008 | 0.96 | 0.582 | 0.1168 |
| 2012 | Detroit | 15 | 1.037 | 0.994 | 0.524 | 0.075899 |
| 2012 | Colorado State | 11 | 1.07 | 1.03 | 0.598 | 0.06128 |
| 2012 | Colorado | 11 | 0.994 | 0.953 | 0.572 | 0.024813 |
| 2012 | Norfolk State | 15 | 0.983 | 0.943 | 0.522 | 0.005948 |
| 2012 | South Florida | 12 | 0.961 | 0.937 | 0.578 | -0.197948 |
| 2012 | Mississippi Valley St | 16 | 0.963 | 0.968 | 0.513 | -0.540877 |
| 2012 | Western Kentucky | 16 | 0.93 | 0.972 | 0.487 | -1.001018 |
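To see which teams gain or lose the most when RPI is dropped, the two fitted equations can be run side by side. This is a rough sketch using the coefficients from both coefficient tables in this post and three sample rows; the names `WITH_RPI`, `WITHOUT_RPI`, `predict` and `shifts` are illustrative, and giving the second model an RPI coefficient of zero is simply a convenient way to reuse one function.

```python
# Coefficients from the two regressions reported in this post.
WITH_RPI = {"const": -7.49374487, "off_eff": 4.949185836,
            "def_eff": -5.416243101, "rpi": 14.09207946}
# RPI excluded: treated here as a zero coefficient so one function fits both.
WITHOUT_RPI = {"const": -1.094, "off_eff": 12.505,
               "def_eff": -11.869, "rpi": 0.0}


def predict(model, off_eff, def_eff, rpi):
    """Predicted tournament wins under the given set of coefficients."""
    return (model["const"] + model["off_eff"] * off_eff
            + model["def_eff"] * def_eff + model["rpi"] * rpi)


# (team, Off_Eff, Def_Eff, RPI%) rows taken from the 2012 tables.
teams = [
    ("Belmont", 1.155, 0.958, 0.572),
    ("Duke", 1.109, 0.979, 0.651),
    ("Kentucky", 1.135, 0.873, 0.665),
]

# Positive shift: the team looks better once RPI is removed.
shifts = {
    name: predict(WITHOUT_RPI, o, d, r) - predict(WITH_RPI, o, d, r)
    for name, o, d, r in teams
}

for name, shift in sorted(shifts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {shift:+.2f} predicted wins without RPI")
```

Belmont, a 14 seed with a strong offense and a modest RPI, jumps by almost a full win, while Duke drops by about 0.7 and Kentucky barely moves.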