The Optimal NBA Roster Updated

4 Sep


The Optimal Roster


As shown in the movie Moneyball, the use of data analytics has played an ever-growing role in decisions made by professional sports teams.

My concept of the “Optimal Roster” was developed after an Optimization class at Carnegie Mellon’s Tepper School of Business.  In a particular class discussion, my professor attempted to maximize the return of a bond portfolio by choosing an ideal group of five bonds, out of a possible ten, given a specified dollar amount to invest.   In order to do this, he used Excel Risk Solver to help identify the optimal solution/portfolio of bonds to choose.

This discussion led me to the optimal roster concept.  What if I replaced bonds with NBA players?  What if, instead of an investment portfolio, I optimized the selection of players within the constraints of a team’s salary cap?  What if, instead of considering bond coupons (interest), maturity and risk, we looked at a player’s points, rebounds and assists?  Lastly, instead of having to choose five bonds out of a possible ten, what if we had to choose the 15 players that make up the ‘Optimal Roster’ from a pool of all players currently in the NBA?

My concept of the “Optimal Roster” uses the Excel Risk Solver software to create an optimal NBA roster of 15 players while staying within the confines of the salary cap or potential tax level set by the 2011 CBA for NBA teams.

Using John Hollinger’s GameScore calculation, averaged over an 82-game regular season and adjusted to per 48 minutes played, I was able to rank players by position.  After adding their salary information, I set a constraint in Risk Solver stating that the total money spent on the 15 players cannot exceed a given limit; in this case I used a ceiling calculated from the average NBA team salary in the 2012-13 season.  An additional constraint requires that only one player from each position be chosen for the starting lineup and that each roster have two backup players at each position.  Given that each team may carry 15 players but can have only 12 active for a game, it made sense to build the model to accommodate all 15.
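To make the constraints above concrete, here is a minimal Python sketch of the same kind of selection problem.  It is not the Risk Solver model itself: the player pool, the ratings, the salaries and the $100M ceiling are all made-up placeholders, the objective is assumed to be the sum of the 15 players' ratings, and a brute-force search stands in for the solver.

```python
from itertools import combinations, product

POSITIONS = ["PG", "SG", "SF", "PF", "C"]

# Hypothetical (rating, salary in $M) tiers; every number here is made up.
TIERS = [(22.0, 18.0), (17.0, 9.0), (13.0, 4.0), (10.0, 1.0)]

# Four made-up players per position: (name, position, rating, salary).
POOL = [
    (f"{pos}{i + 1}", pos, rating + 0.5 * k, salary)
    for k, pos in enumerate(POSITIONS)
    for i, (rating, salary) in enumerate(TIERS)
]


def optimal_roster(pool, cap, per_position=3):
    """Exhaustively search for the highest-rated roster under the cap.

    Each position contributes exactly `per_position` players
    (one starter plus two backups). Returns (roster, total_rating),
    or (None, None) when no roster fits under the cap.
    """
    groups = []
    for pos in POSITIONS:
        candidates = [p for p in pool if p[1] == pos]
        groups.append(list(combinations(candidates, per_position)))

    best_roster, best_rating = None, None
    for choice in product(*groups):
        roster = [player for group in choice for player in group]
        salary = sum(p[3] for p in roster)
        rating = sum(p[2] for p in roster)
        if salary <= cap and (best_rating is None or rating > best_rating):
            best_roster, best_rating = roster, rating
    return best_roster, best_rating


roster, rating = optimal_roster(POOL, cap=100.0)  # hypothetical $100M ceiling
for name, pos, player_rating, salary in sorted(roster, key=lambda p: (p[1], -p[2])):
    print(f"{pos:2} {name:4} rating={player_rating:5.1f} salary=${salary:.1f}M")
print("total rating:", rating)
```

Within each position, the highest-rated of the three chosen players would be the starter.  Brute force only works because this toy pool is tiny; with every NBA player in the pool, an integer-programming solver is the practical choice.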

The main focus of this research is to understand small variances in spending and to build a framework for how a team can statistically maximize its roster to meet given expectations.  In other words, if half a percent more of the salary cap were dedicated to the starting five players, how much greater would total team PER become?  Does it make sense to overspend and dip into the luxury tax just to gain more productivity from players?  Or can you find ways to save money while still getting productive output from your lineup?  Lastly, if a team were to lose a few players in the offseason to free agency, can we take a pool of undrafted collegiate prospects, international NBA-eligible players and current NBA free agents and use this model to find the ideal solutions for the vacated roster holes?  These are the ideas being explored using this model.

The last question in the preceding paragraph is largely dependent on the ability to standardize collegiate and international players with current NBA players in order to compare apples to apples.  This would require a projection model to attempt to predict the output of college players at the NBA level, and a similar conversion model to convert international player productivity to the NBA game.    In the current optimization model, we are only using a pool of current NBA players to build the ideal roster because the framework for the projection models is not currently in place.   I wanted to take a moment to acknowledge the next steps in the process that I have yet to achieve, but certainly plan to do.

If you have any questions or thoughts on this model and research, please contact me at jjhabval@tepper.cmu.edu.

Jordan Jhabvala 

The Optimal NBA Roster

1 Apr


Following the movie Moneyball, the use of data analytics has played an ever-growing role in decisions made by professional sports teams.

My concept of the “Optimal Roster” was developed after an Optimization class at Carnegie Mellon’s Tepper School of Business.  In a particular class discussion, my professor attempted to maximize the return of a bond portfolio by choosing an ideal group of five bonds, out of a possible ten, given a specified dollar amount to invest.   In order to do this, he used Excel Risk Solver to help identify the optimal solution/portfolio of bonds to choose.

This discussion led me to the optimal roster concept.  What if I replaced bonds with NBA players?  What if, instead of an investment portfolio, I optimized the selection of players within the constraints of a team’s salary cap?  What if, instead of considering bond coupons (interest), maturity and risk, we looked at a player’s points, rebounds and assists?  Lastly, instead of having to choose five bonds out of a possible ten, what if we had to choose five players, one at each position, out of all players currently in the NBA?

My concept of the “Optimal Roster” uses the Excel Risk Solver software to create an optimal starting lineup for an NBA team while staying within the confines of the team’s current salary cap.

Using a simplified version of John Hollinger’s PER calculation, I was able to rank players based on position.  Then adding in their Salary information, I was able to set the constraints in Risk Solver stating the total money spent on the five players cannot exceed the percentage of the salary cap inputted.   Also, only one player from each position may be chosen in order to maximize total team PER.

The main focus of this research is to understand small variances in spending.  That is, if half a percent more of the salary cap were dedicated to the starting five players, how much greater would total team PER become?  Does it make sense to overspend and dip into the luxury tax just to gain more productivity from players?  Or can you find ways to save money while still getting productive output from your lineup?  These are the ideas being explored using this model.
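The spending question above can be sketched as a simple sweep: solve the one-player-per-position problem at each budget level and watch how the best achievable total rating moves.  This is only an illustration under stated assumptions.  The pool, the ratings (standing in for PER), the salaries and the $58M cap figure are hypothetical, and exhaustive search replaces Risk Solver.

```python
from itertools import product

# Hypothetical pool per position: (name, rating, salary in $M). Made-up numbers.
POOL = {
    "PG": [("PG1", 24.0, 16.0), ("PG2", 19.0, 8.0), ("PG3", 14.0, 2.0)],
    "SG": [("SG1", 22.0, 14.0), ("SG2", 18.0, 7.0), ("SG3", 13.0, 1.5)],
    "SF": [("SF1", 27.0, 19.0), ("SF2", 17.0, 6.0), ("SF3", 12.0, 1.0)],
    "PF": [("PF1", 21.0, 12.0), ("PF2", 16.5, 5.0), ("PF3", 12.5, 1.0)],
    "C": [("C1", 20.0, 13.0), ("C2", 17.0, 6.5), ("C3", 11.0, 1.0)],
}


def best_five(pool, budget):
    """Best one-per-position lineup within budget.

    Returns (total_rating, total_salary, lineup), or None if nothing fits.
    """
    best = None
    for lineup in product(*pool.values()):
        salary = sum(p[2] for p in lineup)
        rating = sum(p[1] for p in lineup)
        if salary <= budget and (best is None or rating > best[0]):
            best = (rating, salary, lineup)
    return best


def sweep(pool, cap, shares):
    """Best total rating for each share of the cap spent on the starters."""
    rows = []
    for share in shares:
        result = best_five(pool, cap * share)
        rows.append((share, result[0] if result else None))
    return rows


CAP = 58.0  # hypothetical cap in $M
shares = [0.50 + 0.005 * i for i in range(21)]  # 50% to 60% in 0.5% steps
for share, rating in sweep(POOL, CAP, shares):
    print(f"{share:.1%} of cap -> best total rating {rating}")
```

Steps where the rating does not change mark budget increases that buy nothing; steps where it jumps show where a little extra spending unlocks a better player.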

The ideal scenario in this case is to be able to build a team of 15 players; however, the student version of Risk Solver limits us to 200 variables, so this isn’t possible.  Also, the next step in this research is to account for rookie salaries versus veteran deals, which certainly skew the results.  Nonetheless, the framework for the model is in place, and I will continue to post more once having more variables to work with lets me do more of the variance research.

If you have any questions or thoughts on this model and research, please contact me at jjhabval@tepper.cmu.edu.

Jordan Jhabvala 

Red Sox Roster Optimization

27 Jan

Now that San Francisco has won the World Series, the offseason has officially started.  Call me crazy, but I enjoy watching my team, the Red Sox, come together in the offseason almost as much as I enjoy watching the season.  This is especially true for me given the long season that the Red Sox had.

With the offseason comes speculation.  Who goes where?  Who signs the big contract?  The speculation has become more quantitative in recent years.  However, there is much less conversation around building the best team.  Of the team conversations that do happen, they tend to be very qualitative in nature and come spring training it’s hard to tell if offseason goals have been accomplished.

Since I’ve been learning all sorts of methods on optimization, I thought it would be fun to see if I could use these skills to optimize a baseball team’s offseason.  So, I made an effort to optimize a lineup for the Red Sox based on their current roster and the available free agents.  To start, I have only optimized the offensive side of the roster.  There are a couple reasons for this that I will get into later, but for now I have an optimization model for the lineup and the bench.  The optimization is based on two things:  OPS and salary.  OPS, while not universal or comprehensive, is a common measuring stick to determine a player’s offensive value, and salary is the common limiting factor for every team.  To illustrate the correlation between team OPS and runs I ran a regression based on last year’s team data.  It should come as no surprise that the R^2 of OPS on runs is 97%, and hence a good measure of offensive output.
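For readers who want to reproduce this kind of check, here is a minimal sketch of computing R^2 for a one-variable regression of runs on OPS.  The six data points are invented placeholders, not the actual team data used above, so the value this prints will not match the 97% figure.

```python
def r_squared(x, y):
    """R^2 of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares: variation the fitted line fails to explain.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    # Total sum of squares: variation of y around its own mean.
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot


# Hypothetical team-level data: season OPS and runs scored (made up).
TEAM_OPS = [0.690, 0.705, 0.720, 0.735, 0.750, 0.780]
TEAM_RUNS = [610, 640, 690, 700, 745, 800]

print(f"R^2 of OPS on runs: {r_squared(TEAM_OPS, TEAM_RUNS):.3f}")
```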

So, I’ve put together a model that optimized the team’s OPS.  This model adds free agents to the current roster to determine the optimal lineup.  Taking every free agent into account would take too long to do for free, so my model is limited to the free agents listed below:

Name                    OPS     2012 Salary   Name                      OPS     2012 Salary
Ryan Lavarnway          0.459   $500,000      Maicer Izturis (32)       0.634   $3,966,667
Guillermo Quiroz        0.625   $500,000      Kelly Johnson (31)        0.687   $6,375,000
Jarrod Saltalamacchia   0.742   $2,500,000    Jason Bartlett (33)       0.433   $5,500,000
Pedro Ciriaco           0.705   $500,000      Yuniesky Betancourt (31)  0.656   $2,000,000
Mauro Gomez             0.746   $500,000      Brian Bixler (30)         0.583   $500,000
Jose Iglesias           0.391   $500,000      Ronny Cedeno (30)         0.741   $1,150,000
Dustin Pedroia          0.797   $8,250,000    Marco Scutaro (37)        0.753   $6,000,000
Jacoby Ellsbury         0.682   $8,050,000    Eric Chavez (35)          0.845   $900,000
Ryan Kalish             0.625   $500,000      Brandon Inge (36)         0.658   $5,500,000
Che-Hsuan Lin           0.500   $500,000      Jose Lopez (29)           0.626   $800,000
Daniel Nava             0.742   $500,000      Scott Rolen (38)          0.716   $8,166,667
Will Middlebrooks       0.835   $500,000      Travis Buck (29)          0.595   $580,000
David Ortiz             1.026   $14,575,000   Melky Cabrera (28)        0.906   $6,000,000
Cody Ross               0.807   $3,000,000    Jonny Gomes (32)          0.868   $1,000,000
Ivan DeJesus            0.000   $480,500      Josh Hamilton (32)        0.930   $15,250,000
James Loney             0.630   $6,375,000    Andruw Jones (36)         0.701   $2,000,000
Danny Valencia          0.388   $515,000      Michael Bourn (30)        0.739   $6,845,000
Russell Martin (30)     0.713   $7,500,000    Scott Hairston (33)       0.803   $1,100,000
Mike Napoli (31)        0.812   $9,400,000    B.J. Upton (28)           0.752   $7,000,000
A.J. Pierzynski (36)    0.827   $6,000,000    Shane Victorino (32)      0.704   $9,500,000
Kelly Shoppach (33)     0.798   $1,350,000    Brian Bixler (30)         0.583   $500,000
Eric Hinske (35)        0.583   $1,600,000    Travis Buck (29)          0.595   $580,000
Casey Kotchman (30)     0.612   $3,000,000    Torii Hunter (37)         0.817   $18,500,000
Carlos Lee (37)         0.697   $19,000,000   Nick Swisher (32)         0.837   $10,250,000
Carlos Pena (35)        0.684   $7,250,000

The use of these particular free agents is completely arbitrary; no free agent is left off for any particular reason.   Also, to simplify things for right now, I’m using last year’s OPS and salary for each free agent.  I understand that this is not ideal, but I will add these pieces into the equation at a later point.  To begin, I simply want to test the theory.  So, using this information I get the following offensive roster for the 2013 Red Sox.

Name                    Position  OPS     2012 Salary
Mike Napoli (31)        C         0.812   $9,400,000
Carlos Lee (37)         1B        0.697   $19,000,000
Dustin Pedroia*         2B        0.797   $8,250,000
Will Middlebrooks*      3B        0.835   $500,000
Marco Scutaro (37)      SS        0.753   $6,000,000
Melky Cabrera (28)      OF        0.906   $6,000,000
Jonny Gomes (32)        OF        0.868   $1,000,000
Josh Hamilton (32)      OF        0.930   $15,250,000
David Ortiz             DH        1.026   $14,575,000
A.J. Pierzynski (36)    C         0.827   $6,000,000
Ronny Cedeno (30)       SS        0.741   $1,150,000
Eric Chavez (35)        3B        0.845   $900,000
Scott Hairston (33)     OF        0.803   $1,100,000

Total Offensive Salary:  $89,125,000
Average OPS:             0.834

*Under contract, not part of optimization.

There are a couple of things to note with this offensive group.  First, by design, there are no salary constraints in this first run.  This explains the $6 million backup catcher and the roughly $90 million offensive payroll.  This is simply a first-step sanity check to prove that the model will in fact pick out the optimal combination of players.  Also, there are only two current Red Sox on this roster.  This is mostly because Pedroia and Middlebrooks are the only two currently signed Red Sox players who can be written into the 2013 plan in ink.  While there are others under contract, there’s at least some speculation around everyone else.  Plus, the model is more interesting with more moving parts.  I’ll replace interesting with accurate as I go through the steps of refining the model, but for a first pass I give preference to interesting.

OK, now that I’ve shown that the model works, let’s start refining it so that it is useful.  The first thing that I will tackle is salary constraints.  There are two types of salary constraints built into the model: individual position salary constraints and a total salary constraint.  Since Pedroia and Middlebrooks are set, we don’t have to worry about them.  I’m going to cap every other starting position at $10 million, except for DH because Ortiz has been identified as a priority.  I’m also going to cap each bench position at $5 million.  Also, given the trade during the season, I’m going to assume that the Red Sox are not going to break the bank this offseason.  Taking this into consideration, I’m going to cap the total offensive salary at $60 million.  Using these constraints, we get the following results:

Name                    Position  OPS     Salary
A.J. Pierzynski (36)    C         0.827   $6,000,000
Carlos Pena (35)        1B        0.684   $7,250,000
Dustin Pedroia          2B        0.797   $8,250,000
Will Middlebrooks       3B        0.835   $500,000
Marco Scutaro (37)      SS        0.753   $6,000,000
David Ortiz             DH        1.026   $14,575,000
Melky Cabrera (28)      OF        0.906   $6,000,000
Jonny Gomes (32)        OF        0.868   $1,000,000
Cody Ross               RF/LF     0.807   $3,000,000
Kelly Shoppach (33)     C         0.798   $1,350,000
Ronny Cedeno (30)       SS        0.741   $1,150,000
Eric Chavez (35)        3B        0.845   $900,000
Scott Hairston (33)     OF        0.803   $1,100,000

Total Salary:  $57,075,000
Average OPS:   0.822

This looks more like a potential 2013 roster than the first roster.  Obviously, there are a lot of different iterations that can be done by playing with salary variables which can affect the roster.
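As a quick check on the constrained roster, the sketch below recomputes the totals from the table and tests each salary rule described earlier: $10 million per starter (with Pedroia, Middlebrooks and Ortiz exempt), $5 million per bench spot and $60 million overall.  One assumption to flag: the split between starters and bench is inferred from row order, with the first nine rows treated as starters and the last four as the bench.

```python
# (name, position, OPS, salary) copied from the constrained roster table.
STARTERS = [
    ("A.J. Pierzynski", "C", 0.827, 6_000_000),
    ("Carlos Pena", "1B", 0.684, 7_250_000),
    ("Dustin Pedroia", "2B", 0.797, 8_250_000),
    ("Will Middlebrooks", "3B", 0.835, 500_000),
    ("Marco Scutaro", "SS", 0.753, 6_000_000),
    ("David Ortiz", "DH", 1.026, 14_575_000),
    ("Melky Cabrera", "OF", 0.906, 6_000_000),
    ("Jonny Gomes", "OF", 0.868, 1_000_000),
    ("Cody Ross", "RF/LF", 0.807, 3_000_000),
]
# Assumption: the last four rows of the table are the bench.
BENCH = [
    ("Kelly Shoppach", "C", 0.798, 1_350_000),
    ("Ronny Cedeno", "SS", 0.741, 1_150_000),
    ("Eric Chavez", "3B", 0.845, 900_000),
    ("Scott Hairston", "OF", 0.803, 1_100_000),
]
# Players the post exempts from the per-position starter limit.
EXEMPT = {"Dustin Pedroia", "Will Middlebrooks", "David Ortiz"}

STARTER_MAX = 10_000_000
BENCH_MAX = 5_000_000
TOTAL_MAX = 60_000_000


def check_roster(starters, bench):
    """Return (total_salary, average_ops, violations) for a roster."""
    roster = starters + bench
    total_salary = sum(p[3] for p in roster)
    average_ops = sum(p[2] for p in roster) / len(roster)

    violations = []
    for name, _, _, salary in starters:
        if name not in EXEMPT and salary > STARTER_MAX:
            violations.append(f"{name} exceeds the starter limit")
    for name, _, _, salary in bench:
        if salary > BENCH_MAX:
            violations.append(f"{name} exceeds the bench limit")
    if total_salary > TOTAL_MAX:
        violations.append("total salary exceeds the overall limit")
    return total_salary, average_ops, violations


total_salary, average_ops, violations = check_roster(STARTERS, BENCH)
print(f"Total salary: ${total_salary:,}")
print(f"Average OPS: {average_ops:.3f}")
print("Violations:", violations or "none")
```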

So, to some extent I’ve proven that an optimization model can be used as a tool for developing a roster, which means that I will continue to develop the model.  At this point there are two roads that this analysis can go down.  One is to do all the research ahead of the transactions (this would involve predicting 2013 OPS and salary) and use this as a predictive model.  The other is to see what the Red Sox and other teams actually do this offseason and use this information to analyze the Red Sox offseason.  I’m choosing to do the latter right now.  So, I will continue to refine and add data as decisions are made and update at interesting points during the offseason.

-Erik Clark

Sources:

www.espn.com/mlb

www.mlbtraderumors.com

http://www.baseballprospectus.com/compensation/cots/

Weekly Sports Analytics Round-Up

16 Jul

I hesitated to include the “weekly” moniker on this post since it has been a good 11 weeks since the last sports analytics round-up.  But as you probably guessed, we have all been busy impressing potential future employers at our summer internships.  So, without further ado, here are this edition’s links:

 

Fascinating read about applying network theory to analyzing ball movement during the 2010 World Cup and 2012 Euro Cup.  http://www.technologyreview.com/view/428399/pagerank-algorithm-reveals-soccer-teams/

 

Interesting article comparing NFL Draft and NBA Draft pick values accounting for the difference in the number of starters between the two sports: http://www.thebiglead.com/index.php/2012/06/27/comparing-the-nba-draft-to-the-nfl-draft/

I, for one, am wary of people who still hate on LeBron because of the Decision or because he had to partner with a “Big Three” to win a title.  As for the latter, this article places Miami’s “other two” in perspective with other championship teams.  As for the former, I would only ask: how many stupid decisions did you make when you were 25?

http://www.thebiglead.com/index.php/2012/06/22/the-miami-heat-still-have-decisions-to-make-if-they-want-to-repeat-or-threepeat/

 

-DaveCaughman

Weekly Sports Analytics Round-up

29 Apr

In light of the recent NFL draft, Grantland ran a nice article postulating a new method for determining the “net value” of draft picks over or below what is expected from a player drafted in that position.

http://www.grantland.com/story/_/id/7849206/how-tell-which-draft-picks-truly-valuable

 

Not recent, but a very interesting sabermetrician profile that just came across my radar.

http://www.thepostgame.com/features/201101/sabermetrician-exile

 

The guys over at HSAC find out that April wins in baseball mean more than you might think (sorry Phils fans)

http://harvardsportsanalysis.wordpress.com/2012/04/20/how-important-is-a-good-april/#more-3111

 

-DaveCaughman

 

MLB Closer Analysis – Understanding Statistical Components

12 Apr

With baseball back in season, I started to think of my team (the Red Sox) and their closer situation.  They are transitioning to a new closer which made me wonder how they came to this decision.  To get an idea of what makes a good closer I did some research on closer statistics.  Ultimately, your ability as a closer comes down to how many save opportunities are converted.  So, I analyzed save percentage (saves/save opportunities) as a function of various independent variables with the following results:

Table 1

Variable  Probability  R-squared
BAA       0.0311       0.137126
BABIP     0.7301       0.003770
BB/9      0.4358       0.019089
ERA       0.0001       0.285438
K/9       0.2874       0.035279
K/BB      0.0976       0.083421
OBP       0.0155       0.159534
OPS       0.0005       0.315516
SLG       0.0017       0.268218
WHIP      0.0174       0.164343

Interpreting this information first requires an explanation of the table.  Probability is the p-value for each variable: the chance of seeing a relationship at least this strong in the data if the variable actually had no effect on save percentage.  Generally, you want to see a number less than 0.05, in which case the variable is considered statistically significant.  R-squared is the share of the variation in save percentage that the independent variable explains, and it ranges from 0.0 to 1.0.  For example, OBP has a probability of 0.0155, which means OBP is significant, and an R-squared value of 0.159534, which means that OBP accounts for 15.95% of the total variability in save percentage.  Below, I removed all insignificant variables and sorted the rest by R-squared value.

Table 2

Variable  Probability  R-squared
OPS       0.0005       0.315516
ERA       0.0001       0.285438
SLG       0.0017       0.268218
WHIP      0.0174       0.164343
OBP       0.0155       0.159534
BAA       0.0311       0.137126
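The step from Table 1 to Table 2 is a filter followed by a sort.  A minimal sketch, using the values from Table 1 and the 0.05 cutoff mentioned above (the function and variable names are just illustrative):

```python
# Variable -> (probability, R-squared), copied from Table 1.
TABLE_1 = {
    "BAA": (0.0311, 0.137126),
    "BABIP": (0.7301, 0.003770),
    "BB/9": (0.4358, 0.019089),
    "ERA": (0.0001, 0.285438),
    "K/9": (0.2874, 0.035279),
    "K/BB": (0.0976, 0.083421),
    "OBP": (0.0155, 0.159534),
    "OPS": (0.0005, 0.315516),
    "SLG": (0.0017, 0.268218),
    "WHIP": (0.0174, 0.164343),
}


def significant_by_r_squared(table, alpha=0.05):
    """Keep variables with probability below alpha, highest R-squared first."""
    kept = [(name, p, r2) for name, (p, r2) in table.items() if p < alpha]
    return sorted(kept, key=lambda row: row[2], reverse=True)


for name, p, r2 in significant_by_r_squared(TABLE_1):
    print(f"{name:5} p={p:.4f} R-squared={r2:.6f}")
```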

There are a couple of interesting takeaways from this information.  The first takeaway is to recognize what stats are missing.  K/9 appears to have no significance in converting saves.  This is kind of a strange outcome because closers are generally thought of as big strikeout guys.  BB/9 and K/BB are also missing.  This is a little surprising because, even though closers aren’t necessarily control guys, you would expect a closer’s results to be somewhat dependent on these two variables.  The final missing stat is BABIP (batting average on balls in play).  This is something that is starting to be talked about more in the baseball world, but apparently has little effect on save percentage.

The next takeaway is the relative importance of each variable.  According to my research, OPS explains more of the variation in save percentage than any other stat.  This is somewhat surprising because I don’t think I’ve ever heard this stat used with respect to pitchers.  However, after absorbing this information for a minute, it should not be that surprising.  OPS has grown in popularity over recent years as a measure of a hitter’s performance.  So, it would stand to reason that you could measure a pitcher’s value based on the OPS that hitters have against him.  The next most important variable is an old-time statistic, ERA.  As it turns out, the old guard in baseball are bigger stat geeks than they may care to admit.  But, again, this makes sense because if ERA didn’t matter then it wouldn’t have been such a popular stat for so long.  One stat that I was surprised by is WHIP.  This is my favorite new pitcher stat because I thought it covered most of the areas where ERA falls short.  As it turns out, WHIP ranks only 4th by R-squared among the stats that I analyzed, which is far less important than I would have anticipated.

So, now that we have all this information we should apply some of it.  First, let’s look at the closers who might be over their heads in their 2011 roles.  Here are the highest OPS values for closers with more than 10 saves:

Name                                    OPS

Jon Rauch                            0.799

Huston Street                   0.781

Kevin Gregg                       0.773

Matt Capps                         0.726

Frank Francisco                 0.721

Joakim Soria                       0.709

Of these six, Frank Francisco is the only one with a save so far this year and the only clear-cut closer.  However, Francisco changed teams and leagues and also signed a very reasonable $5.5 million contract, all of which may contribute to him continuing as a closer.  To be fair, Soria is out for the season with an elbow injury (perhaps his uncharacteristically poor 2011 could have been an indication of the arm injury), otherwise he’d be closing for the Royals.  The other 4 are either setup guys or in a situation where it’s not completely clear.  So, for the most part, the empirical evidence supports the theoretical research.

That’s it for my analysis this time around, but I intend to do some research on multi-variable analysis in the near future.

-Erik Clark

The 2012 Kentucky Wildcats – They Are Who We Thought They Were

6 Apr

Well, my regression model correctly predicted Kentucky winning it all.  So what?  Sometimes the data just confirms what we already believe.  While not as exciting as correctly predicting a 4 seed to win it all, the results nonetheless speak for themselves.  Overall, my model-based bracket placed 6th out of 27 in the first annual Tepper Sports Fanalytics Club Tourney Pick Em.  This equated to the 83rd percentile among Yahoo! brackets.  While not perfect, it beats the heck out of my usual bottom-feeder brackets.

But how should we really define “success” with an analytics-based bracket?  Picking the winner is nice, but I’m going to hold out and say success should be finishing “in the money”, which typically means 3rd place or higher in your pool.  That being said, you usually have to pick the winner to cash out, so we’re on the right track.

So it’s time to put a bow on the 2011-12 college basketball season.  Now we have baseball and the NBA playoffs to look forward to.  Stay tuned for more blogs along those lines in the near future!

-DaveCaughman
