Research Project wRC+
As a Twins fan and a statistics major, I am always looking for connections between a player's on-field performance and his statistics. Those connections have never been easier to study, since Statcast now measures exit velocities, launch angles, and many other statistics for both pitchers and hitters. I wanted to look into which of these statistics has the greatest influence on Weighted Runs Created Plus (wRC+). This is an offensive statistic that tries to credit each hit and situation with its true value. It measures hitters against each other and can be compared across years, because it adjusts for park effects and for each season's overall offensive level. In my research, wRC+ is the dependent variable that the analysis is run on.
The list of variables originally included in the study is long: average exit velocity, strikeout percentage, walk percentage, sprint speed, balls hit ninety-five plus miles per hour, barrels per plate appearance, line drive percentage, fly ball percentage, ground ball percentage, pull percentage, center percentage, opposite field percentage, and average launch angle. All of these were included because together they describe a hitter's profile in terms of direction, launch angle, velocity, and speed. Barrels are balls hit with a combination of launch angle and exit velocity that leads to a high batting average and slugging percentage on similarly hit balls. Obviously, not all of these are going to be important to the model. The first thing I did was test for multicollinearity between variables. Pull percentage, opposite field percentage, and center percentage were multicollinear, which is intuitively pretty easy to understand: these percentages add up to one, so each one is determined by the other two. The same is true of line drive percentage, ground ball percentage, and fly ball percentage. Focusing on the variables I deemed possibly most important, I then looked at multicollinearity among average exit velocity, barrels per plate appearance, average launch angle, and line drive percentage, because these all conceptually try to quantify the same thing. There was no multicollinearity. As a further variable manipulation, I took sprint speed, pull percentage, and balls hit ninety-five plus miles per hour and made them discrete variables with a value of 0, 1, or 2. This was to separate out roughly the top 10% in each category, based on the distribution each variable followed.
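The two preparation steps described above can be sketched in a few lines of Python. This is only an illustration, not the SAS code used for the study: the data are made up, the function names are hypothetical, and the 10%/90% cutoffs are an assumption based on the "around the top 10%" description (with level 0 assumed to be the bottom group).

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pairwise_vif(x, y, tol=1e-9):
    """Variance inflation factor when y is the only other predictor: 1 / (1 - r^2)."""
    gap = 1.0 - pearson_r(x, y) ** 2
    return math.inf if gap < tol else 1.0 / gap

def discretize(values, low=0.10, high=0.90):
    """Map each value to 0 (bottom group), 1 (middle), or 2 (top group).

    The cutoffs are assumptions, not the author's exact thresholds.
    """
    ordered = sorted(values)
    last = len(ordered) - 1
    lo_cut = ordered[round(low * last)]
    hi_cut = ordered[round(high * last)]
    return [0 if v <= lo_cut else 2 if v >= hi_cut else 1 for v in values]

# Hypothetical pull shares. Center plus opposite field is whatever is left of 1,
# so the remainder is perfectly (negatively) correlated with pull.
pull = [0.40, 0.35, 0.45, 0.38, 0.42]
rest = [1.0 - p for p in pull]
print(pairwise_vif(pull, rest))  # inf: perfect collinearity

# Hypothetical sprint speeds (ft/s) for eleven players.
speeds = [25.1, 26.0, 26.4, 26.8, 27.0, 27.2, 27.5, 27.9, 28.3, 28.8, 29.6]
levels = discretize(speeds)
print(levels)
```

A large variance inflation factor is the usual warning sign; shares that sum to one are the extreme case where it is infinite.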
To select the final variables for the model, I used proc glm in the programming language SAS. I ran forward and backward model selection and forced average exit velocity and average launch angle into the model, because selection did not consider them significant even though research tells us they matter. I used proc glm instead of proc reg because it treats the created discrete variables as discrete, whereas proc reg would treat them as continuous. I will touch on this later, but essentially exit velocity and launch angle were extremely strong candidates for the model until barrels per plate appearance was entered. There was no multicollinearity between them, but it is interesting that average exit velocity and average launch angle do not get entered once barrels per plate appearance is added. This leads to the final model of:
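wRC+ = β1 + β2·aev + β3·ala + β4·bbp + β5·nss + β6·nnfp + β7·bbpa + β8·ldp + β9·kp + u
Here aev is average exit velocity, ala is average launch angle, bbp is walk percentage, nss is the discrete sprint speed variable, nnfp is the discrete variable for balls hit ninety-five plus miles per hour, bbpa is barrels per plate appearance (Bppa in the tables), ldp is line drive percentage, kp is strikeout percentage, and u is the error term.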
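To make the role of the discrete variables concrete, here is a minimal Python sketch of how a three-level variable becomes two indicator columns, with level 2 as the baseline (the tables report level 2 as the baseline for both nss and nnfp). The function names and the sample player are hypothetical, and the column order is only one possible layout.

```python
def dummy_code(level, levels=(0, 1, 2), baseline=2):
    """Return one 0/1 indicator per non-baseline level."""
    if level not in levels:
        raise ValueError(f"unknown level: {level}")
    return [1.0 if level == lv else 0.0 for lv in levels if lv != baseline]

def design_row(aev, ala, bbp, kp, nss, nnfp, bbpa, ldp):
    """One row of the design matrix.

    Order: intercept, aev, ala, bbp, kp, nss 0, nss 1, nnfp 0, nnfp 1, bbpa, ldp.
    """
    return [1.0, aev, ala, bbp, kp] + dummy_code(nss) + dummy_code(nnfp) + [bbpa, ldp]

# Hypothetical player: middle sprint speed group, bottom group for 95+ mph balls.
row = design_row(aev=88.0, ala=12.0, bbp=0.09, kp=0.20,
                 nss=1, nnfp=0, bbpa=5.0, ldp=0.21)
print(row)
```

Because level 2 gets no column of its own, each nss and nnfp coefficient reads as a difference from the top group.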
The two regression models below are compared on the significance of their variables and of the overall model. The first includes barrels per plate appearance, while the second removes that variable to show how the other variables behave without it.
* means significant at the 95% level; ** means significant at the 99% level
Parameter | Estimate | Standard Error | t-value | Pr > |t|
Intercept | 66.1834 | 47.434822 | 1.4 | 0.1639
AEV | -0.02094 | 0.55293177 | -0.04 | 0.9698
Ala | -0.13735 | 0.21858149 | -0.63 | 0.5302
Bbp** | 227.2026 | 27.9982043 | 8.11 | <.0001
Kp** | -174.353 | 17.3155471 | -10.07 | <.0001
nss 0** | -11.3559 | 3.13939906 | -3.62 | 0.0003
nss 1** | -7.22713 | 2.58785107 | -2.79 | 0.0055
nss 2 is the baseline of the equation
nnfp 0* | -9.880559 | 4.05187092 | -2.44 | 0.0153
nnfp 1 | -0.36816 | 2.70815033 | -0.14 | 0.8919
nnfp 2 is the baseline of the equation
Bppa** | 7.184027 | 0.64032719 | 11.22 | <.0001
Ldp** | 158.30838 | 27.2845322 | 5.8 | <.0001
The above model has an F-value of 57.18.
vs.
Parameter | Estimate | Standard Error | t-value | Pr > |t|
Intercept** | -226.969 | 46.59810 | -4.87 | <.0001
AEV** | 3.55070 | 0.53212884 | 6.67 | <.0001
Ala* | 0.61035 | 0.24502777 | 2.49 | 0.0132
Bbp** | 250.163 | 32.86669476 | 7.61 | <.0001
Kp** | -95.6383 | 18.63300806 | -5.13 | <.0001
nss 0* | -9.3594 | 3.68923955 | -2.54 | 0.0117
nss 1 | -5.67298 | 3.04162271 | -1.87 | 0.0631
nss 2 is the baseline of the equation
nnfp 0** | -17.7382 | 4.69740546 | -3.78 | 0.0002
nnfp 1 | -5.008342 | 3.15019426 | -1.59 | 0.1128
nnfp 2 is the baseline of the equation
Ldp** | 107.2872 | 31.66561379 | 3.39 | 0.0008
The above model has an F-value of 35.75.
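As a worked example, the coefficients from the first model (the one that includes barrels per plate appearance) can be plugged in by hand to get a predicted wRC+. The sketch below does that in Python for a made-up player. The units are an assumption inferred from the size of the coefficients: walk, strikeout, and line drive rates entered as proportions, barrels per plate appearance as a percentage, exit velocity in mph, and launch angle in degrees.

```python
# Coefficients copied from the first table (model with barrels per PA).
MODEL_ONE = {
    "intercept": 66.1834,
    "aev": -0.02094,
    "ala": -0.13735,
    "bbp": 227.2026,
    "kp": -174.353,
    "bppa": 7.184027,
    "ldp": 158.30838,
}
# Level 2 is the baseline, so its effect is zero.
NSS_EFFECT = {0: -11.3559, 1: -7.22713, 2: 0.0}
NNFP_EFFECT = {0: -9.880559, 1: -0.36816, 2: 0.0}

def predict_wrc_plus(aev, ala, bbp, kp, nss, nnfp, bppa, ldp):
    """Predicted wRC+ from model one (units are assumed, see the note above)."""
    return (MODEL_ONE["intercept"]
            + MODEL_ONE["aev"] * aev
            + MODEL_ONE["ala"] * ala
            + MODEL_ONE["bbp"] * bbp
            + MODEL_ONE["kp"] * kp
            + NSS_EFFECT[nss]
            + NNFP_EFFECT[nnfp]
            + MODEL_ONE["bppa"] * bppa
            + MODEL_ONE["ldp"] * ldp)

# Hypothetical middle-of-the-pack hitter (not a real player).
player = dict(aev=87.0, ala=12.0, bbp=0.085, kp=0.21,
              nss=1, nnfp=1, bppa=4.5, ldp=0.20)
base = predict_wrc_plus(**player)
print(round(base, 1))  # about 101.8

# Same hitter with one more percentage point of barrels per PA.
more_barrels = predict_wrc_plus(**{**player, "bppa": 5.5})
print(round(more_barrels - base, 2))  # the Bppa coefficient, about 7.18
```

Under these assumed units, one extra percentage point of barrels is worth about seven points of wRC+, which is why that variable dominates the model.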
From the above models we see that including barrels per plate appearance gives the better model, based on the F-value reported under each table (57.18 versus 35.75). In both models, walk percentage, strikeout percentage, and line drive percentage are statistically significant at 99%. In the first model, both sprint speed levels are statistically significant at 99%, and their negative coefficients relative to the fastest group mean that the faster you are, the higher your wRC+ should be. In the second model, only nss 0, the discrete level for the slowest players, is significant, and its negative coefficient shows the effect it has on the model. Balls hit ninety-five plus miles per hour is significant only for the players who have the fewest, roughly the bottom 10%, in both models: at 95% in the first and at 99% in the second. The first model also has barrels per plate appearance, which is statistically significant at 99%. The value barrels contribute to the model is immense, and with them included, average launch angle and average exit velocity are insignificant. In model two, without barrels per plate appearance, average exit velocity becomes statistically significant at 99% and average launch angle at 95%.
Strikeout percentage negatively affects wRC+, while balls hit ninety-five plus miles per hour, sprint speed, walk percentage, and line drive percentage have a positive effect. Barrels per plate appearance is the most important variable in determining wRC+, but without it, increases in average exit velocity and average launch angle are valuable. The fly ball revolution has led more players to find success in the air and to alter their swing paths to hit more fly balls. As we see from the negative coefficient on average launch angle in model one and the positive one in model two, a general increase in launch angle is good, but the real value comes from hitting the ball hard and "barreling" it. Otherwise, a higher launch angle just leads to more fly outs and fewer grounders that might make it through the holes.
Pairing this analysis with predictions, there are players whose actual wRC+ differed greatly from the value predicted by the model that includes barrels. This means some players got "lucky" and had better results than expected, while others had worse results than expected. The players below had worse results than expected and should have bounce-back years in 2018 if they can replicate what they did in 2017.
Name | wRC+ | Residual | Predicted Value | Studentized Residual
DJ LeMahieu | 94 | -58 | 152 | -4.04
Miguel Cabrera | 91 | -47 | 138 | -3.2
Mitch Moreland | 98 | -35 | 133 | -2.33
Dansby Swanson | 66 | -34 | 100 | -2.33
Brandon Moss | 84 | -34 | 118 | -2.28
Alex Gordon | 62 | -31 | 93 | -2.1
Austin Romine | 49 | -30 | 79 | -2.05
Maikel Franco | 76 | -30 | 106 | -2.04
Chris Herrmann | 58 | -29 | 87 | -2.01
Taylor Motter | 57 | -30 | 87 | -1.99
Hyun Soo Kim | 61 | -28 | 89 | -1.94
Pablo Sandoval | 64 | -29 | 93 | -1.94
J.J. Hardy | 50 | -28 | 78 | -1.9
Randal Grichuk | 94 | -27 | 121 | -1.85
Tony Wolters | 49 | -27 | 76 | -1.81
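The columns in the table above are tied together by simple arithmetic: the residual is the actual wRC+ minus the predicted value. A short Python sketch using a few rows from the table shows this. The cutoff of 2 for flagging a studentized residual is a common rule of thumb and an assumption here, not something taken from the study.

```python
# (name, actual wRC+, predicted wRC+, studentized residual) from the table above.
rows = [
    ("DJ LeMahieu", 94, 152, -4.04),
    ("Miguel Cabrera", 91, 138, -3.2),
    ("Mitch Moreland", 98, 133, -2.33),
    ("Taylor Motter", 57, 87, -1.99),
]

def residual(actual, predicted):
    """Residual = actual minus predicted; negative means the player fell short."""
    return actual - predicted

# Flag players whose studentized residual is beyond the assumed cutoff of 2.
CUTOFF = 2.0
flagged = [name for name, _, _, stud in rows if abs(stud) > CUTOFF]

for name, actual, predicted, stud in rows:
    print(name, residual(actual, predicted), stud)
print(flagged)

# Dividing a residual by its studentized value gives a rough idea of how many
# wRC+ points one "standard unit" of residual is worth for that player.
scale = residual(94, 152) / -4.04
print(round(scale, 1))
```

By that rough measure, a miss of fourteen or fifteen points of wRC+ is about one standard unit, so LeMahieu's 58-point gap is far outside the normal range.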
There are a handful of interesting names on this list. DJ LeMahieu's predicted jump in wRC+ is extreme, but his hot start so far this year already suggests a bounce back. Maybe it continues. Miguel Cabrera is coming off his worst season since entering the league, and the numbers say he should have been better. Others on this list, like Randal Grichuk, are intriguing, as he was traded and hadn't lived up to his potential. Can he catch up to his numbers? An interesting look at the numbers here shows J.J. Hardy as a candidate for a bounce-back year, but that would still leave him 22% below average. For those wondering whether he was an option to play short for Polanco during the suspension, this could be why.
Now for a list of players who could slow down. (Caution: there are good players on this list, and no, I don't think they will regress, not this much anyway.)
Name | wRC+ | Residual | Predicted Value | Studentized Residual
Jose Altuve | 160 | 38 | 122 | 2.56
Mike Trout | 181 | 35 | 146 | 2.42
Mitch Haniger | 129 | 36 | 93 | 2.41
Marwin Gonzalez | 144 | 35 | 109 | 2.38
Zack Cozart | 141 | 32 | 109 | 2.12
Austin Jackson | 131 | 31 | 100 | 2.08
Jose Ramirez | 148 | 31 | 117 | 2.07
Eduardo Nunez | 112 | 29 | 83 | 1.96
Marcell Ozuna | 142 | 27 | 115 | 1.85
Scooter Gennett | 124 | 27 | 97 | 1.84
George Springer | 150 | 27 | 123 | 1.8
Like I said, good players. Mike Trout and Jose Altuve top the list, but some players will always outplay projections. The interesting names here are those who had predicted values around average and were far better. These are difference makers like Marwin Gonzalez, Austin Jackson, and George Springer who help push a team over the top and far into the playoffs. They also help a team on the fringe get in.
Now what does this mean for the Twins?
The good news is that no Twins were on either list, meaning the offense that was so explosive last year performed close to expectations: no one over-performed to an extreme, and no one under-performed enough to hurt the offense. The list of Twins players below shows what the model predicts could happen if each player's output stays close to last year's.
Name | wRC+ | Residual | Predicted Value
Joe Mauer | 116 | -4.9345 | 121
Kennys Vargas | 98 | 14.2709 | 84
Logan Morrison | 130 | -0.2185 | 130
Miguel Sano | 124 | 11.5868 | 112
Brian Dozier | 124 | 7.8157 | 116
Jorge Polanco | 89 | -9.8506 | 98
Eduardo Escobar | 96 | -16.9470 | 113
Max Kepler | 92 | 3.9207 | 88
Robbie Grossman | 102 | -9.9625 | 112
Byron Buxton | 90 | 0.6321 | 89.388
Eddie Rosario | 116 | 6.8566 | 109
We see some players fluctuate around their number, and some go up or down by around 10%. No drastic changes should be expected from this roster of young players. We could see plate discipline improve with age, which could lead to an increase in wRC+ and is worth watching. Though Logan Morrison got off to a bad start, if he replicates his output from last year the Twins will have gotten a steal. Another player to look at is Joe Mauer: though he has aged and his huge contract is expiring, he still provides a lot of value by getting on base, and that is hard to find at the rate Joe does it. This could be an argument to pay him and keep him around.