Twins Daily

Research Project wRC+


Gavin_Sanford


As a Twins fan and a statistics major, I am always looking for connections between on-field performance and a player's statistics. Those connections have never been easier to study, because Statcast now measures exit velocities, launch angles, and many other statistics for both pitchers and hitters. I wanted to find out which of these statistics has the greatest influence on Weighted Runs Created Plus (wRC+). wRC+ is an offensive statistic that tries to credit each hit and situation with its true value. It adjusts for park effects and for a season's overall offensive level, so hitters can be compared with each other and across years. In my research, wRC+ is the dependent variable, and all of the analysis is done on it.

 

The study started with a long list of variables: average exit velocity, strikeout percentage, walk percentage, sprint speed, balls hit at 95 mph or harder, barrels per plate appearance, line drive percentage, fly ball percentage, ground ball percentage, pull percentage, center percentage, opposite field percentage, and average launch angle. Together they describe a hitter's profile in terms of direction, launch angle, velocity, and speed. Barrels are batted balls whose launch angle and exit velocity lead to a high batting average and slugging percentage on similarly hit balls. Obviously, not all of these variables will be important to the model.

The first thing I did was test for multicollinearity between variables. Pull percentage, center percentage, and opposite field percentage were multicollinear, which is intuitively easy to understand: they are shares that add up to one, so each is determined by the other two. The same is true of line drive percentage, ground ball percentage, and fly ball percentage. Focusing on the variables I considered potentially most important, I then looked at average exit velocity, barrels per plate appearance, average launch angle, and line drive percentage, because conceptually they all try to quantify the same thing. There was no multicollinearity among them. I also transformed sprint speed, pull percentage, and balls hit at 95 mph or harder into discrete variables with a value of 0, 1, or 2. The goal was to single out roughly the top 10% in each category, based on the distribution each variable followed.
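One common way to check for multicollinearity is the variance inflation factor (VIF): regress each predictor on all the others and compute 1 / (1 - R^2). The sketch below is in Python rather than SAS and runs on synthetic data, so the column names, the `vif` helper, and the numbers are illustrative assumptions, not the diagnostic actually used in this project. It shows why three shares that sum to one cannot all stay in a model, and that dropping one of them removes the problem.

```python
import numpy as np

def vif(X, names):
    """Variance inflation factor for each column of X (n rows, k columns)."""
    X = np.asarray(X, dtype=float)
    out = {}
    for j, name in enumerate(names):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(len(X)), others])  # add an intercept
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        ss_res = np.sum((target - design @ coef) ** 2)
        ss_tot = np.sum((target - target.mean()) ** 2)
        unexplained = max(ss_res / ss_tot, 0.0)  # this is 1 - R^2
        out[name] = float("inf") if unexplained < 1e-12 else 1.0 / unexplained
    return out

rng = np.random.default_rng(0)
n = 300
# Synthetic pull / center / opposite-field shares; every row sums to 1.
shares = rng.dirichlet([4.0, 3.5, 2.5], size=n)
# Synthetic average exit velocity, generated independently of the shares.
exit_velo = rng.normal(87.0, 3.0, size=n)

with_all_shares = vif(np.column_stack([shares, exit_velo]),
                      ["pull", "center", "oppo", "aev"])
without_center = vif(np.column_stack([shares[:, [0, 2]], exit_velo]),
                     ["pull", "oppo", "aev"])
print(with_all_shares)   # the three shares have enormous (or infinite) VIFs
print(without_center)    # dropping one share brings the VIFs back near 1
```

A frequent rule of thumb treats a VIF above roughly 5 to 10 as a warning sign, though the cutoff is a convention rather than a test.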

 

To select the final variables, I used PROC GLM in SAS. I ran forward and backward model selection, and I forced average exit velocity and average launch angle into the model: selection did not consider them significant, but we know from research that they matter. I used PROC GLM instead of PROC REG because GLM treats the discrete variables I created as classification variables, whereas PROC REG would treat them as continuous. I will touch on this later, but exit velocity and launch angle were strong candidates for the model until barrels per plate appearance was entered. There was no multicollinearity between these variables, so it is interesting that average exit velocity and average launch angle do not get entered once barrels per plate appearance is added. This leads to the final model:

 

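wRC+ = β₁ + β₂·aev + β₃·ala + β₄·bbp + β₅·nss + β₆·nnfp + β₇·bbpa + β₈·ldp + β₉·kp + u

Here aev is average exit velocity, ala is average launch angle, bbp is walk percentage, nss is the discrete sprint speed variable, nnfp is the discrete variable for balls hit at 95 mph or harder, bbpa is barrels per plate appearance, ldp is line drive percentage, kp is strikeout percentage, and u is the error term.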
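To make the discrete variables concrete, here is a minimal Python sketch of binning a continuous stat into levels 0, 1, and 2 and then dummy-coding it with level 2 as the baseline, which is how the nss and nnfp rows in the tables are laid out. The 10th and 90th percentile cutoffs, the helper names `discretize` and `dummy_code`, and the synthetic sprint speeds are assumptions for illustration; the exact cutoffs used in the project are not stated.

```python
import numpy as np

def discretize(values, low_q=0.10, high_q=0.90):
    """Map a continuous stat to 0 (bottom tail), 1 (middle), or 2 (top tail)."""
    values = np.asarray(values, dtype=float)
    lo, hi = np.quantile(values, [low_q, high_q])  # assumed cutoffs
    levels = np.ones(values.shape, dtype=int)
    levels[values <= lo] = 0
    levels[values >= hi] = 2
    return levels

def dummy_code(levels, baseline=2):
    """One 0/1 column per non-baseline level; the baseline gets no column."""
    levels = np.asarray(levels)
    kept = [lv for lv in (0, 1, 2) if lv != baseline]
    return np.column_stack([(levels == lv).astype(float) for lv in kept])

rng = np.random.default_rng(1)
n = 500
sprint_speed = rng.normal(27.0, 1.5, size=n)   # synthetic ft/sec values
nss = discretize(sprint_speed)

# Synthetic response: level 2 is the reference, levels 0 and 1 sit below it.
level_effect = np.array([-11.0, -7.0, 0.0])
y = 100.0 + level_effect[nss] + rng.normal(0.0, 1.0, size=n)

# Columns: intercept, indicator for nss 0, indicator for nss 1.
X = np.column_stack([np.ones(n), dummy_code(nss)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(dict(zip(["intercept", "nss 0", "nss 1"], np.round(beta, 2))))
```

With this coding, each estimate for level 0 or level 1 is a difference from the baseline group, which is why a negative nss 0 coefficient reads as "the slowest group sits below the fastest group."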

 

The two regression models below are compared on the significance of their variables and of the model overall. The first model includes barrels per plate appearance, while the second removes that variable to show how the other variables interact within the model.

* means significant at the 95% level; ** means significant at the 99% level.

Parameter | Estimate  | Standard Error | T-Value | Pr > |t|
Intercept | 66.1834   | 47.434822      | 1.4     | 0.1639
AEV       | -0.02094  | 0.55293177     | -0.04   | 0.9698
Ala       | -0.13735  | 0.21858149     | -0.63   | 0.5302
Bbp**     | 227.2026  | 27.9982043     | 8.11    | <.0001
Kp**      | -174.353  | 17.3155471     | -10.07  | <.0001
nss 0**   | -11.3559  | 3.13939906     | -3.62   | 0.0003
nss 1**   | -7.22713  | 2.58785107     | -2.79   | 0.0055
nss 2     | baseline of the equation
nnfp 0*   | -9.880559 | 4.05187092     | -2.44   | 0.0153
nnfp 1    | -0.36816  | 2.70815033     | -0.14   | 0.8919
nnfp 2    | baseline of the equation
Bbpa**    | 7.184027  | 0.64032719     | 11.22   | <.0001
Ldp**     | 158.30838 | 27.2845322     | 5.8     | <.0001

 

The above model has an F value of 57.18

 

vs.

 

Parameter   | Estimate  | Standard Error | T-Value | Pr > |t|
Intercept** | -226.969  | 46.59810       | -4.87   | <.0001
AEV**       | 3.55070   | 0.53212884     | 6.67    | <.0001
Ala*        | 0.61035   | 0.24502777     | 2.49    | 0.0132
Bbp**       | 250.163   | 32.86669476    | 7.61    | <.0001
Kp**        | -95.6383  | 18.63300806    | -5.13   | <.0001
nss 0*      | -9.3594   | 3.68923955     | -2.54   | 0.0117
nss 1       | -5.67298  | 3.04162271     | -1.87   | 0.0631
nss 2       | baseline of the equation
nnfp 0**    | -17.7382  | 4.69740546     | -3.78   | 0.0002
nnfp 1      | -5.008342 | 3.15019426     | -1.59   | 0.1128
nnfp 2      | baseline of the equation
Ldp**       | 107.2872  | 31.66561379    | 3.39    | 0.0008

 

The above model has an F value of 35.75
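As a quick sanity check on how to read these tables, each t-value is simply the estimate divided by its standard error, and the stars follow from the p-value. The short Python sketch below recomputes the t-values for the first model from the numbers in its table. The `stars` helper is an illustrative assumption, and "<.0001" is entered as 0.0001, which is only an upper bound.

```python
# Rows copied from the first model's table:
# (estimate, standard error, reported t-value, p-value).
model_one = {
    "Intercept": (66.1834, 47.434822, 1.40, 0.1639),
    "AEV": (-0.02094, 0.55293177, -0.04, 0.9698),
    "Ala": (-0.13735, 0.21858149, -0.63, 0.5302),
    "Bbp": (227.2026, 27.9982043, 8.11, 0.0001),
    "Kp": (-174.353, 17.3155471, -10.07, 0.0001),
    "nss 0": (-11.3559, 3.13939906, -3.62, 0.0003),
    "nss 1": (-7.22713, 2.58785107, -2.79, 0.0055),
    "nnfp 0": (-9.880559, 4.05187092, -2.44, 0.0153),
    "nnfp 1": (-0.36816, 2.70815033, -0.14, 0.8919),
    "Bbpa": (7.184027, 0.64032719, 11.22, 0.0001),
    "Ldp": (158.30838, 27.2845322, 5.80, 0.0001),
}

def stars(p):
    """Significance marker using the 99% and 95% levels from the tables."""
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""

# t-value = estimate / standard error
t_values = {name: est / se for name, (est, se, _, _) in model_one.items()}

for name, (est, se, reported_t, p) in model_one.items():
    print(f"{name:<10} t = {t_values[name]:7.2f} "
          f"(reported {reported_t:6.2f}) {stars(p)}")
```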

From the models above, we see that including barrels per plate appearance gives the better model, based on the F values reported under each one (57.18 versus 35.75). In both models, walk percentage, strikeout percentage, and line drive percentage are statistically significant at 99%. In the first model, both sprint speed levels are statistically significant at 99%, meaning the faster you are, the higher your wRC+ should be. In the second model, only nss 0, the discrete variable for the slowest players in the category, is significant (at 95%), and its negative coefficient shows the effect it has on the model. Balls hit at 95 mph or harder is significant only for the players who have the fewest, around the bottom 10%. This holds in both models: at 95% in the first and at 99% in the second.

The first model also has barrels per plate appearance, which is statistically significant at 99%. Barrels contribute an immense amount to the model, and with them included, average launch angle and average exit velocity are insignificant. In model two, without barrels per plate appearance, average launch angle and average exit velocity become statistically significant: exit velocity at 99% and launch angle at 95%.

Strikeout percentage negatively affects wRC+. Balls hit at 95 mph or harder, sprint speed, walk percentage, and line drive percentage have a positive effect on wRC+. Barrels per plate appearance is the most important variable in determining wRC+, but without it, increases in average exit velocity and average launch angle are valuable. The fly ball revolution has led more players to find success in the air and to alter their swing paths to produce more fly balls. As the negative coefficient on average launch angle in model one and the positive coefficient in model two show, a general increase in launch angle helps, but the real value comes from hitting the ball hard and "barreling" it. Without that, a higher launch angle just leads to more fly outs and fewer grounders that might make it through the holes.
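One possible reading of why exit velocity looks important only when barrels are left out is that barrels already carry the useful part of the exit velocity signal. The Python sketch below illustrates that mechanism on synthetic data. The relationship between the variables, the coefficients, and the `ols` helper are all assumptions made up for the illustration, so it shows how the pattern can arise, not that it is what happened in the real data.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients with an intercept prepended."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

rng = np.random.default_rng(2)
n = 2000
# Standardized synthetic exit velocity.
exit_velo = rng.normal(0.0, 1.0, size=n)
# Synthetic barrel rate that overlaps heavily with exit velocity.
barrels = 0.8 * exit_velo + rng.normal(0.0, 0.6, size=n)
# In this toy world only barrels drive the outcome.
wrc = 100.0 + 3.0 * barrels + rng.normal(0.0, 1.0, size=n)

without_barrels = ols(exit_velo.reshape(-1, 1), wrc)
with_barrels = ols(np.column_stack([exit_velo, barrels]), wrc)

print("exit velocity coefficient without barrels:", round(without_barrels[1], 2))
print("exit velocity coefficient with barrels:   ", round(with_barrels[1], 2))
print("barrels coefficient:                      ", round(with_barrels[2], 2))
```

In this toy setup exit velocity picks up a large coefficient when it is the only contact-quality variable, and drops to roughly zero once barrels enter, which mirrors the change between the two tables.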

 

Pairing this analysis with predictions, some players had very large differences between their actual wRC+ and the value predicted by the model that includes barrels. The residual is actual wRC+ minus predicted wRC+, so some players got "lucky" and had better results than expected, while others did worse than expected. The players below underperformed their predicted values and should have bounce-back years in 2018 if they can replicate what they did in 2017.

 

Name           | wRC+ | Residual | Predicted Value | Studentized Residual
DJ LeMahieu    | 94   | -58      | 152             | -4.04
Miguel Cabrera | 91   | -47      | 138             | -3.2
Mitch Moreland | 98   | -35      | 133             | -2.33
Dansby Swanson | 66   | -34      | 100             | -2.33
Brandon Moss   | 84   | -34      | 118             | -2.28
Alex Gordon    | 62   | -31      | 93              | -2.1
Austin Romine  | 49   | -30      | 79              | -2.05
Maikel Franco  | 76   | -30      | 106             | -2.04
Chris Herrmann | 58   | -29      | 87              | -2.01
Taylor Motter  | 57   | -30      | 87              | -1.99
Hyun Soo Kim   | 61   | -28      | 89              | -1.94
Pablo Sandoval | 64   | -29      | 93              | -1.94
J.J. Hardy     | 50   | -28      | 78              | -1.9
Randal Grichuk | 94   | -27      | 121             | -1.85
Tony Wolters   | 49   | -27      | 76              | -1.81
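For readers unfamiliar with the last column: a residual is actual minus predicted (for example, 94 - 152 = -58), and a studentized residual rescales it by the model's residual standard error and the observation's leverage, so values beyond roughly 2 in either direction stand out. Below is a minimal Python sketch of the internally studentized version, computed on synthetic data with one planted underperformer. The function name and the data are assumptions for illustration, and SAS may have been asked for a slightly different variant.

```python
import numpy as np

def studentized_residuals(X, y):
    """Return raw residuals and internally studentized residuals for OLS."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta                    # actual minus predicted
    q, _ = np.linalg.qr(X)                  # reduced QR of the design matrix
    leverage = np.sum(q ** 2, axis=1)       # diagonal of the hat matrix
    s2 = resid @ resid / (n - p)            # residual variance estimate
    return resid, resid / np.sqrt(s2 * (1.0 - leverage))

rng = np.random.default_rng(3)
n = 100
x = rng.normal(0.0, 1.0, size=n)            # one synthetic predictor
y = 100.0 + 10.0 * x + rng.normal(0.0, 10.0, size=n)
y[0] = (100.0 + 10.0 * x[0]) - 60.0         # plant one hitter far below the line

X = np.column_stack([np.ones(n), x])        # intercept plus predictor
resid, student = studentized_residuals(X, y)
flagged = np.flatnonzero(np.abs(student) > 2.0)

print("largest |studentized residual| is at row", int(np.argmax(np.abs(student))))
print("rows beyond +/-2:", flagged.tolist())
```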

 

There are a handful of interesting names on this list. The increase predicted for DJ LeMahieu's wRC+ is extreme, but his hot start this year already points to a bounce back. Maybe it continues. Miguel Cabrera is coming off his worst season since entering the league, and the numbers say he should have been better. Others on this list, like Randal Grichuk, are intriguing: he was traded and hadn't lived up to his potential. Can he catch up to his numbers? The numbers also show J.J. Hardy as a bounce-back candidate, but his predicted value would still be 22% below average. For those wondering whether he was an option to play shortstop in Polanco's place during the suspension, this could explain why not.

Now for a list of players who could slow down. (Caution: there are good players on this list, and no, I don't think they will regress, not this much anyway.)

 

Name            | wRC+ | Residual | Predicted Value | Studentized Residual
Jose Altuve     | 160  | 38       | 122             | 2.56
Mike Trout      | 181  | 35       | 146             | 2.42
Mitch Haniger   | 129  | 36       | 93              | 2.41
Marwin Gonzalez | 144  | 35       | 109             | 2.38
Zack Cozart     | 141  | 32       | 109             | 2.12
Austin Jackson  | 131  | 31       | 100             | 2.08
Jose Ramirez    | 148  | 31       | 117             | 2.07
Eduardo Nunez   | 112  | 29       | 83              | 1.96
Marcell Ozuna   | 142  | 27       | 115             | 1.85
Scooter Gennett | 124  | 27       | 97              | 1.84
George Springer | 150  | 27       | 123             | 1.8

 

Like I said, good players. Mike Trout and Jose Altuve top the list, but some players will always outplay projections. The interesting names here are those who had predicted values around average and were far better. These are difference makers like Marwin Gonzalez, Austin Jackson, and George Springer, who help push a team over the top and deep into the playoffs. They also help a team on the fringe get in.

 

Now what does this mean for the Twins?

The good news is that no Twins were on either list. The offense that was so explosive last year performed close to expectations: no one drastically overperformed, and no one underperformed enough to hurt the offense. The list of Twins players below shows what the model predicts could happen if their output stays close to last year's.

 

Name            | wRC+ | Residual | Predicted Value
Joe Mauer       | 116  | -4.9345  | 121
Kennys Vargas   | 98   | 14.2709  | 84
Logan Morrison  | 130  | -0.2185  | 130
Miguel Sano     | 124  | 11.5868  | 112
Brian Dozier    | 124  | 7.8157   | 116
Jorge Polanco   | 89   | -9.8506  | 98
Eduardo Escobar | 96   | -16.9470 | 113
Max Kepler      | 92   | 3.9207   | 88
Robbie Grossman | 102  | -9.9625  | 112
Byron Buxton    | 90   | 0.6321   | 89.388
Eddie Rosario   | 116  | 6.8566   | 109

 

Some players land close to their predicted number, and some are off by around 10% in either direction. No drastic changes should be expected from this roster of young players. Plate discipline could improve with age, which could raise these numbers and is worth watching. Logan Morrison got off to a bad start, but if he replicates last year's output, the Twins will have gotten a steal. Joe Mauer is another player to watch: though he has aged and his huge contract is expiring, he still provides a lot of value by getting on base, and that is hard to find at the rate Joe does it. This could be an argument to pay him and keep him around.


5 Comments


Recommended Comments

This is an interesting study. I think you should look into adding "Solid Contact" and "Flares & Burners" to your study along with Barrels. If you are unfamiliar with these, they are other contact quality measurements defined by Statcast, like Barrels. I think they would be good to include because they are also contact types that frequently produce base hits (though ones that don't necessarily go for as much power).

 

I think you could use these and then get rid of other contact-based stats like average exit velocity, average launch angle, and 95+ MPH to help avoid potential issues with multicollinearity.

 

I would also recommend that when publishing research like this it is often best to make the information as reader friendly as possible. I know it can be difficult with the layout of the site, so I would recommend inserting your data into a clean chart outside of the site, and then uploading a screenshot of your chart. It will look a lot cleaner, and will be easier for your readers to follow along.

 

But all in all this is great work. If you decide to take your research into the topic any further, I would love to see your results.

Link to comment

I am glad you are able to pursue this, too much math, too many variables for me.  I am beyond see it - hit it, but not this far.  Keep going and I might understand it some day.

Link to comment

 


Thanks for the feedback. Another thing I was going to look into was batting average on balls in play. I was hoping this could account for some of the randomness in the model and the variance that was unaccounted for. And do they have these numbers for 2017? I would totally be willing to add those and look at how sprint speed and flares and burners work together. Uploading the charts was honestly a mess for me, so I went this route. Thanks for the feedback!

Link to comment

 


It never hurts to look into different statistics to see if they can improve the quality of your model, but personally I don't think adding BABIP will help, for a couple of reasons. First, it is not the luck-based stat that many people make it out to be. Certain players, and even certain ballparks, produce a higher BABIP based on their skill and dimensions respectively. So to try to set a league-wide baseline and assume that players will naturally regress to it just isn't the way it works. Second, and perhaps more importantly, it is a batting-average-based stat, and I'm sure you understand the limitations of batting average compared with more complete stats like wRC+.

 

Solid contact and flares & burners are available at Baseball Savant as well. The problem is that I don't think there is an easy table to pull with the data like there is with Barrels. The only place I've seen them is on individual players' radial charts on the site. To find them you can go to Statcast Search > Filter to: Player Type (Batter), Season (2017), Min ABs (None). This should give you every player who took an at-bat last season. Then you can click on the Graphs button for each batter and select Radial Chart. That is where they have the data. At this point it is up to you to decide if you want to go through all the work to pull that data by hand, or if you are good at computer programming you could maybe figure out an easier way to pull it.

Link to comment

 


I didn't think of BABIP that way, but you bring up a good point. I checked a lot of the players who are high in the residuals and it was a mix of each. Generally the players were around their career numbers. I consider myself pretty handy with computer programming, but pulling from all the individual pages is what I had to do for the other stats, so I am sure it would be the same. I will definitely revisit the suggestions you made at some point. Thanks for the input, as it has been very valuable in looking at other points and changes I can make.

Link to comment