Twins Daily

Article: Mailbag: Available Pitchers, Buxton Hype, Baseball Time Machine



 

Injuries to Dave Boswell (and Luis Tiant) sped the demise of the 1969-70 Twins powerhouse.

 

Boswell won 20 games at age 24 in 1969 (after being punched out by none other than our manager, Billy Martin). 2 years and only 4 wins later he was out of baseball after an arm injury pitching to Frank Robinson in the playoffs that year.

 

Tiant was acquired from Cleveland, only to suffer arm issues related to surgery and recovery, and was released. Who else but Boston enjoyed his recovery?

 

Yeah Boswell is a forgotten guy sometimes. 64 wins by the age of 24 and 37 complete games! He was on his way to a great career.

 

Maybe not with the Twins, though; they were approaching their "trade everyone" era. And while no one liked Billy Martin, so getting knocked out by him might not be a great measure of character, just before his own KO he had knocked out Bobby Allison, so he may not have been a beloved teammate.

Link to comment
Share on other sites

Small sample size to small sample size to small sample size to small sample size.

It's like George Carlin's observation about breaking a crumb in half. You don't have two half-crumbs, they're just two crumbs now, in seeming violation of the laws of physics.

 

Similarly, small sample sizes put together remain small sample sizes, no matter how long you keep doing it. :)

Link to comment
Share on other sites

Provisional Member

Significant injuries.

I was a young kid in Maine (Red Sox territory) in 1967. I painfully recall the Red Sox beating the Twins in the final game of the regular season to win the pennant. Two minor injuries that year cost the Twins a chance to go to their second World Series. Gary Bell, a Boston pitcher, plunked Killebrew on the left upper arm on August 4th. The arm swelled up terribly. He didn't hit a homer for about two weeks and the Twins slumped. The Twins probably lost a couple of games as a result. That same year Jim Kaat was hot and started in that final two-game series:

"A disappointing 9-13 through August that year, Kaat in September produced what he said was the best pitching of his career: Going 7-0 in seven starts, averaging nine innings per outing, heading into his start at Fenway Park on Sept. 30.

He remembers talking to Koufax, then a TV analyst, in the trainer’s room before that game. “I was as confident at the time as any pitcher could be,” he said.

Kaat lasted only 2-1/3 innings. But it wasn’t the Sox who knocked him out of the game.

Pitching with a 1-0 lead, Kaat in the third inning injured a ligament in his throwing elbow, an injury which these days is corrected with Tommy John surgery. To that point, he had surrendered three harmless hits, walked one, and struck out four.

Many years later, after their baseball playing days, Kaat and Boston’s Carl Yastrzemski both lived in Boca Raton, Fla. Kaat was out for a bike ride one day and came by Yaz who told him if the lefty hadn’t gotten hurt that day, the Sox weren’t going to win the game.

“We wouldn’t have beaten him,” echoed Swansea’s Russ Gibson, the Sox’s starting catcher that day and one of Kaat’s four strikeout victims. “You couldn’t believe the stuff he had. He was pin-point. He was on the black. His breaking ball was working.”

 

Link to comment
Share on other sites

Sure looked like Liriano was heading for the Cy Young that year. I agree that having Liriano would not have made the playoffs any sure thing, because the Twins didn't score more than 3 runs in any of those games. On the other hand, if they had Liriano for more games, maybe they would have been a little more rested, and if they had won just one of those first three games because of Liriano, then you probably have Santana and Liriano for the next two games. Just because they had Liriano doesn't mean they would have won. Just because they got swept doesn't mean they would have lost the series with Liriano. It's like the series over the years with the Yankees. If they had won just one of those heartbreakers, maybe it changes everything else.

Link to comment
Share on other sites

Community Moderator

 

It's like George Carlin's observation about breaking a crumb in half. You don't have two half-crumbs, they're just two crumbs now, in seeming violation of the laws of physics.

 

Similarly, small sample sizes put together remain small sample sizes, no matter how long you keep doing it. :)

Tell that to the defensive metrics fans. 

Link to comment
Share on other sites

I'd like to talk a little bit more about Morneau. His 2010 concussion halted an absolute monster year in which he would have been a really strong candidate for MVP and might have made a difference in the season and the playoffs. However, if I could go back in time, I'd maybe go back to 2008 and at a minimum keep him out of the Home Run Derby and maybe the All-Star Game itself. Then I'd rest him as much as I can the rest of the way. He had a subpar August and a very poor September. Later we found out he had been playing with a stress fracture in his back. A healthy Morneau for those last two months very likely makes a game 163 with the White Sox unnecessary, and we would have played the Rays, who we matched up OK with. Maybe that was the year things would have fallen our way. Who knows?

 

On an individual level we talk about how injuries have derailed or dimmed Oliva's and Mauer's chances for the HOF. How about Morneau? Without playing 163 games and the stress fracture, he would have been a shoo-in for MVP that year if he had a strong September and the Twins made the playoffs. Couple that with a possible MVP in 2010 if he did not have the concussion, and now you have a guy who wins 3 MVPs. Just on the surface that should get him into the Hall. Adding in his performance after 2010, had he never suffered that concussion, just makes an already easy pick that much easier. No one talks about him and the Hall, but if things break a different way he definitely had the ability.

Link to comment
Share on other sites

Community Moderator

 

I'd like to talk a little bit more about Morneau. His 2010 concussion halted an absolute monster year in which he would have been a really strong candidate for MVP and might have made a difference in the season and the playoffs. However, if I could go back in time, I'd maybe go back to 2008 and at a minimum keep him out of the Home Run Derby and maybe the All-Star Game itself. Then I'd rest him as much as I can the rest of the way. He had a subpar August and a very poor September. Later we found out he had been playing with a stress fracture in his back. A healthy Morneau for those last two months very likely makes a game 163 with the White Sox unnecessary, and we would have played the Rays, who we matched up OK with. Maybe that was the year things would have fallen our way. Who knows?

 

On an individual level we talk about how injuries have derailed or dimmed Oliva's and Mauer's chances for the HOF. How about Morneau? Without playing 163 games and the stress fracture, he would have been a shoo-in for MVP that year if he had a strong September and the Twins made the playoffs. Couple that with a possible MVP in 2010 if he did not have the concussion, and now you have a guy who wins 3 MVPs. Just on the surface that should get him into the Hall. Adding in his performance after 2010, had he never suffered that concussion, just makes an already easy pick that much easier. No one talks about him and the Hall, but if things break a different way he definitely had the ability.

Concur 100 percent.

 

Morneau was the better player between him and Mauer.

Link to comment
Share on other sites

Agreed, as long as they don't stubbornly hold on to that player and gift them a roster spot.

That’s exactly the result I’m hoping to avoid. No more Logan Morrison walking past the lineup card without checking to see if his name is on it.

Link to comment
Share on other sites

Community Moderator

Well, to properly use them, you need three years. Not sure your point at all.

unreliable small sample size, added to unreliable small sample size, is suddenly considered reliable, just because you now have more of the unreliable data.

 

Doesn't compute, IMO.

Link to comment
Share on other sites

 

unreliable small sample size, added to unreliable small sample size, is suddenly considered reliable, just because you now have more of the unreliable data.

Doesn't compute, IMO.

Isn't this a pretty foundational concept of statistics? That the larger the sample, the more reliable it can be?

 

1000 plate appearances doesn't have the exact same reliability as each 10 PA chunk. Collectively, it basically sums the meager reliability of each of those small samples, and after awhile, it has meaningfully more reliability. You're never going to reach perfect 100% reliability in this field, and where you draw the line at "reliable enough" can be a little fuzzy/subjective/context-dependent, but the basic idea that a larger sample (3 seasons of defensive data) is more reliable than a smaller sample (1 season) is certainly true.
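For anyone who wants the sample-size point in numbers, here is a rough sketch using the textbook standard error of a proportion. The .330 on-base rate is just an assumed figure for illustration, and treating every PA as an independent coin flip is a simplification, not how any real metric is built.

```python
import math

def standard_error(p: float, n: int) -> float:
    """Standard error of a rate p observed over n independent trials."""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical true on-base rate, chosen only for illustration.
true_obp = 0.330

for n in (10, 100, 1000):
    se = standard_error(true_obp, n)
    print(f"{n:>5} PA: OBP {true_obp:.3f} +/- {se:.3f}")
```

The uncertainty shrinks with the square root of the sample, so 100 times the PAs buys roughly a tenfold tighter estimate. It never reaches zero, which is the "never 100% reliable" part.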

 

Edit: obviously if you're measuring something meaningless, you're never going to get a meaningful conclusion no matter how large the measurement sample. "Garbage in, garbage out" as they say. But I don't think it's fair to characterize the inputs of defensive metrics as meaningless, as imperfect as they might be. Even without Statcast, the inputs are basically just recorded scouting observations -- the location of the ball, the position of the fielder, the result of the play, etc. It's less reliable than offensive measurement, which is why we have the "three season" rule of thumb, but it's not meaningless.

Edited by spycake
Link to comment
Share on other sites

I concur with all that Spy wrote above, but want to add that defensive stats suffer from an inherent small sample size compared to batting stats. Each plate appearance consists on average of around 4 pitches, these days. Even though only outcomes are recorded, there is a richness in the experience being measured that goes far beyond the basic numbers. How well does the batter lay off bad pitches, how often does he whiff on good pitches - all these micro-results wind up contributing to what we know of as a plate appearance.

 

By comparison, defensive stats suffer the opposite problem. There is less to the data than meets the eye. An awful lot of Total Chances are on plays like cans of corn to the outfield and routine grounders to the infielders. Separating the wheat from the chaff is the first task of the data analysis, and there is an awful lot of chaff. That's IMO why it takes multiple seasons for defensive stats to take on the same meaningfulness of their offensive counterparts.
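To put rough numbers on the wheat-and-chaff idea, here is a back-of-the-envelope sketch. Every figure in it (400 chances a season, 85% of them routine, a 50% conversion rate on the rest) is an assumption I picked for illustration, not something taken from any real fielding data.

```python
import math

def informative_plays(total_chances: int, routine_share: float) -> int:
    """Chances left once the routine plays nearly everyone converts are set aside."""
    return round(total_chances * (1.0 - routine_share))

def standard_error(rate: float, n: int) -> float:
    """Standard error of a conversion rate observed over n independent plays."""
    return math.sqrt(rate * (1.0 - rate) / n)

# All three values are hypothetical, for illustration only.
CHANCES_PER_SEASON = 400
ROUTINE_SHARE = 0.85
TOUGH_PLAY_RATE = 0.50

for seasons in (1, 3):
    n = informative_plays(CHANCES_PER_SEASON * seasons, ROUTINE_SHARE)
    se = standard_error(TOUGH_PLAY_RATE, n)
    print(f"{seasons} season(s): {n} non-routine plays, rate uncertainty +/- {se:.3f}")
```

Under those assumptions a full season leaves only a few dozen plays that actually separate fielders, which is one way to see why a single year of defensive data behaves like a small sample.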

Link to comment
Share on other sites

 

I concur with all that Spy wrote above, but want to add that defensive stats suffer from an inherent small sample size compared to batting stats. Each plate appearance consists on average of around 4 pitches, these days. Even though only outcomes are recorded, there is a richness in the experience being measured that goes far beyond the basic numbers. How well does the batter lay off bad pitches, how often does he whiff on good pitches - all these micro-results wind up contributing to what we know of as a plate appearance.

 

By comparison, defensive stats suffer the opposite problem. There is less to the data than meets the eye. An awful lot of Total Chances are on plays like cans of corn to the outfield and routine grounders to the infielders. Separating the wheat from the chaff is the first task of the data analysis, and there is an awful lot of chaff. That's IMO why it takes multiple seasons for defensive stats to take on the same meaningfulness of their offensive counterparts.

 

I don't think anyone is arguing otherwise. Clearly defensive measures will always be behind offensive ones......

Link to comment
Share on other sites

I don't think anyone is arguing otherwise. Clearly defensive measures will always be behind offensive ones......

Understanding why they are different (and I offer no guarantee that I do) allows one to judge when and how much to trust the defensive stats we do have. It's not, for instance, all subjective on the defensive side - that's not the core problem.

Link to comment
Share on other sites

 

That's IMO why it takes multiple seasons for defensive stats to take on the same meaningfulness of their offensive counterparts.

Yup. And I'm also comfortable saying that defensive stats (at least as we know them right now) can probably never take on the same meaningfulness as their offensive counterparts. Most notably, whereas K and BB rates might stabilize around 150 PA or whatever, once you're looking at 3 years of defensive stats, you essentially have to start accounting for age too, not to mention other factors (coaching and personnel changes?).

 

That's not to say that they're meaningless, or fully unreliable. Maybe the conclusions drawn from them have to be a little less specific. Maybe they have more value looking backward than forward, etc.

Link to comment
Share on other sites

Yup. And I'm also comfortable saying that defensive stats (at least as we know them right now) can probably never take on the same meaningfulness as their offensive counterparts. Most notably, whereas K and BB rates might stabilize around 150 PA or whatever, once you're looking at 3 years of defensive stats, you essentially have to start accounting for age too, not to mention other factors (coaching and personnel changes?).

 

That's not to say that they're meaningless, or fully unreliable. Maybe the conclusions drawn from them have to be a little less specific. Maybe they have more value looking backward than forward, etc.

The good news may be that it's less important to be able to make the distinctions between good and so-so defense, for the same SSS reasons that make the analysis harder: Robbie Grossman in the outfield doesn't get a chance to affect the game's outcome as often as we sometimes think. :)

Link to comment
Share on other sites

Defensive stats do need a significant sample size, but so do many of the batting and pitching numbers that are heavily used on almost any baseball telecast. Not only are these stats presented as meaningful in a partial-season sample, they are often used in splits that further degrade the sample.

 

One 2018 article with some discussion of the reliability of defensive metrics:

 

https://blogs.fangraphs.com/statcasts-outs-above-average-and-uzr/

 

 

For those that don’t trust defensive numbers, keep in mind that r-squared for Offensive runs above average per 600 plate appearances for this same group of players was .21. Defense had a higher correlation than wOBA, wRC+, slugging percentage, and on-base percentage and was roughly equivalent to walk percentage and ISO. Much of this relationship is going to be due to positional factors, but positional factors are a very important part of determining a player’s value and overall defensive value is pretty consistent year to year.

Defensive stats have evolved since 2012 but here is an early article on their reliability. They have improved since.

 

https://www.billjamesonline.com/how_well_do_advanced_defensive_statistics_correlate/

 

 

We are at the point where our defensive analytics are nearly as reliable as offensive and pitching analytics. Just looking at the single best statistic in each: OPS is .69, Opponent OPS is .61, Defensive Runs Saved is .59. We’ve come a long way.

 

I join those of you that are skeptical of defensive metrics in samples smaller than a full season or even more. I think we should be just as skeptical of slash stats, ERA, FIP and others that are much more commonly cited.
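For readers who haven't worked with these figures: the FanGraphs excerpt above talks about how consistent a stat is from year to year, measured by r-squared. Below is a minimal sketch of how that kind of number is computed. The ten player values are made up purely so the code runs; they are not from either article.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up defensive runs for five hypothetical players in back-to-back seasons.
year_one = [8.0, -3.0, 1.0, 12.0, -7.0]
year_two = [5.0, -1.0, -2.0, 9.0, -4.0]

r = pearson_r(year_one, year_two)
print(f"r = {r:.2f}, r-squared = {r * r:.2f}")
```

A stat that mostly measures noise would show an r near zero from one season to the next, so a higher value is at least some evidence that the metric is picking up a repeatable skill.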

Edited by jorgenswest
Link to comment
Share on other sites

Community Moderator

 

Isn't this a pretty foundational concept of statistics? That the larger the sample, the more reliable it can be?

 

1000 plate appearances doesn't have the exact same reliability as each 10 PA chunk. Collectively, it basically sums the meager reliability of each of those small samples, and after awhile, it has meaningfully more reliability. You're never going to reach perfect 100% reliability in this field, and where you draw the line at "reliable enough" can be a little fuzzy/subjective/context-dependent, but the basic idea that a larger sample (3 seasons of defensive data) is more reliable than a smaller sample (1 season) is certainly true.

 

Edit: obviously if you're measuring something meaningless, you're never going to get a meaningful conclusion no matter how large the measurement sample. "Garbage in, garbage out" as they say. But I don't think it's fair to characterize the inputs of defensive metrics as meaningless, as imperfect as they might be. Even without Statcast, the inputs are basically just recorded scouting observations -- the location of the ball, the position of the fielder, the result of the play, etc. It's less reliable than offensive measurement, which is why we have the "three season" rule of thumb, but it's not meaningless.

The difference is, while 10 PAs are certainly not predictive, nobody would argue those 10 PAs (whether the dude hit 1.000 or .000) weren't accurately measured. Those 10 PAs did happen, and we know, for certain, what the results were. We can use those 10 accurately measured PAs, in conjunction with another 10 and another 10, and so on, to accurately measure what happened, and perhaps form an educated opinion about what is likely to happen in the future.

 

Defensive metrics aren't sold that way. Everyone admits small sample sizes do not necessarily represent what actually happened. But those same people then turn around and claim "more inaccurate data" will solve the accuracy issue. 

 

I don't think it does. I think adding 10 accurately measured PAs to another 990 of the same gives you an accurate picture of what happened. But adding 10 inaccurately measured defensive plays to 990 other inaccurately measured defensive plays doesn't.
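A rough way to frame the disagreement, offered as a toy simulation and not as a description of how any real defensive metric works: if the measurement error is random, piling up more plays averages it away, but if the error is a consistent bias, more plays just give you a very precise wrong answer. The noise level and the 0.5 bias below are assumed values.

```python
import random

def average_measured(true_value, n, noise_sd, bias, rng):
    """Mean of n measurements, each equal to true value + fixed bias + random noise."""
    total = 0.0
    for _ in range(n):
        total += true_value + bias + rng.gauss(0.0, noise_sd)
    return total / n

rng = random.Random(42)   # fixed seed so the run is repeatable
true_skill = 0.0          # hypothetical true value of whatever is being measured

for n in (10, 1000, 100000):
    noisy = average_measured(true_skill, n, noise_sd=1.0, bias=0.0, rng=rng)
    biased = average_measured(true_skill, n, noise_sd=1.0, bias=0.5, rng=rng)
    print(f"n={n:>6}: random error only -> {noisy:+.3f}, with bias -> {biased:+.3f}")
```

So the real question is which kind of error the defensive inputs have. Sample size fixes the first kind and does nothing for the second.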

Link to comment
Share on other sites

 

I concur with all that Spy wrote above, but want to add that defensive stats suffer from an inherent small sample size compared to batting stats. Each plate appearance consists on average of around 4 pitches, these days. Even though only outcomes are recorded, there is a richness in the experience being measured that goes far beyond the basic numbers. How well does the batter lay off bad pitches, how often does he whiff on good pitches - all these micro-results wind up contributing to what we know of as a plate appearance.

 

By comparison, defensive stats suffer the opposite problem. There is less to the data than meets the eye. An awful lot of Total Chances are on plays like cans of corn to the outfield and routine grounders to the infielders. Separating the wheat from the chaff is the first task of the data analysis, and there is an awful lot of chaff. That's IMO why it takes multiple seasons for defensive stats to take on the same meaningfulness of their offensive counterparts.

 

I concur with you... Yet small-sample-size, single-year defensive metrics based on a majority of "can of corn" plays are then tossed into the WAR calculations, and from WAR... win probability, and everything else.

 

And very few of us look at the larger 3-year sample sizes. Most of us see 3.4 in RF and -2 in LF for 2018, and the assumption is made that the player can't play LF.

 

 

Link to comment
Share on other sites

You should always be wary. OPS is quick-and-dirty. WAR is quick-and-dirty. RBI and ERA and Wins all have extenuating factors.

 

Including defensive wins into WAR tells you something additional, even if it's staticky and sometimes even misleading. You teach your kids to watch their step on ice - you don't tell them not to walk on ice.

Link to comment
Share on other sites

You should always be wary. OPS is quick-and-dirty. WAR is quick-and-dirty. RBI and ERA and Wins all have extenuating factors.

 

Including defensive wins into WAR tells you something additional, even if it's staticky and sometimes even misleading. You teach your kids to watch their step on ice - you don't tell them not to walk on ice.

I tell old people to watch their step on ice.

 

With the kids you hope they get a million dollar contract playing for the Vancouver Canucks. And if that’s not possible... you hope that they will hold the old guy's arm as they navigate the slippery sidewalk.

Link to comment
Share on other sites

 

The difference is, while 10 PAs are certainly not predictive, nobody would argue those 10 PAs (whether the dude hit 1.000 or .000) weren't accurately measured. Those 10 PAs did happen, and we know, for certain, what the results were. We can use those 10 accurately measured PAs, in conjunction with another 10 and another 10, and so on, to accurately measure what happened, and perhaps form an educated opinion about what is likely to happen in the future.

 

Defensive metrics aren't sold that way. Everyone admits small sample sizes do not necessarily represent what actually happened. But those same people then turn around and claim "more inaccurate data" will solve the accuracy issue. 

 

I don't think it does. I think adding 10 accurately measured PAs to another 990 of the same gives you an accurate picture of what happened. But adding 10 inaccurately measured defensive plays to 990 other inaccurately measured defensive plays doesn't.

I think what you are claiming is more or less "garbage in, garbage out" which I addressed in my post.

 

I'm not sure how you think defensive metrics are calculated, but the foundation of them is stringers recording what actually happened. Is there some subjectivity involved? Sure -- I remember one issue was classifying fly balls vs liners (Statcast can help with that now). But that doesn't mean the data is worthless. There's subjectivity in scouting too but that data isn't worthless, and can gain some reliability if you get a large enough sample.

Link to comment
Share on other sites

 

I think what you are claiming is more or less "garbage in, garbage out" which I addressed in my post.

 

I'm not sure how you think defensive metrics are calculated, but the foundation of them is stringers recording what actually happened. Is there some subjectivity involved? Sure -- I remember one issue was classifying fly balls vs liners (Statcast can help with that now). But that doesn't mean the data is worthless. There's subjectivity in scouting too but that data isn't worthless, and can gain some reliability if you get a large enough sample.

 

I think my problem is that the reliability takes years to accumulate, and I'm not sure I trust the logic in that statement. It's not like there aren't tons of defensive plays available each year. 600 PAs for a batter ultimately leads to what... 400ish chances for a defender somewhere? Granted, there are 9 defenders on the field, but times the 9 batters in the lineup, you still have plenty of players with far more defensive activity than at-bats. Yes, I know some have far less too, that's the nature of the game, but I'd think at this point they would be able to come up with something. It's quite literally 600 PAs against 400 or so defensive attempts per player on average. That's enough of a sample size to determine something.

 

My problem with all of this, to be honest, is the concept of luck that gets thrown into this way way way too much. When Buxton or Kepler posts a low BABIP, I don't think it's b/c it's simply that he's unlucky. What we view as luck is simply variation within the top 1/100th of 1% of people, and in that case, Buxton is performing like someone in the 1/50th of 1% of people. It can trick someone who doesn't recognize the skill required to perform in the range of what we could call random variation. It will take an adjustment for them to perform within that range. You see players saying this all the time, but we get lazy and call it luck. 

 

This was a long time ago, but I remember reading a paper on how easy it was to build design into a series of numbers and make it look random. I think, at least to some extent, we're dealing with much of the same problem in trying to quantify these types of things. We want to chalk up way too much to randomness that is skill because we cannot see the skill in the numbers due to the fact that we are dealing with the top .0001% of people on this planet who happen to possess this skill.

 

I'm not sure it's as simple as needing more years. I think it requires a better eye both physically and mentally.

Link to comment
Share on other sites

 

I think my problem is that the reliability takes years to accumulate, and I'm not sure I trust the logic in that statement. It's not like there aren't tons of defensive plays available each year. 600 PAs for a batter ultimately leads to what... 400ish chances for a defender somewhere? Granted, there are 9 defenders on the field, but times the 9 batters in the lineup, you still have plenty of players with far more defensive activity than at-bats. Yes, I know some have far less too, that's the nature of the game, but I'd think at this point they would be able to come up with something. It's quite literally 600 PAs against 400 or so defensive attempts per player on average. That's enough of a sample size to determine something.

 

My problem with all of this, to be honest, is the concept of luck that gets thrown into this way way way too much. When Buxton or Kepler posts a low BABIP, I don't think it's b/c it's simply that he's unlucky. What we view as luck is simply variation within the top 1/100th of 1% of people, and in that case, Buxton is performing like someone in the 1/50th of 1% of people. It can trick someone who doesn't recognize the skill required to perform in the range of what we could call random variation. It will take an adjustment for them to perform within that range. You see players saying this all the time, but we get lazy and call it luck. 

 

This was a long time ago, but I remember reading a paper on how easy it was to build design into a series of numbers and make it look random. I think, at least to some extent, we're dealing with much of the same problem in trying to quantify these types of things. We want to chalk up way too much to randomness that is skill because we cannot see the skill in the numbers due to the fact that we are dealing with the top .0001% of people on this planet who happen to possess this skill.

 

I'm not sure it's as simple as needing more years. I think it requires a better eye both physically and mentally.

 

statcast is that better eye......

 

It's a boring argument, chief and others have decided they don't trust defensive stats, and no one is going to change his mind. I guess it's possible others will read this thread, and change their mind.....so it isn't worthless overall, but no one is changing chief's mind.

Link to comment
Share on other sites

statcast is that better eye......

 

It's a boring argument, chief and others have decided they don't trust defensive stats, and no one is going to change his mind. I guess it's possible others will read this thread, and change their mind.....so it isn't worthless overall, but no one is changing chief's mind.

Defensive metrics continue to improve. No doubt. I think all that skeptics like chief and myself are saying is to acknowledge the blind spots. Many say they do, but don't really. Statcast is a game changer. It can track the route of the defender, the trajectory and speed of the batted ball, and the place on the field it lands. The illusion is that it takes all of that into consideration. FSN will show Buxton on a route and catch and show it as caught only 6% of the time. But that's not based on the route. It's simply based on how many balls of that trajectory hit to that particular spot on the field are turned into outs. Useful, but with modern shifting the data can be heavily skewed. We've all seen the can of corn to the left side turned into a hit because of an extreme outfield shift. The left fielder would get crushed on that ball because Statcast would show that ball as an FO 7 to that spot 97% of the time when in reality he had no shot. Shifted the other way in the infield, a 2B might be camped under that ball and Statcast might have the play as nearly miraculous. Statcast tracks both, but assembling it all is still difficult. SSS exacerbates this, since one unlucky shift play can skew the numbers given the low number of chances. Then there's the arbitrary position adjustment for WAR, which I don't want to get into again...

Link to comment
Share on other sites
