Twins Daily

How Our Eyes Lie To Us


Matt Braun


Twins Daily Contributor

There exists a problem in baseball. A problem that has perplexed front office executives and decision makers for years. A problem that has caused an incalculable number of clashes between people. A problem that will exist until we are eventually replaced by cyborgs. The problem? Our eyes lie to us.

It’s not necessarily our fault; we have no control over what we see. We can only observe what occurs on the field and make judgements from that information. Algorithms do not conveniently exist inside our heads. We have also been indoctrinated to focus on the thing that matters most in baseball: runs. And when those runs occur (or do not occur), we react accordingly by reaching for RBIs and ERA. Unfortunately, our eyes lie.

 

The faultiness of our eyes is the same reason why Billy Beane famously clashed with his scouts and ran an entire draft class based on stats alone. Beane was tired of scouts injecting bias into information that had to be entirely free of human emotion. The only solution? Numbers. They don’t lie. Numbers don’t have eyes and are not subject to the limits of human emotion. So Beane did what he had to do and built an entire system of baseball on numbers. Numbers that were actually crucial to winning baseball games.

 

Where am I going with this? I want you to answer this simple question: who is having a better season, Taylor Rogers or Tyler Duffey? Go on, I can wait.

 

The vast majority of you likely chose Duffey, and you probably did so without thinking too hard about it. It is a solid choice, after all. Duffey has just a 1.88 ERA as of the 1st of September and has looked dominant since re-joining the team last season (playoffs not included). The streak of greatness is believable, too, as Duffey has shed any semblance of his previous self and embraced change to become a destroyer of hitters.

 

Rogers, on the other hand, has been quite mortal this year. He has an ERA of 4.38 on the year as he has not quite looked to be his old self. Matthew Trueblood just wrote an article the other day on this topic. Go read it.

 

However, this is a murkier question than you may think. Let’s go to a future where the tyranny of ERA is no more. What do other important stats say? Here’s a comparison of “Player A” and “Player B”:

 

Player A: .282 xwOBA, 2.01 FIP, 3.08 xERA

Player B: .288 xwOBA, 3.64 FIP, 3.22 xERA

 

Think about it for a few seconds. Now who do you want? Is it a tough decision?

 

I’ll spare you the suffering. Player A is Taylor Rogers and Player B is Tyler Duffey.

Surprised? The advanced stats actually favor Rogers so far. Granted, it isn't by much, but the fact that they’re even this close might come as a shock to many. Duffey has just seemed so dominant to begin the season. Narratives are powerful things.

 

What’s happening is that Rogers is getting bitten by the bad luck bug. His BABIP is .436 (career .309) and his LOB% is 51.1% (career 77.6%). Glancing at his Fangraphs page and ignoring ERA would lead you to believe that Rogers is having a perfectly normal season by his standards.
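
For anyone curious where those two numbers come from, here is a minimal Python sketch of the commonly published BABIP formula and the FanGraphs-style LOB% formula. The function names and the stat line at the bottom are made up for illustration; they are not Rogers's actual totals.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


def lob_pct(hits, walks, hit_by_pitch, runs, home_runs):
    """Strand rate: (H + BB + HBP - R) / (H + BB + HBP - 1.4 * HR)."""
    baserunners = hits + walks + hit_by_pitch
    denominator = baserunners - 1.4 * home_runs
    if denominator <= 0:
        raise ValueError("not enough baserunners to compute LOB%")
    return (baserunners - runs) / denominator


# Hypothetical one-month reliever line (NOT real data).
line = dict(hits=19, home_runs=2, at_bats=60, strikeouts=18,
            sac_flies=0, walks=4, hit_by_pitch=0, runs=11)

print("BABIP:", round(babip(line["hits"], line["home_runs"], line["at_bats"],
                            line["strikeouts"], line["sac_flies"]), 3))
print("LOB%:", round(lob_pct(line["hits"], line["walks"], line["hit_by_pitch"],
                             line["runs"], line["home_runs"]), 3))
```

Neither formula looks at strikeouts or walks as skills; both are driven by what happens after contact and by sequencing, which is why they tend to swing so much over a single month.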

 

This doesn’t mean that everything is all sunshine and roses for Rogers. Trueblood noted that Rogers has real mechanical differences this year that have led to his struggles. He has some small changes to make. But these struggles may still be overblown. Some of his important numbers are stable and, well, it has just been one single month.

 

Knowing all of this, it’s important to realize that your eyes can lie to you. What you see is not necessarily what is really occurring. Baseball is an especially tricky sport that likes to tell small fibs in the form of previously agreed upon stats. Stats that before made up the foundation of how we understood the game. But we have new stats now. And these stats can prove that when you’re watching the game, you’re not actually looking.

 



Twins Daily Contributor

One of my Twitter followers suggested that I add links to explanations of the advanced stats I used, so here they are:

 

FIP: https://library.fangraphs.com/pitching/fip/

xwOBA: http://m.mlb.com/glossary/statcast/expected-woba

xERA: http://redscontentplus.com/2020/03/xera-fixing-era-kinda-sorta/
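
Of the three, FIP is the only one that can be computed by hand from a box score, so here is a rough Python sketch of the standard formula. The default constant of 3.10 is an assumption for illustration only (the real constant is recalculated every season to put league FIP on the ERA scale), and the sample line is hypothetical.

```python
def fip(home_runs, walks, hit_by_pitch, strikeouts, innings_pitched,
        fip_constant=3.10):
    """Fielding Independent Pitching.

    FIP = (13*HR + 3*(BB + HBP) - 2*K) / IP + constant

    innings_pitched must be a true decimal (13 and 1/3 innings is 13.333...,
    not the box-score shorthand 13.1). fip_constant is an assumed,
    illustrative value; look up the actual constant for the season.
    """
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    core = (13 * home_runs + 3 * (walks + hit_by_pitch) - 2 * strikeouts)
    return core / innings_pitched + fip_constant


# Hypothetical reliever month (NOT real data): 1 HR, 3 BB, 0 HBP, 15 K, 12 IP.
print(round(fip(1, 3, 0, 15, 12.0), 2))
```

Notice that hits on balls in play never appear in the formula, which is exactly why a pitcher's FIP and ERA can drift apart when his BABIP is unusually high or low.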


Provisional Member

Actually, as any statistician will tell you, numbers do lie. Frequently. Not because they aren't factual, but because people use them to predict things that they are not capable of predicting. For example, the goal of a baseball team is to win as many games as it can (ignoring teams that intentionally lose for other reasons). So statistics that have a high correlation coefficient with winning the most games are the most useful. The problem is we don't have many (any?) of those. So we are forced to take statistics that correlate to something that is useful to winning games, and try to combine them somehow, hoping we don't violate statistical basics such as independence of variables in doing so. And then we use judgment to weight the various statistics--and by using judgment we are now out of the purely statistical realm.

 

Another example of statistical failure is in looking at the results of (perhaps now defunct) LOOGYs. We've all seen cases of LOOGYs having much better stats than other middle relievers, and often better than closers. Does that mean that LOOGYs are better pitchers? Of course not. They are simply given a much more limited role where they have a higher likelihood of success. But the statistics don't really measure that--at least not the statistics I know. More broadly, if you consider all relief pitchers, it can be very difficult to do a statistical evaluation because they haven't all had the same opportunities or played the same roles. To do an accurate statistical comparison of closers vs. non-closers you'd have to have a quantification of how much harder hitters bear down in the last at-bat and the effect of pinch-hitting or the threat of pinch-hitting. I don't think that exists. Without that, any precise statistical comparison of effectiveness is impossible. This is not to say statistical measures aren't useful. Of course they are. But their limitations need to be understood, and the way you try to compensate for those limitations is through eye tests.

 

In the end, though, I'm not sure that's relevant to your article. Perhaps I didn't read it well, but it seemed to me that you didn't offer support for your claim that the eye test didn't work; you merely compared a common statistical measure, ERA, with what have been called more advanced measures. I'm not convinced, though, that those advanced measures are more likely to predict wins than the basic measures. In fairness, I'm not convinced the other way either. However, neither of those, at least in my use of the term, would constitute an eye test. An eye test is what a scout would say after watching the two pitchers. And those thoughts would be an interesting contrast to the numerical evaluations we call statistics. I don't know, but I wouldn't be surprised to hear that the scouts' eye tests supported a contention that Rogers has been the better pitcher.


 


Excellent. I really appreciated your thorough comments.

 


Yes, statistics lie. Numbers are chosen by the individual to prove a point, and that creates a bias. True, we cannot look at all the numbers (there are too many), and frankly I still like AV, OPS, SLG, R, and RBIs.

 

As the old line popularized by Mark Twain (and often pinned on others, Churchill included) goes: “There are three kinds of lies: lies, damned lies, and statistics.” The implication is that statistics, and the manipulative way they are presented, are the biggest lies of them all.

 

Good article to stir up conversation on a day with no game to analyze.  But I will take Duffey. 


 


Loved this comment. I made very similar comments years ago when the Twins were winning but their run differential was poor. Everyone said to expect the Twins to start losing because their winning percentage did not match their run differential. However, those are not predictive stats; they are stats controlled by results. I agreed it was unlikely that both would continue, and that was true, but what changed was the run differential, not the winning percentage.
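
The "winning percentage did not match run differential" argument is usually made with Bill James's Pythagorean expectation. The comment does not name it, so treat that link as an assumption; the sketch below uses the classic exponent of 2, and the team totals are invented for illustration.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage: RS^x / (RS^x + RA^x)."""
    if runs_scored < 0 or runs_allowed < 0:
        raise ValueError("run totals cannot be negative")
    if runs_scored == 0 and runs_allowed == 0:
        raise ValueError("need at least one run scored or allowed")
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Hypothetical team (NOT real data): 20-15 record while being outscored 150-160.
actual = 20 / (20 + 15)
expected = pythagorean_win_pct(150, 160)
print(f"actual {actual:.3f} vs expected {expected:.3f}")
```

The formula only describes which records tend to go with which run totals. It says nothing about whether the gap will close because the team starts losing or, as happened in the case described here, because the run differential improves.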

 

I will agree that looking at stats can help confirm or refute what your eyes are seeing, but they do not predict future outcomes. The writer of the article points out that Rogers should be having better results, but he has not. The article asks who is having the better season. Duffey clearly is; the results show that. If the question is who is pitching better overall, maybe the answer is closer, but Duffey has not given up a bunch of runs and Rogers has. Does that mean Rogers will continue to have the bloops fall? No. He could start having those turn into outs, and Duffey could run into bad luck. At some point you need to just look at the results and accept that they are what happened, good luck or bad.


I don't quite get this. All the statistical analysis to determine that Rogers has "bad luck." What's the difference between being bad at baseball and being unlucky? To my eyes, they both look like losing. I understand that the advanced statistics can show potential, but my eyes watched Rogers last year and recognized the potential. I just don't get taking solace in the fact that we are losing games because of "luck" rather than being bad, they both look the same in December. 

 

I'm not against advanced stats, at all. It's how I discuss the game with friends. I just find it odd that we point out all these numbers as a way to disprove what we think we are seeing. Then chalk it all up to a nebulous thing called luck. 


The stat I want, though I won't phrase it precisely enough, is percentage of the time the pitcher hits the target.

 

I just looked through some of the bad games Taylor had in 2020, and while "luck" may be a culprit, a lot of the hits were occurring when the ball went somewhere different than the catcher set up for. Now, that form of analysis is the classic statistical mistake, looking only at a biased sample. And I will also acknowledge, after looking at some 2019 footage, that some of Taylor's strikeouts occur where the catcher is fooled too.

 

So I want a complete picture. The technology has to be there - they draw that little box on the screen, and presumably software could locate the catcher's mitt as well, and automate the whole process.

 

How do you define "hitting the target" so it's foolproof and can't mislead you? No idea. And what kinds of "missing the target" are OK? Surely a center-cut fastball when the catcher sets up... well, probably literally anywhere else... is different than just a wild toss that gets away from you high and outside. Details, schmetails.

 

Anyway, I can ask. :)



Advanced stats like we're talking about aren't tremendously useful when looking through the rearview mirror only. I don't want a Cy Young award based on FIP.

 

But advanced stats may help you determine whether a guy's playing stats are likely to bounce back. "Luck" is not a very satisfying term when dealing with human beings trying their absolute best against other human beings trying to defeat those efforts. I lean toward "unreproducible", when it's something that experience seems to show evens out over time. Batting Average on Balls in Play is noted for coming in just under .300 for most batters and pitchers who are good enough in the first place to reach the Show, and is an example of a guideline to help tell when someone is about to bust out of their slump or come down from the stratosphere. If a pitcher's ERA is high and the BABIP isn't, then what you're seeing might be simply who he is.

 

As Matt Braun says, Taylor Rogers is currently sporting a .436 BABIP, "leading" the team. Unlucky, unreproducible, whatever we call it, he might indeed be pitching better than the results are showing.

 

Duffey is at .161, also leading the team, but at the other end of the scale. Good for him; I hope he keeps it up (and thereby makes history of some obscure sort - nobody who's faced 200 batters in a season has done it).
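
To put a rough number on "unreproducible", here is a small Python sketch that treats every ball in play as an independent trial with a .300 chance of becoming a hit. That is a deliberate simplification, and the ball-in-play counts (17 hits in 39, 5 hits in 31) are hypothetical, picked only because they reproduce the .436 and .161 rates quoted above.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k hits in n balls in play."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_least(k, n, p):
    """Probability of k or more hits in n balls in play."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


def prob_at_most(k, n, p):
    """Probability of k or fewer hits in n balls in play."""
    return sum(binom_pmf(i, n, p) for i in range(0, k + 1))


TRUE_BABIP = 0.300  # assumed "true talent" rate for both pitchers

# Hypothetical counts chosen to match the quoted rates, not real totals.
high = prob_at_least(17, 39, TRUE_BABIP)   # 17/39 is about .436
low = prob_at_most(5, 31, TRUE_BABIP)      # 5/31 is about .161

print(f"chance of .436 or worse by luck alone: {high:.3f}")
print(f"chance of .161 or better by luck alone: {low:.3f}")
```

With samples this small, results that far from .300 are not everyday events, but they are not shocking either, and that is the case for expecting both numbers to drift back toward the middle.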

 


Baseball is virtually run by numbers and statistics. And we have so many of them that it can not only be confusing, but you can select whichever statistics you feel are worthy or support your argument. So it is virtually impossible to fully argue, predict or agree.

 

Now, the flip side of this, over time and not just a SSS, is that a pitcher or hitter can hover around a number like BABIP and then see an aberration. We can then conclude that a cold spell SHOULD normalize and that the player will rise or fall based on previous performance. We can dismiss BA for a hitter or ERA for a pitcher and say new metrics make them obsolete. Maybe, maybe not. But if a hitter has a career BA of X, or a pitcher has a career ERA of X, but has a bad season or 2, or even a bad month or 2, then that outlier means something, doesn't it?

 

IIRC, I heard Rogers's SO numbers were actually up this season. That's a good thing and a good number to measure. But his BABIP, as mentioned, is up above the norms he has produced. So you can therefore state he's good, even better in some measurables, but unlucky, or missing spots and having a bad year.

 

I have always found that there is a "truth" that can't always be quantified in regard to statistics. Call it life, the human factor, karma, whatever. Things tend to balance out and reality is always somewhere in the middle of things.

 

And let's be honest here, ANY statistical information for baseball this season will be at least somewhat inaccurate as it's based on a 60 game season as opposed to the normal 162. A couple bad weeks or an injury destroys your 2020 numbers. In a full season, it can end up being a blip.


Baseball is played by people, not numbers. Sometimes the numbers suffer because the people are put into situations they are not suited for. My eyes tell me that when Glen Perkins was put into a non-save situation he didn’t seem as sharp as when he was in a save situation. Lack of focus? Hard to say. Numbers tell us that Rogers doesn’t pitch as well on back-to-back days. Yet seemingly Rocco has used him in that context more often, which will not help Rogers. Now it would be great to have a rubber-armed, everyday "closer", but trying to make a player something he isn’t generally doesn’t work. So in the Rogers situation, is it him who is having a poorer year? Or is it the guy in the dugout sending him to the mound in situations he doesn’t seem compatible with?


Provisional Member

 


So let's say you have a pitcher who hits the target 100% of the time--when the ball makes it past the batter. Unfortunately, batters are hitting .400 against him. And a second pitcher who hits the target 50% of the time, but batters are hitting .100 against him--what would be called "effectively wild". The second pitcher is probably more effective. Obviously it's a bit of a silly example; my only point is that command, which is measured by hitting the target, isn't the same as effectiveness. However, I do believe, as I suspect you do too, that the two are pretty well correlated, despite my silly example.


Provisional Member

 

A truer comparison of worth to the Twins would be game-by-game outcomes. How often was each successful at their intended jobs?

 

If "outcomes"  means win or loss, even that isn't always a good measure.  The absolute "right" measure is your second sentence.  For example, consider the guy who comes in for the bottom of the ninth with his team up by 8 runs and trying to get to the airport.  His job is to get his team off the field as fast as possible (without coming close to losing the lead.)  Nibbling at the corners, giving up a hit, and walking 2 guys but striking out 3 may look good in stats, but even though that's a win, that's an inferior performance to the guy that attacked hitters, gave up two solo home runs, and got his team off the field in 12 pitches with 3 ground outs.

 

The problem is it's hard to statistically quantify intended performance.  It takes eye tests to do that.

