Some years ago, I wrote about how I’d love to be the Jimmy Carter peace broker on WAR. As you know, two of our favorite baseball stat sites — Baseball-Reference and FanGraphs — use “Wins Above Replacement” as their one-stop-shopping statistic. But the two WARs are different in a variety of ways. It seemed to me then to be confusing and unhelpful: If you’re going to have a statistic called WAR then shouldn’t it be one thing? I imagined being in the middle of peace summits and statistical negotiations.
I’m not saying I was expecting to win the Nobel Peace Prize. I’m just saying I expected to be nominated.
Several people I really respect and admire, though, corrected me on my whole mindset … they explained that WAR is not a STATISTIC so much as it is a FRAMEWORK for figuring out a player’s value. My good friend Tom Tango, who invented the concept, is very clear in saying that we can all have our own personal WAR formula by focusing on those things we value. And so, the wrong thing to do is try to stifle creativity and innovation and dissent by making WAR just one thing for everybody.
And I get that. I really do.
But I’m really troubled by the WAR war over Milwaukee ace Corbin Burnes.
Burnes is having a staggering season. He is, as I write this, 10-4, if you need a won-loss record, and he has a 2.25 ERA (second to Max Scherzer), a 0.914 WHIP (second to Scherzer), a ridiculous 210-29 strikeout-to-walk ratio (best), a 188 ERA+ (best), just five home runs allowed all year (best), and a 1.50 FIP (best). He’s trying to become the first pitcher since Jim Whitney in 1883 to lead the league in strikeouts per nine innings AND fewest walks per nine innings. It’s not just a great year but a historic one. It’s only 152 innings so far, but few have ever pitched better.
Here’s how the two WARs see him:
FanGraphs: 7.1 Wins Above Replacement (best in baseball)
Baseball-Reference: 5.3 WAR (7th-best in baseball)
It’s well established how much I love Baseball-Reference and the people there, but I simply cannot make sense of the Baseball-Reference WAR total. It seems completely mad to me. Baseball-Reference has Burnes behind, among others, Cincinnati’s Wade Miley, who’s having a fine season but whose numbers don’t even seem to be close.
So, I dug into Burnes’ numbers just a little bit. And it seems clear to me that Baseball-Reference has a really nasty flaw with its pitching WAR formula.
Wait, I should probably start with 2018 Aaron Nola.
You might remember that I dove into this three years ago after BR calculated that Nola — who had a fine season — was worth 8.8 wins above average, making it the fifth-best pitching season of the last 100 years.*
*Nola’s WAA has been adjusted since then, and it’s now the 10th-best season of the last 100 years, which still seems pretty much out of whack.
Nola’s season was unquestionably good: 17-6, 2.37 ERA, 224-58 strikeout-to-walk, third in the Cy Young voting. But it seemed clear to me that it wasn’t THAT good. Why did BR think it was THAT GOOD?
The answer had to do with adjustments that BR made to try and get at his true value.
Baseball-Reference WAR (or bWAR) starts simply with runs allowed by the pitcher. That’s the main difference between BR and FanGraphs. BR builds its system off runs allowed while FanGraphs builds its system around the three pitching categories that pitchers have the most control over — strikeouts, walks and home runs allowed.
This is just a basic philosophical difference. BR starts with the thesis that a pitcher’s job is to prevent runs from scoring, which is basically the way almost everybody has looked at pitching for more than a century. FanGraphs works off the idea that a pitcher can only control those three things and so to find the pitcher’s true value you should focus on those.
Like I say, that’s a philosophical argument, and I can see the good and bad on both sides. We can fight that fight another day.
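For anyone who wants the arithmetic behind the FanGraphs side, here is a rough sketch of Fielding Independent Pitching (FIP), the strikeouts-walks-homers stat quoted earlier for Burnes. The 13, 3 and 2 weights are the commonly published ones. The league constant changes every season, so the 3.10 default below is an assumption, and hit batters aren't listed in this piece, so they default to zero. For both reasons, don't expect it to land exactly on the 1.50 quoted above.

```python
def fip(hr, bb, k, ip, hbp=0, constant=3.10):
    """Fielding Independent Pitching, using the commonly published weights.

    constant: league-specific FIP constant; 3.10 is an assumed placeholder.
    hbp: hit batters; defaults to 0 because the article does not list them.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Burnes' line as quoted in this piece: 5 HR, 29 BB, 210 K in 152 innings.
burnes_fip = fip(hr=5, bb=29, k=210, ip=152)
print(round(burnes_fip, 2))
```

Notice what is missing from the inputs: hits, runs and anything else that happens once the ball is put in play.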
But here’s the thing about bWAR: It doesn’t just look at runs allowed and stop there. A serious look at pitchers cannot stop there. No, there have to be adjustments. For example, the pitcher’s ballpark has to be considered; some pitchers pitch home games in great pitchers’ parks like Los Angeles, and some pitch in great hitters’ parks like Colorado. You can’t treat those pitchers the same.
But the even bigger adjustment is team defense. Any effort to find out how good a pitcher is will try to separate team defense from the pitcher’s performance. This is what ERA is all about. FanGraphs handles this, as you see, by simply avoiding balls hit in play.
But Baseball-Reference uses “Defensive Runs Saved” or DRS to determine how good or bad a team’s defense is. And then they adjust the pitcher’s value accordingly. If a pitcher is backed up with a fantastic defense, the assumption is that some of the pitcher’s value has to go to them. And if the pitcher is weighed down by a lousy defense, then the assumption is that the pitcher’s value should be adjusted upward.
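Here is a rough sketch of how a team-level defensive number could be pushed onto one pitcher. To be clear, this is not Baseball-Reference's actual code. It assumes the team's DRS is simply split by each pitcher's share of the team's balls in play, the function names are mine, and every number below is made up for illustration.

```python
def defense_credit(team_drs, pitcher_bip, team_bip):
    """Runs of team defense attributed to one pitcher's innings.

    Assumption: defense is spread evenly over every ball in play,
    no matter who was on the mound.
    """
    return team_drs * pitcher_bip / team_bip


def adjusted_runs_allowed(runs_allowed, team_drs, pitcher_bip, team_bip):
    """Runs allowed after removing the defense's assumed help (or harm).

    A good defense (positive DRS) raises the pitcher's adjusted runs;
    a bad defense (negative DRS) lowers them.
    """
    return runs_allowed + defense_credit(team_drs, pitcher_bip, team_bip)


# Hypothetical numbers, not taken from any real team or pitcher.
print(adjusted_runs_allowed(runs_allowed=60, team_drs=-80, pitcher_bip=500, team_bip=4000))
```

Nothing in that split asks how the defense actually played on the days this particular pitcher was on the mound.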
In theory, maybe it works.
In practice, though, I think it can lead to massive miscalculations.
And I think Aaron Nola’s 2018 season is a massive miscalculation. When I wrote that Nola analysis, I used Bob Gibson’s legendary 1968 season as a comparison for Nola.
Before adjustments, Gibson was 64 runs better than the average pitcher and Nola was 50 runs better than the average pitcher. That difference of 14 runs is pretty big, as you would expect since Gibson had a 1.12 ERA and Nola’s was more than a run higher.
But there had to be adjustments made for the time when the pitcher pitched and the ballpark, and that did tighten things up considerably.
Then came the defensive adjustment. The Defensive Runs Saved system determined that the 2018 Phillies were an abomination defensively. And because they were so bad defensively, BR WAR added FIFTEEN runs to Nola’s total.
It’s a staggering number, completely wiping out the huge run differential between the pitchers. Is that really possible? I mean, the Phillies basically would have had to be the early Bad News Bears to merit that kind of adjustment.
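Just to spell out the arithmetic, here it is using only the run totals quoted above. The era and ballpark adjustments are left out because their exact sizes aren't given here, and so is any defensive adjustment for Gibson, so treat this as a back-of-the-envelope check and nothing more.

```python
# Runs better than the average pitcher, before adjustments (from the text).
gibson_1968 = 64
nola_2018 = 50

# Defensive adjustment bWAR credited to Nola (from the text).
nola_defense_bonus = 15

raw_gap = gibson_1968 - nola_2018
nola_with_defense = nola_2018 + nola_defense_bonus

print(raw_gap)                          # 14
print(nola_with_defense)                # 65
print(nola_with_defense - gibson_1968)  # 1
```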
But here is where it gets REALLY crazy; I don’t doubt that the Phillies defense was bad overall, which is what DRS showed, but it seems pretty clear looking at other numbers that they were actually quite good when Nola was pitching. He gave up just one unearned run all year. His batting average on balls in play — the balls the defense can turn into outs — was .254, the LOWEST among all pitchers with at least 200 innings.
Again, Nola was obviously very good in 2018. But because of the bWAR system, Nola received huge benefits from both sides — he got both good defense and the extra credit for presumed bad defense. This is why I think his Wins Above Average and his Wins Above Replacement are absurdly inflated.
And now, finally, we get to Corbin Burnes, who I think has the same story but from the other side of the funhouse mirror.
Probably the best way to explain the Burnes conundrum is to compare him with Cincinnati’s Wade Miley.
Start with runs allowed, the core of bWAR.
Burnes: 2.37 runs per nine innings
Miley: 3.26 runs per nine innings
So that’s a pretty wide gap, almost a run per game. You would think, based solely on runs allowed, that Burnes would have a much higher WAR than Miley.
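To put that gap in total runs, here is a quick sketch. Applying it over Burnes' 152 innings is my own choice for illustration; Miley's innings total isn't given in this piece, so this is only a sense of scale.

```python
def ra9_gap_in_runs(ra9_a, ra9_b, innings):
    """Convert a runs-per-nine-innings gap into total runs over `innings`."""
    return (ra9_b - ra9_a) * innings / 9


burnes_ra9 = 2.37   # from the text
miley_ra9 = 3.26    # from the text
innings = 152       # Burnes' innings so far, used here as an assumed common workload

print(round(miley_ra9 - burnes_ra9, 2))
print(round(ra9_gap_in_runs(burnes_ra9, miley_ra9, innings), 1))
```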
But then come the adjustments: bWAR shows Miley pitches in a better hitters’ park. Boom. It shows Miley has faced tougher lineups. Boom. The adjustments come in and the gap between them closes.
And then the big one: Defense. According to DRS, Burnes has benefited from excellent defense. DRS has Milwaukee as the fourth-best defensive team in baseball, in large part because of its fabulous outfield defense led by Jackie Bradley Jr., Avisail Garcia and that old standby Lorenzo Cain.
Miley, meanwhile, has been burdened by subpar defense.
And when you add it up, the difference is quite massive — .54 runs per nine innings if you’re scoring at home — and the bWAR numbers end up like this:
Miley: 5.7 WAR
Burnes: 5.3 WAR
I think this is a gigantic misreading of the two pitchers. Miley is having a fine year, but in my view, it’s not close to what Burnes is doing. I think there are two fatal flaws here. One is that I think the adjustment is way too big. I’m certainly not smart enough to argue the math, but it seems to me that there’s just too much certainty in DRS and ballpark adjustment and the rest.
As Tom Tango says, “Personally, I would make each adjustment half as much to handle uncertainty in each measurement.”
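Tango's suggestion is easy to sketch. The function name and the `weight` parameter are mine, not his, and the two inputs are simply the adjustment sizes quoted in this piece.

```python
def shrink(adjustment, weight=0.5):
    """Scale an adjustment toward zero to reflect uncertainty in the measurement.

    weight=0.5 is the 'half as much' from Tango's quote; 1.0 would be
    the full adjustment and 0.0 would ignore it entirely.
    """
    return adjustment * weight


print(shrink(15))    # Nola's 15-run defensive bump becomes 7.5
print(shrink(0.54))  # the .54 runs-per-nine difference becomes 0.27
```

The idea is that a noisy measurement gets partial credit instead of full credit, so one bad estimate can't swing a pitcher's total as far.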
Second, though, is using DRS team numbers to adjust individual pitchers’ WAR. Again, I don’t doubt that Milwaukee’s defense overall is better than Cincinnati’s. But is that true when Burnes or Miley is pitching? At Baseball Savant, they break down defense differently, going play by play with the Statcast™ system.
And by Outs Above Average (OAA), yes, they do have the Reds defense as the worst in the National League (minus-25 outs). But they also have the Brewers as a below-average defense.
And more to the point, they break down the defenses behind individual pitchers. Behind Burnes, the Brewers’ defense is minus-4 OAA. Burnes’ expected ERA of 1.90 is even better than his actual ERA of 2.25. In other words, Statcast™ is saying that Burnes’ defense has HURT him a little bit this year.
So, like bizarro Nola, Burnes is also getting it from both sides — getting OK but not brilliant defense behind him while also getting dinged by bWAR, as if he’s pitching in front of the 1970s Baltimore Orioles defense.*
*By the way, do you know who IS playing in front of the 1970s Baltimore Orioles defense? Adam Wainwright. DRS does show the Cardinals being a terrific defensive team, but even that underrates how good they’ve been behind Wainwright; the Cards are 22 Outs Above Average when Waino is on the mound. That is more than double anyone else in baseball.
And it’s clear to me now that FanGraphs just has a clearer vision for how to judge pitchers. Yes, there are limitations to looking only at strikeouts, walks and home runs … but I’d say that looking at those does give a truer picture of Burnes vs. Miley.
Burnes: 210 Ks, 29 walks, 5 HR, 7.1 fWAR.
Miley: 123 Ks, 49 walks, 14 HR, 3.2 fWAR.
That seems much more accurate to me. I still think that both the BR philosophy and the FanGraphs philosophy are sensible ways to judge pitcher performance. But I think, right now, FanGraphs is doing a better job of executing its vision.