A Bit More on WAR
I hope you saw Sean's response to my thoughts on the role that Baseball Reference WAR played in the Cy Young battle between Rick Porcello and Justin Verlander. I thought it was thoughtful, especially considering that he wrote it while on vacation. I want to discuss it in a bit more detail here.
At the beginning Sean points out:
1. That Baseball Reference very clearly states that the difference of 1-2 runs should not be considered definitive.
2. They break out each component so there's nothing whatsoever hidden.
These things are absolutely true and if I in any way suggested otherwise then I misspoke and take it back. I was not trying to suggest that B-R was openly lobbying for Justin Verlander because he had more WAR than Porcello (6.6 to 5.0) OR that I had uncovered some hidden secret when pointing out that the entire difference came down to the defensive differences of the Tigers and the Red Sox. I simply did the math. It's completely on me that I had not done it before.
What I think -- hope -- I was doing was making parallel points that the WAR difference between Verlander and Porcello DID, like it or not, have a real impact on the Cy Young race (and the reaction afterward) and that I seriously doubt most people did the math to figure out why Verlander had that edge.
And, once those points were made, I wanted to make the big one: I don't think using overall team defensive numbers to separate a pitcher's performance from his fielders the way WAR does is compelling or convincing. And I think, based on everything I can actually see in the Verlander-Porcello record, it was an incorrect adjustment in that specific case.
To very briefly recap, Baseball Reference WAR takes a pitcher's runs allowed and compares them -- after various adjustments -- to the league average. Porcello and Verlander, after ballpark adjustments, saved almost exactly the same number of runs against the average pitcher. But, because the good folks at Baseball Info Solutions had the Boston Red Sox as an excellent defensive team (53 runs saved) and the Detroit Tigers as a terrible defensive team (minus-50 runs), WAR makes the assumption that much of Porcello's value actually belongs to his fielders while Verlander's numbers should be adjusted significantly upward because he would have been better with even an average defense behind him.
That's the entire 1.6 WAR difference.
Now, I pointed out that everything I can find -- whether it's using old-fashioned stuff like errors and base-stealers thrown out and unearned runs or newer details like batting average on line drives -- suggests that the Tigers defense behind Verlander was BETTER than the Red Sox defense behind Porcello. I don't know if that's true (though I think with the incredible progress being made with Statcast we will soon know a lot more). But I do know that when you break it down batter by batter there seems no possible way that Verlander's defense was that much worse than Porcello's.
And so here then is the part of Sean's response I want to talk about:
Maybe it’s true that the Tigers were above average fielders when Verlander was on the mound, maybe not, but keep in mind BIS had the Sox at +59 and the Tigers at -49 for the season. How on earth does a team that’s the 3rd worst defensive team transform itself into an above average defensive team for the 228 innings JV was on the mound given they’d then have to be EVEN worse the other 1200 innings to get to -49?
If we assume for the moment the Tigers were in fact “excellent” behind Verlander, then the question becomes how to handle this. We apply the team’s DRS to each pitcher based on the percentage of the team’s balls in play which in probably 95% of the cases is a good way to do this and may still be in Verlander’s case. If you start to dice things up by the pitcher on the mound you then run into very small samples where something like Mookie Betts pulling back a home run and getting a double play has a dramatic impact on the pitcher’s WAR. I don’t think you’d like the alternative as you run the risk of conflating random variance with real performance differences.
Sean's numbers are slightly different for Baseball Info Solutions than the ones I have, but that's OK. His point does seem irresistible; I do sound a bit crazy to think that the Tigers were excellent defensively when Verlander was on the mound when they were so clearly inferior the rest of the time.
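For anyone who wants to see the mechanics, here is a rough Python sketch of the allocation Sean describes: spread the team's runs saved across pitchers by their share of the team's balls in play. The function name and the ball-in-play counts are made up for illustration, and the real Baseball Reference calculation may well involve more steps than this.

```python
def allocate_team_drs(team_drs, pitcher_bip, team_bip):
    """Give a pitcher a share of team defensive runs saved,
    proportional to his share of the team's balls in play.

    Hypothetical helper for illustration only, not Baseball Reference code.
    """
    if team_bip <= 0:
        raise ValueError("team_bip must be positive")
    return team_drs * pitcher_bip / team_bip


# Placeholder inputs: a team at minus-50 runs, and a pitcher who allowed
# 600 of the team's 4,000 balls in play. These counts are invented.
share = allocate_team_drs(team_drs=-50, pitcher_bip=600, team_bip=4000)
print(share)  # -7.5 with these placeholder inputs
```

The thing to notice is that nothing in the calculation knows what happened on the balls that particular pitcher allowed. Every pitcher on the staff is charged or credited at the same rate per ball in play.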
So ... let's do something else. Let's compare two other pitchers: Toronto's Marco Estrada and Tampa Bay's Jake Odorizzi. But instead of comparing their WAR, I want to compare their won-loss records. Yes, it's true, I don't like won-loss record, but I'm trying to make a different point. I hope it will make sense as we go.
Estrada had a 9-9 record.
Odorizzi had a 10-6 record.
Now, we all know enough about the quirks of baseball to know that won-loss record is altered by all sorts of things -- the timing of runs scored, the effectiveness of the bullpen, etc. So let's say that we wanted to figure out what their won-loss record SHOULD HAVE BEEN. There are various ways of doing this that are well beyond my mathematical means, but for our purposes let's use the Baseball Reference WAR approach to make an estimation.
We start with runs allowed:
Estrada gave up 73 runs in 176 innings, for 3.73 runs per nine innings.
Odorizzi gave up 80 runs in 187 2/3 innings, for 3.84 runs per nine innings.
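If you want to check the arithmetic, here it is as a few lines of Python. The function and variable names are mine; the runs and innings are the ones quoted above.

```python
def runs_per_nine(runs, innings):
    """Runs allowed per nine innings pitched."""
    return 9 * runs / innings


estrada_r9 = runs_per_nine(73, 176)
odorizzi_r9 = runs_per_nine(80, 187 + 2 / 3)  # 187 2/3 innings

print(round(estrada_r9, 2))   # 3.73
print(round(odorizzi_r9, 2))  # 3.84
```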
Estrada was ever so slightly better at preventing runs. His advantage goes up a touch more because he faced slightly better competition. It goes up a little bit more because of ballpark factors; Toronto's ballpark is better for hitters than Tampa Bay's.
Do you have the image in your mind? Now, Baseball Reference makes the defensive adjustment. The Blue Jays defense (52 runs saved) was better than Tampa Bay's defense (22 runs saved) but they were both better than average and so both pitchers have their numbers adjusted downward, with Estrada's going down a little bit more.
Got it? In the end, Baseball Reference has Estrada saving 17 runs against average vs. Odorizzi's 12 runs above average. That makes Estrada's WAR 3.4 and Odorizzi's 3.0, so fairly close, slight edge to Estrada ... BUT remember in this case we're not looking for their WAR. We're trying to look at won-loss records.
And in order to look at won-loss record, well, yes, we have one more thing to look at: Run support.
I'm guessing you can see where I'm going with this.
If we mirror Baseball Reference's defensive system, we should figure out run support by looking at how many runs their teams scored over the season.
Toronto scored 759 runs, which is 28 runs above league average.
Tampa Bay scored 672 runs, which is 59 runs below league average.
And so, it's obvious that Toronto's offense -- to use my own quote -- is much, much, much better than Tampa Bay's offense. From this, then, we have to assume that Toronto gave Estrada much better run support than Tampa Bay gave Odorizzi. Right? I mean, how on earth would the second-worst offense in the league transform itself into an offensive machine for the 187 innings that Jake Odorizzi is on the mound? How on earth could the powerful Blue Jays offense go into the tank for the 176 innings that Marco Estrada is pitching?
Like I say, I think you know where this is going.
The Rays averaged 5.18 runs per nine innings when Odorizzi was on the mound.
The Blue Jays averaged 3.63 runs per nine when Estrada was on the mound.
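To put the two views side by side, here is a small Python sketch using only the numbers quoted above. The variable names are mine. It is a sanity check on the direction of each gap, not a model of anything.

```python
# Season totals and their distance from league average, as quoted above.
toronto_runs, toronto_vs_avg = 759, 28
tampa_runs, tampa_vs_avg = 672, -59

# Both teams' figures imply the same league-average total.
league_avg_from_tor = toronto_runs - toronto_vs_avg
league_avg_from_tb = tampa_runs - tampa_vs_avg
print(league_avg_from_tor, league_avg_from_tb)  # 731 731

# Team-level view: Toronto's offense is better by this many runs.
team_gap = toronto_runs - tampa_runs
print(team_gap)  # 87

# Pitcher-level view: run support per nine innings while each was on the mound.
support = {"Estrada": 3.63, "Odorizzi": 5.18}
pitcher_gap = round(support["Odorizzi"] - support["Estrada"], 2)
print(pitcher_gap)  # 1.55, in Odorizzi's favor
```

The season totals point one way and the support each pitcher actually received points the other.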
I am one of Baseball Reference's biggest fans, obviously, and so what I'm writing here is not intended as a criticism but as a friendly suggestion. I don't think using team defensive adjustments for individual pitchers works in B-R WAR. From the perspective of the lamest of laymen, I just don't think they're persuasive. Even if Verlander-Porcello is an anomaly the way that Estrada-Odorizzi is an anomaly, I don't think there's any way that pitchers get exactly the same level of defense behind them. That seems utterly obvious to me. Look at the Red Sox run support:
Rick Porcello (33 starts), 7.6 runs per nine.
David Price (35 starts), 6.6 runs per nine.
Steven Wright (24 starts), 6.4 runs per nine.
Clay Buchholz (21 starts), 4.6 runs per nine.
Drew Pomeranz (13 starts), 3.9 runs per nine.
Eduardo Rodriguez (20 starts), 3.6 runs per nine.
Why would offensive run support fluctuate so much and defense NOT fluctuate? This is especially true because different pitchers allow different sorts of balls in play -- ground balls, fly balls, line drives, choppers, bloopers, bat-breaking balls, you name it. I am of the belief that pitchers do not have very much control of whether balls put in play become hits, but I think it's obvious that they do have clear tendencies. The strength of the Red Sox defense was in the outfield, where their right and center fielders saved 44 runs. This, theoretically, should help a fly ball pitcher like Eduardo Rodriguez more than a ground ball pitcher like Pomeranz.
Theoretically, anyway.
I don't believe that Baseball Reference breaks down ground balls or fly balls or where balls were hit -- I think they just use total runs saved by a defense and apply that generally to balls in play. It seems clumsy for an elegant formula.
Sean wrote that if they find an issue with defensive numbers they will look at it because they're always looking to improve. I believe that's true. This is a friendly suggestion. Sean and company are a lot smarter than I am and if they find little to no merit in what I'm saying, I will not take any offense. My kids don't think I know what I'm talking about either.