Many years ago, around the time that Bruce Springsteen was writing and recording Born to Run (spoiler alert), Bill James invented a simple but beautiful little formula to help determine how runs are created in baseball. He intuited from all his baseball reading and research that a run-scoring formula would need three components:
A: Getting on base.
B: Advancing on the bases.
C: Opportunities at the plate.
Makes sense, right? To score a run, you have to get to the plate. You have to get on base. You have to work around the bases. So, he came up with an equation that I think is sort of baseball's E=mc². In its most basic form, it looks like this:
[(H+W) * (TB)] / (AB + W)
That’s it. That’s the whole formula.
[(Hits plus Walks) times (Total Bases)] / (At-bats plus Walks)
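For anyone who would rather see it as code than as algebra, here is a minimal Python sketch of the basic formula. The function name, the argument names and the sample season line are my own illustration, not anything Bill wrote down, and the zero-opportunity guard is an assumption added so the function never divides by zero.

```python
def runs_created(hits, walks, total_bases, at_bats):
    """Basic runs created: (H + W) * TB / (AB + W)."""
    on_base = hits + walks            # getting on base
    advancement = total_bases         # advancing on the bases
    opportunities = at_bats + walks   # the denominator: at-bats plus walks
    if opportunities == 0:
        return 0.0                    # assumption: no opportunities, no runs
    return on_base * advancement / opportunities


# Made-up season line, for illustration only
print(round(runs_created(hits=150, walks=50, total_bases=250, at_bats=500), 1))  # 90.9
```

Four counting stats in, one number out, which is the whole appeal.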
There's so much goodness in this equation. For one thing, as you can tell by the plainness of it, Bill was working with what little he had. It is sort of like early human artwork done with rocks. This was in a time long before Baseball Reference and Fangraphs. Bill in those days was copying statistics out of The Sporting News and off the back of baseball cards. In those days, it wasn't even easy to get hit-by-pitch numbers. Caught-stealing numbers were guarded like the crown jewels. To get total bases, you would have to do some backward math, figuring out how many singles a hitter had by subtracting doubles, triples and home runs from the hit total. You had to work with what you were given.
And that was OK because even if you had all these other statistics, you couldn't do much with them. There were no computers then to do complex equations. Bill was doing everything by hand. He came up with a simple formula because, well, he needed a simple formula.
But here's the thing: The formula worked. I suspect it worked beyond Bill's wildest hopes. Many people -- Bill included -- have tinkered with runs created, bent it, twisted it, weighed it down, lifted it up, stretched it and stepped on it. Some of these are great statistics. But just this original formula, which seems to leave so much out, still gives an astonishing estimate of how many runs will be scored. It is, in a word, magical.
How magical? Well, if you go back to 1950, the basic runs created formula has estimated that 1,126,591 runs would be scored. And, over those 65 years, teams have actually scored just 3,695 more runs than that. Think about that for a moment. Bill will tell you: He's no mathematician. He's no scientist. He's no statistician. But through instinct, he came up with a basic formula that is better than 99.5% accurate.
If you go year by year, the basic runs created formula is a bit more volatile. It has been as much as 3% off, which is still not bad. In other years, though, it has been so accurate as to seem mystical. In 1994, for instance, the formula estimated that 15,753 runs would be scored. How many runs were scored? Well: 15,752 -- one run less. There have been many years like that.
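Here is a quick Python check of how small those gaps are, using only the run totals quoted above. The helper name is my own, and measuring the gap as a percentage of actual runs is one reasonable assumption about how to express the error, not necessarily the only way to do it.

```python
def percent_error(estimated, actual):
    """Signed gap between estimate and actual, as a percentage of actual runs."""
    return (estimated - actual) / actual * 100


# Cumulative totals since 1950, as quoted in the text
estimated_total = 1_126_591
actual_total = estimated_total + 3_695  # teams scored 3,695 more than estimated

# The 1994 season, as quoted in the text
estimated_1994, actual_1994 = 15_753, 15_752

print(f"Since 1950: {percent_error(estimated_total, actual_total):+.2f}%")
print(f"1994:       {percent_error(estimated_1994, actual_1994):+.4f}%")
```

By this measure the cumulative estimate runs about a third of one percent low, and the 1994 estimate misses by less than a hundredth of a percent.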
Why do I bring this up now? Well, I think it's time we celebrate the runs created formula again. In many ways, it has faded into the background as more complicated and more thorough formulas have taken its place. But there's something arresting and magnificent about runs created -- as I found out when trying to do a little experiment.
The experiment had nothing at all to do with runs created, at least in the beginning. I was tinkering around and trying to answer a question: How much has the popularization of Wins Above Replacement (both the Baseball Reference and Fangraphs versions) changed the way MVP voters vote? People argue all the time about WAR, its value, its flaws. Some call it voodoo. Some swear by it. But no matter where people stand individually, I have a strong suspicion that collectively, WAR -- that sort of one-stop shopping statistic that attempts to quantify a player's ENTIRE contribution, including offense, defense and base running -- has changed the landscape of MVP voting.
What I found out, I must admit, is not especially interesting or surprising. By breaking down MVP voting over the last 50 years in several unscientific ways, I found that WAR has changed the voting in some ways but not in others. Brilliant, right? What I mean is, I don't see WAR leaders becoming MVPs any more often than they did before the statistic gained favor. From 1975 to 1984, for example, 11 Fangraphs WAR leaders won the MVP award -- and that was obviously decades before the statistic was even invented. Over the last 10 years, nine Fangraphs WAR leaders have won the MVP award. So I don't think the WAR impact has been that direct.
But where I think WAR has entirely changed the landscape is in getting rid of quirky MVP winners. Since 2008, which is just about when WAR and similar complex statistics started to become mainstream, every single MVP has finished Top 5 in WAR. In fact, every single winner except Miguel Cabrera finished first or second in WAR. Cabrera finished either third or fourth the two years he won, so he wasn't exactly an outlier either.
Before 2008, you would get very odd MVP choices every now and again. Here are only a few (the WAR I use here is the average between fWAR and bWAR):
2006: Justin Morneau, 21st in WAR.
2006: Ryan Howard, 9th
2002: Miguel Tejada, 14th
1998: Juan Gonzalez, 18th
1996: Juan Gonzalez, 31st
1995: Mo Vaughn, 13th
1987: Andre Dawson, 19th
1979: Don Baylor, 26th
1979: Willie Stargell, 34th
1976: Thurman Munson, 10th
1974: Jeff Burroughs, 24th
1974: Steve Garvey, 19th
1970: Boog Powell, 10th
Well, I tend to think those days are over -- at least until a new statistic takes hold. Many people may dislike WAR and many may revolt against it. But realistically, I just don't see how anyone who isn't at least near the top of the WAR chart can build enough of a consensus to win the MVP award these days. I just don't think it's possible now.
Here's an example: Mark Teixeira. As you might know, Tex is having a superb comeback season for a Yankees team that surprisingly leads the American League East. He's among the league leaders in homers and slugging, and he should finish with 100-plus RBIs. He is exactly the sort of player who, in the past, could gain all sorts of MVP momentum in September (especially if he hits a few key homers) and take away the award from someone perhaps more worthy overall.
But I think WAR more or less eliminates this possibility. Tex is 15th in WAR. Sure, he will get some MVP votes. I could even see him getting a first-place vote or two (and I can also see the Web sites mocking the voters for the choice). But, in the end, I just don't see him coming close to winning, not with Mike Trout and Josh Donaldson so far ahead in WAR.
So, I think that's changed. I don't think WAR is necessarily picking more winners, but I do think it is eliminating more contenders. I think it's narrowing the field. I just don't think quirky and emotional choices like Willie Stargell will be winning MVP awards now. You can either celebrate or bemoan this, but I think it's the new reality.
Like I say, I didn't find all of that particularly interesting or surprising. But I did find one surprise. In trying to answer the question, I looked at a bunch of different baseball statistics -- including Fangraphs and Baseball Reference WAR -- to see how often the MVP led the league in those stats. I went back 50 years. Now, some of these statistics admittedly were kind of meaningless. For instance, seven MVPs since 1965 led the league in doubles (the last being Dustin Pedroia). Other than being surprised that Miggy didn't lead the league in doubles in either of his MVP years (he led the league the year before and the year after), that stat offered nothing.
But ten of the stats provided what I think is sort of an interesting view of the MVP voting.
Stolen base leader: 2 MVPs out of 92 (2%)
— The only two players since 1965 to lead the league in stolen bases and win the MVP award are Ichiro in 2001 and Rickey Henderson in 1990. I do think that's sort of interesting. Voters rarely take stolen bases into account when voting for the MVP, which I suspect is why Rickey only won the one MVP (he had a strong case four or five other times), why Tim Raines never won one (which, I think, has hurt his Hall of Fame argument) and why Kenny Lofton was pretty severely underrated. The last player to win the MVP based almost entirely on stolen bases was Maury Wills in 1962.
Hit leader: 10 out of 92 (11%)
— The only player in the last 10 years to win the MVP as the hit leader was Dustin Pedroia. The year Ichiro set the hit record, he finished seventh in the voting.
Batting champion: 18 out of 92 (20%)
— Well, I did expect the batting champion to win more MVP awards, especially back in the 1960s and 1970s when batting average was the biggest stat going. Even if you go back to the 1930s and 1940s, though, the batting champion just doesn't get named MVP very often. Pete Rose probably won his MVP in 1973 because he was the batting champ. And Willie McGee's high batting average might have won him the award in 1985. But the batting champion-MVP double is rarer than I thought.
Runs leader: 27 out of 92 (29%)
— Over the last 15 years, more MVPs have led the league in runs (9) than RBIs (4). That's kind of an interesting shift. Of course, both runs and RBIs are team-driven statistics, and it's kind of silly to use either one for an individual award. But people have been doing that forever, and it's nice to see runs scored get its due. Once again, if people had looked more closely at runs scored, Tim Raines would have won an MVP award and, I think, he'd rightfully be in the Hall of Fame right now.
Home run leader: 27 out of 92 (29%)
— In the last 50 years, 14 players have won the MVP award without leading the league in any major statistical category. It's a compelling list that includes:
Roberto Clemente, 1966
Boog Powell, 1970
Steve Garvey, 1974
Thurman Munson, 1976
Willie Stargell, 1979
Kirk Gibson, 1988
Frank Thomas, 1993
Barry Larkin, 1995
Ken Caminiti, 1996
Juan Gonzalez, 1996
Ivan Rodriguez, 1999
Jeff Kent, 2000
Miguel Tejada, 2002
Justin Morneau, 2006
In addition, there have been seven MVPs who led the league in just one statistic:
Orlando Cepeda, 1967 (RBIs)
Jeff Burroughs, 1974 (RBIs)
Dale Murphy, 1982 (RBIs)
Robin Yount, 1989 (Runs Created)
Mo Vaughn, 1995 (RBIs)
Chipper Jones, 1999 (Runs Created)
Andrew McCutchen, 2013 (fWAR)
I thought for sure that someone in the last 50 years won the MVP award based on hitting a lot of home runs. But it really isn't so. Mark McGwire did not win the MVP award the year he hit 70. Cecil Fielder did not win the MVP award the year he broke the 50-homer barrier for the first time in a decade. Barry Bonds did win the award his 73-homer year, but he did so many other ridiculous things that year that I hardly think the homers won it for him. I think the last time a player won the MVP award predominantly because of home runs was Roger Maris in 1961.
RBI leader: 34 out of 92 (37%)
— And now we come to the most controversial of MVP statistics. Between 1965 and 1998, more than half the MVP winners were also RBI leaders. This included some of the most bizarre choices in voting history -- Mo Vaughn, Sammy Sosa, George Bell, Don Baylor, Jeff Burroughs and so on.
No one can argue that RBIs cast a powerful spell over baseball fans. If baseball statistics have the power of language, as Bill James famously wrote, then RBIs shout. But, it's silly. In 1998, when Juan Gonzalez led the league in RBIs, he came to the plate with more runners on than anybody in the league. Combine that with the fact that he played in a good hitter's park and balls were flying everywhere that year and it would have taken a pretty amazing effort for Gonzalez to NOT have led the league in RBIs. Still he won an MVP award for it. In 1979, Don Baylor came to the plate with 40 more runners on base than any other player in baseball. And he was a good hitter. Of course he led the league in RBIs. It shouldn't have made him MVP. But it did.
Anyway, as mentioned above, the RBI train has left the station. You can't win the MVP award now based entirely on RBIs. Sure, there are still some who bemoan the RBIs' fading light and there are some who will still give the statistic a lot of weight come MVP voting time. But there just aren't too many RBI worshipers left.
Baseball Reference WAR leader: 35 out of 92 (38%)
Fangraphs WAR leader: 40 out of 92 (43%)
OK, now we get to the two WARs. I have gone back and forth about there being two distinct versions of WAR. For a while there, I hoped that they would come together a bit more -- it seemed to me that it didn't help the credibility of either version of the statistic to have such divergent results.
Now, I think differently -- I think the two statistics should break apart even more. And I think they should have different names. I understand, as Tom Tango says, that they are just two different methods for the same framework (and technically they have different names, fWAR and bWAR). But I don't completely buy it. I think they take two different approaches to valuing players, especially pitchers (bWAR looks at run prevention, fWAR looks at strikeouts, walks and homers). They value defense somewhat differently. I now believe they should break apart, become two completely different statistics. One can be called WAR. The other can be called PEACE (Price Effective Above Common Earthling).
Hey, I think I'm going to do that. Baseball Reference WAR will still be called WAR. Fangraphs will be known as PEACE.
As you can see, the PEACE leader has won the award more often than the WAR leader. I'm not sure why -- I think it comes down to defensive values. I’ll ask Tango his thoughts.
OPS leader: 41 out of 92 (45%)
Plenty of people will tell you that OPS is a junk stat, that it makes absolutely no sense to add together on-base percentage (which uses plate appearances as a denominator) and slugging percentage (which uses at-bats as a denominator). But for whatever reason, it has gained the power of language. I think it's in part because OPS numbers very loosely match up with the grading scale at schools:
.900 and above: A
.800 to .899: B
.700 to .799: C
.600 to .699: D
Below .600: F
OK, it's not exactly like that. As we all know, some years teams score more runs than others, and league-wide OPS goes up and down. Around 2000, the league average OPS was .780 or so. Now it is much closer to .700. So the corresponding grades change. These days, an .850 or better OPS is probably an A, and on down.
This is why many people, myself included, find adjusted OPS+ to be so much more valuable than the raw numbers, because it adjusts the number based on ballpark, run scoring environment and so on. But you might be interested to know that the raw OPS leader has won the award seven more times since 1965 than the adjusted OPS+ leader.
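To make the mismatched-denominators complaint concrete, here is a small Python sketch of how raw OPS is put together. It assumes the standard definitions, in which on-base percentage divides by at-bats plus walks plus hit-by-pitches plus sacrifice flies (roughly plate appearances) and slugging divides by at-bats alone. The function names and the sample season line are hypothetical, for illustration only.

```python
def on_base_pct(h, bb, hbp, ab, sf):
    """OBP: times on base over (roughly) plate appearances."""
    denom = ab + bb + hbp + sf
    return (h + bb + hbp) / denom if denom else 0.0


def slugging_pct(tb, ab):
    """SLG: total bases over at-bats only."""
    return tb / ab if ab else 0.0


def ops(h, bb, hbp, ab, sf, tb):
    """OPS simply adds the two rates, despite their different denominators."""
    return on_base_pct(h, bb, hbp, ab, sf) + slugging_pct(tb, ab)


# Made-up season line, for illustration only
print(f"{ops(h=150, bb=50, hbp=5, ab=500, sf=5, tb=250):.3f}")  # 0.866
```

The sum is not a rate of anything in particular, which is exactly the objection, and yet the number is easy to read.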
And all this leads -- big finish -- to the whole point of this. The statistic that best predicts the MVP winner is ... yep, Bill James' little invention.
Runs created leader: 44 out of 92 (48%)
There is a lot that is amazing about this, but the main point is this: Nobody looks at runs created when voting for MVP. Well, maybe SOME people do, but I suspect it's very few. That means that this stat has done a pretty amazing job picking the MVP award even though nobody refers to it. People just instinctively think along the lines of runs created.
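For anyone who wants the ten stats lined up side by side, here is a short Python sketch that ranks them by how often the league leader also won the MVP. The counts are the ones listed above; the dictionary layout and labels are my own. Shares are printed to one decimal place, so they can differ by a point from the rounded figures in the list.

```python
TOTAL_AWARDS = 92  # MVP awards in the sample

# Number of MVPs who also led their league in each stat, as listed in the text
leader_mvps = {
    "Stolen bases": 2,
    "Hits": 10,
    "Batting average": 18,
    "Runs": 27,
    "Home runs": 27,
    "RBIs": 34,
    "Baseball Reference WAR": 35,
    "Fangraphs WAR": 40,
    "OPS": 41,
    "Runs created": 44,
}

# Rank from most predictive to least
ranked = sorted(leader_mvps.items(), key=lambda kv: kv[1], reverse=True)
for stat, count in ranked:
    print(f"{stat:<24}{count:>3} of {TOTAL_AWARDS}  ({count / TOTAL_AWARDS:.1%})")
```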
As mentioned, the thing I love about runs created is that it is simple. There are more accurate ways to measure runs, but this is one you can do on a napkin. And it is such an elegant formula. Remember above, I wrote that Mark Teixeira will not win the MVP award this year? Well, runs created explains why in just a few quick calculations.
Who are the two top MVP candidates right now? Mike Trout and Josh Donaldson, probably.
Trout has 132 hits, 64 walks, 260 total bases in 444 at-bats.
Donaldson has 145 hits, 51 walks, 281 total bases in 480 at-bats.
With a piece of scrap paper, I can quickly figure from that small bit of information that Trout has created about 100 runs, and Donaldson has created about 104. That's really close. So now, you start thinking about their defense, their base running, their other contributions. It's a great race.
And what of Teixeira?
He has 100 hits, 59 walks, 215 total bases in 389 at-bats.
Quick calculation: He has created 76 runs. And, while that's good, it puts him roughly 25 runs behind the other guys. How is he making up that difference? Speed? No. Defense? Not at first base. Leadership? Well, you'd have to give him a whole lot of leadership points.
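The scrap-paper math for all three players can be sketched in a few lines of Python. The stat lines are the ones given above; the function name and dictionary layout are my own illustration. The results land within a run of the figures in the text, with the differences down to rounding.

```python
def runs_created(hits, walks, total_bases, at_bats):
    """Basic runs created: (H + W) * TB / (AB + W)."""
    return (hits + walks) * total_bases / (at_bats + walks)


# Season lines as quoted in the text
players = {
    "Trout": dict(hits=132, walks=64, total_bases=260, at_bats=444),
    "Donaldson": dict(hits=145, walks=51, total_bases=281, at_bats=480),
    "Teixeira": dict(hits=100, walks=59, total_bases=215, at_bats=389),
}

created = {name: runs_created(**line) for name, line in players.items()}

# Print from most runs created to fewest
for name, rc in sorted(created.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<10}{rc:6.1f}")
# Donaldson  103.7
# Trout      100.3
# Teixeira    76.3
```

The gap between Teixeira and the two front-runners is the part that matters: roughly two dozen runs or more, from nothing but hits, walks, total bases and at-bats.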
Runs created really is a little bit of brilliance.