
Sabermetrics Baseball

What’s in a Number?

On April 27, 1983, the Montreal Expos came to bat in the bottom of the eighth inning trailing the Houston Astros 4-2. First up to face pitcher Nolan Ryan was Tim Blackwell, a lifetime .228 hitter who had struck out in his first time at bat. At this routine juncture of this commonplace game, Ryan stared down at Blackwell, but his invisible, yet for all that more substantial, opponent was a man who had died the month before Ryan was born, a man about whom Ryan knew nothing, he confessed, except his statistical line. For at this moment of his glorious big-league career, Ryan had accumulated a total of 3,507 strikeouts, only one short of the mark Walter Johnson had set over twenty-one seasons, from 1907 to 1927. Long thought invulnerable, Johnson’s record was in imminent danger of falling, in 1983, not only to Ryan but also to Steve Carlton and Gaylord Perry.

Ryan fanned Blackwell and then froze the next batter, pinch hitter Brad Mills, with a 1-and-2 curveball. The pinnacle was his. Johnson had been baseball’s all-time strikeout leader since 1921, when he surpassed Cy Young. Ryan would hold that title for just a few weeks, then would be overtaken by Carlton, only to display an incredible finishing kick and finish at 5,714 in 1994. But at the time that Ryan topped Johnson, baseball savants scurried to assess the meaning of 3,509 for both the deposed King of K and the new one.

In the aftermath of Ryan’s feat, some writers pointed out that he only needed sixteen full seasons, plus fractions of two others, in which to record 3,509 strikeouts while Johnson needed twenty-one, or that Johnson pitched over 2,500 more innings than Ryan. Coming into the 1983 season, Ryan had fanned 9.44 men per nine innings, while Johnson was way down the list at 5.33. And Ryan had allowed fewer hits per nine innings than Johnson, or, for that matter, anyone in the history of the game. So, it would seem 3,509 was not just one batter better than Johnson, but rather was mere confirmation for the masses of a superiority that was clear to the cognoscenti years before.

However, other writers introduced mitigating factors on Johnson’s behalf, much as Ruth found supporters as the home run king even after Aaron hit number 715. These champions of the old order cited Johnson’s won-lost record of 417-279 and earned run average of 2.17 while scoffing at Ryan’s mark, entering 1983, of 205-186 with an ERA of 3.11. This tack led to further argument in print, bringing in the quality of the teams each man pitched for and against, the resiliency of the ball, the attitudes of the batters in each era toward the strikeout, the advent of night ball, integration, expansion, the designated hitter, the overall talent pool, competition from other professional sports … and on down into the black hole of subjectivism.

Why were so many things dragged into that discussion? Because the underlying question about 3,509 was: does this total make Ryan better than Johnson, or even a better strikeout pitcher than Johnson? At the least, does it make him a great pitcher? In our drive to identify excellence on the baseball field (or off it), we inevitably look to the numbers as a means of encapsulating and comprehending experience. This quantifying habit is at the heart of baseball’s hidden game, the one ceaselessly played by Ryan and Johnson and Ruth and Aaron (and, thanks to baseball’s voluminous records, by more than 14,000 other players) in a stadium bounded only by the imagination.

What’s in a number? The answer to “How Many?” and sometimes a great deal more. In this case, 3,509 men had come to the plate against Ryan and failed to put the ball in play, one more man than Johnson had returned to the dugout, cursing. So what’s the big deal? That Ryan was .0002849 faster, scarier, tougher (in a word, better) than Johnson? An absolute number like 3,509, or 714 (the home-run record once thought invulnerable, too), or 4,191 (the erroneous hit total of Ty Cobb that Pete Rose finally surpassed) does not resound with meaning unless it is placed into some context that will give it life.

Baseball statistics are not the instruments of vivisection, taking the life out of the game in order to examine it; rather, statistics are themselves the vital part of baseball, the only tangible and imperishable remains of contests played yesterday or a hundred years ago. Baseball may be loved without statistics, but it cannot be understood without them. As the statistics reflect more accurately the reality of what happened on the field, greater understanding leads to a deeper love and appreciation of this great game, which is, essentially, the case for sabermetrics and the reason for Total Baseball.

The Linear Weights System

In 1982, Milwaukee’s Robin Yount had the year of his life, batting .331 with 29 homers, 114 RBIs and 129 runs scored; he led the American League in hits, doubles, total bases, and slugging percentage, while finishing just one point behind the league leader in batting average. For the first of two times in his career, he was voted the Most Valuable Player in the American League, being named first on all but one of the twenty-eight ballots cast by the baseball writers.

Over in the other league, Mike Schmidt of the Phillies was having an off year, batting only .280 with 35 homers and 87 RBIs; the previous year, when he was awarded the MVP, in only 102 games played he had totaled 31 homers and 91 RBIs. He did lead the league once again in 1982 in slugging percentage, and he did win the Gold Glove at third base for the seventh straight year, yet in the MVP balloting none of the ballots listed him higher than fourth; ten ballots were cast without listing him at all.

For Yount, 1982 was a crowning achievement; for Schmidt, a disappointment: that was the verdict reached by the baseball writers and conventional baseball statistics. Yet in terms of actual performance, as determined by the number of runs contributed, Schmidt’s “off year” was scarcely different from Yount’s. With the bat, Yount accounted for 59 park-adjusted runs beyond what an average batter might have contributed; Schmidt, 45. Through base stealing, Yount added 2; Schmidt none. With the glove, Yount was 4 runs below league average at his position, shortstop; Schmidt was 19 above average at third base. Total runs contributed: Yount 57, Schmidt 64. (Because Yount’s batting so far exceeded that of other shortstops, while third base provided several heavy hitters, Yount contributed 7.0 extra wins to his Brewers; Schmidt contributed 6.1 to the Phillies.) Both men had outstanding seasons, the best in their respective leagues, and both outstripped the second-best player by about the same margin.

Viewing player (and team) performance through this sort of prism frequently produces such illuminating results. Cecil Fielder had a wonderful year in 1990, with his 51 homers, 132 RBIs, and league-leading figures in slugging average and extra-base hits. But how did he convince any writer voting for MVP that he had a better year than Rickey Henderson? In Total Baseball, you could look it up: Fielder contributed 4.2 extra wins to his team (wins that an average player would not), which was the fourth-best figure in the American League that year; Henderson was responsible for a whopping 8.2, not only the top mark in 1990 but also, at the time, the second-best mark in the AL since Mickey Mantle’s epic seasons of 1956-57!

This is the kind of analysis of player performance possible with a variety of sabermetric measures, not just the Linear Weights System. The common ingredient of most of the new, as yet unofficial statistics is their creators’ recognition of the relationship between runs and wins. These newly calculated measures are not official statistics of Major League Baseball, but they are constructed from the raw data of the official record. Some of the new measures may one day be officially embraced, as on-base percentage became an official stat three decades after its introduction. Because of fan interest, we include them in the Player and Pitcher Registers that follow, alongside the officially tabulated numbers.

Runs and Wins

George Lindsey, in an article in Operations Research in 1963, was the first to assign run values to the various offensive events which lead to runs:

Runs = (.41)1B + (.82)2B + (1.06)3B + (1.42)HR

He based these values on recorded play-by-play data and basic probability theory. Unlike Earnshaw Cook, who in the following year assigned run values on the basis of the sum of the individual scoring probabilities-that is, the direct run potential of the hit or walk plus those of the baserunners set in motion-Lindsey recognized that a substantial part of the run value of any non-out is that it brings another man to the plate. This additional batter has a one-in-three chance of reaching base and thus bringing another man to the plate with the same chance, as do the batters to follow. The indirect run potential of these batters cannot be ignored.

Steve Mann’s Run Productivity Average (RPA) assigned these values based on observation of some 12,000 plate appearances: RPA = (.51)1B + (.82)2B + (1.38)3B + (2.63)HR + (.25)BB + (.15)SB – (.25)CS, all divided by plate appearances, then plus .016. His values were denominated in terms of the number of runs and RBIs each event produced. Bill James, at about the same time, came up with a similar formula, since shunned, with values based on runs plus RBIs minus home runs. The drawbacks to the approaches of Mann and James were the drawbacks of the RBI, which gives the entire credit for producing a run to the man who plates it, and of the run scored, which gives credit only to the man who touches home, no matter how he came to do so. For example, with no outs, a man reaches first on an error; the next batter hits a double, placing runners on second and third; the following batter taps a roller to short and is thrown out at first, with the run scoring from third. The man who produced the out is given the credit for producing a run, while the man who started the sequence by reaching first on an error is likewise credited with a run. The man who hit the double, which was surely the key event in the sequence which produced the run, and the only one reflecting batting skill, receives no credit whatsoever. In this regard, any formula based on “Runs Produced” (whether R + RBI or R + RBI – HR) is philosophically inferior to the formula Lindsey proposed, despite his failure to account for walks, steals, and other events.

The run values in the Linear Weights formula for identifying batters’ real contribution are derived from Pete Palmer’s 1978 computer simulation of all major-league games played since 1901. All the data available concerning the frequencies of the various events was collected; following a test run, these were tabulated. Unmeasured quantities, such as the probability of a man going from first to third on a single vs. that of his advancing only one base, were assigned values based on play-by-play analysis of over 100 World Series contests. The goal was to get all the measured quantities very nearly equal to the league statistics; then the simulation would provide run values of each event in terms of net runs produced above average. Expressing the values in those terms would give a meaningful base line to individual performances, because if you are told that a player contributed 87 runs you don’t know what that signifies unless you know the average level of run contribution in that year: 87 may sound like a lot, but if the norm was 80, then you know the player contributed only 7 runs beyond average.

The values obtained from the simulation are remarkably similar from one era to the next, confounding expectations that the home run would prove more valuable today than in the dead-ball era, or that the steal was once a primary offensive weapon. These values are expressed in beyond-average runs.

Run Values

Event              1901-20   1921-40   1941-60   1961-77
home run             1.36      1.40      1.42      1.42
triple               1.02      1.05      1.03      1.00
double                .82       .83       .80       .77
single                .46       .50       .47       .45
walk/HBP              .32       .35       .35       .33
stolen base           .20       .22       .19       .19
caught stealing      -.33      -.39      -.36      -.32
out*                 -.24      -.30      -.27      -.25
*An out is considered to be a hitless at bat and its value is set so that the sum of all events times their frequency is zero, thus establishing zero as the base line, or norm, for performance.

In the years since this simulation was conducted, statistician Dave Smith (“Maury Wills and the Value of the Stolen Base,” Baseball Research Journal, 1980) convinced Pete to adjust the values of the stolen base and caught stealing because of their situation-dependent, elective nature: attempts are apt to occur more frequently in close games, where they would be worth more than if they were distributed randomly the way an event like a single or a home run would be. Pete revised the value for the steal upward to .30 runs, while for the caught stealing it becomes -.60 runs.

Just as these run values change marginally with changing conditions of play, they differ slightly up and down the batting order (a homer is not worth as much to the leadoff hitter as it is to the fifth-place batter; a walk is worth more for the man batting second than for the man batting eighth); however, these differences have been averaged out in the figures above. For evaluating runs contributed by any batter at any time, there is no better method than Batting Runs, the Linear Weights formula derived from the computer simulation which is the basis of the table above.

The Formula

Runs = (.47)1B + (.78)2B + (1.09)3B + (1.40)HR + (.33)(BB + HB) + (.30)SB – (.60)CS – (.25)(AB – H) – .50(OOB).
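The formula translates directly into code. In this sketch, every weight is taken from the formula above; the event counts are whatever inputs you supply, and the -.25 out value is the era-averaged figure rather than a league-specific one:

```python
def batting_runs(singles, doubles, triples, hr, bb_hbp, sb, cs, ab, h, oob=0):
    """Linear Weights Batting Runs: net runs beyond league average."""
    return (0.47 * singles + 0.78 * doubles + 1.09 * triples + 1.40 * hr
            + 0.33 * bb_hbp + 0.30 * sb - 0.60 * cs
            - 0.25 * (ab - h) - 0.50 * oob)

# Hypothetical 600-AB season: 120 1B, 30 2B, 5 3B, 25 HR (180 hits),
# 70 BB+HBP, 10 SB, 5 CS -- about 38 runs beyond average
example = batting_runs(120, 30, 5, 25, 70, 10, 5, 600, 180)
```

The season line above is invented for illustration; substitute any player's real totals to evaluate him against the average batter of his time.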

The events not included in the formula that you might have expected to see are the sacrifice hit, the sacrifice fly, grounding into double plays, and reaching base on error. The last is not known for most years and in the official statistics is indistinguishable from outs on base (OOB). The sacrifice has essentially canceling values, trading an out for an advanced base which, often as not, leaves the team in a situation with poorer run potential than it had before the sacrifice. The sacrifice fly has dubious run value because it is entirely dependent upon a situation not under the batter’s control: while a single or a walk always has a potential run value, a long fly does not, unless a man happens to be poised at third base (whether it is achieved by accident or design is open to question as well, but that is beside the point; getting hit by a pitch is not a product of intent, either). Lastly, grounding into a double play is to a far greater extent a function of one’s place in the batting order than of poor speed or failure in the clutch, and thus it does not find a home in a formula applicable to all batters. It is no accident that Henry Aaron, who ran well for most of his long career and wasn’t too bad in the clutch, hit into more DPs than anyone else, nor that Roberto Clemente, Al Kaline, and Frank Robinson, who fit the same description, are also among the ten “worst” in this department. If a .230-hitting American League shortstop doesn’t hit into many twin killings, it’s not because of adept bat handling or blazing speed but because he bats ninth.

The Linear Weights formula for batters may be long, but it calls for only addition, subtraction, and multiplication and thus is as simple as the slugging average, whose incorrect weights (1, 2, 3, and 4) it revises and expands upon. Each event has a value and a frequency, just as in slugging average, yet-as in no batting statistic you have ever seen-outs are treated as offensive events with a run value of their own (albeit a negative one), a truth so obvious it somehow escaped notice. Just as the run potential for a team in a given half inning is boosted by a man reaching base, it is diminished by a man being retired; not only has he failed to change the situation on the bases but he has deprived his team of the services of a man further down the order who might have come up in this half inning, either with men on base and/or with scores already in.

What Batting Runs does is to take every offensive event and treat it in terms of its impact upon the team-an average team, so that a man does not benefit in his individual record for having the good fortune to bat cleanup with the Giants or suffer for batting seventh with the Marlins. The relationship of individual performance to team play is stated poorly or not at all in conventional baseball statistics. In Batting Runs it is crystal clear: the linear progression, the sum of the various offensive events, when weighted by their accurately predicted run values, will total the runs contributed by that batter or that team beyond the league average.

Recognizing that some dedicated readers of Total Baseball will wish to keep track of batting performance by computing Batting Runs themselves over the course of a season, and that they may be frustrated by the difficulty of calculating the “At Bats-Hits” factor for the league, which is necessary to determine the negative value of an out, we advise that using a fixed value of -.25 for outs will tend to work quite well if you wish to include pitcher batting performance, and a fixed value of -.27 will serve if you wish to delete it. Actually, any fixed value will suffice in midseason; it’s only when all the numbers are in and you care to compare this year’s results with last year’s (or with those of the 1927 Yankees) that more precision is desirable. At that point the value of the out may be calculated by the ambitious among you, but ideally, the sporting press will provide accurate Batting Runs figures. Who, after all, calculates ERA for himself?
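For the ambitious, the footnote to the run-values table supplies the recipe: the out weight is set so that all events times their league frequencies sum to zero. A sketch of that derivation, with invented league totals standing in for real ones:

```python
def league_out_value(singles, doubles, triples, hr, bb_hbp, outs):
    """Out value that makes the league's weighted event sum equal zero."""
    positive = (0.47 * singles + 0.78 * doubles + 1.09 * triples
                + 1.40 * hr + 0.33 * bb_hbp)
    return -positive / outs

# Illustrative (not real) league totals -- yields roughly -0.32 here
example = league_out_value(20000, 5000, 600, 2000, 8000, 60000)
```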

Production

For those to whom calculation is anathema, or at the least no pleasure, Batting Runs has a “shadow stat” that tracks its accuracy to a remarkable degree and is a breeze to calculate: Production, which consists simply of On Base Percentage Plus Slugging Average. While it is not expressed in runs and thus lacks the philosophical appeal of Batting Runs, the standard deviation of its most complete version is 20.4 runs compared to the 19.8 of Batting Runs. In other words, the correlation between Batting Runs and Production over the course of an average team season is 99.7 percent.

However, as an average or ratio, Production measures the rate of batting success (efficiency), while Batting Runs measures the amount of success. For example, a batter who goes 2-for-5 with a walk in one game, those 2 hits being doubles, will have an On Base Percentage of .500 and a Slugging Average of .800; his Production will be 1.3, or as stated for convenience in Total Baseball, 130. Another batter, who in 162 games gets 200 hits and 100 walks in 500 at bats, with 400 total bases, will have an identical OBP, SLG, and PRO. Which player has contributed more to his team? Clearly, longevity, or amount of production, is no less important than rate of production.
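The 2-for-5 example works out as follows; this is a minimal sketch in which OBP takes the simple (H + BB)/(AB + BB) form:

```python
def production(hits, walks, at_bats, total_bases):
    """PRO = On Base Percentage plus Slugging Average."""
    obp = (hits + walks) / (at_bats + walks)
    slg = total_bases / at_bats
    return obp + slg

# The 2-for-5 batter: 2 hits (both doubles, 4 total bases), 1 walk
pro = production(hits=2, walks=1, at_bats=5, total_bases=4)  # .500 + .800 = 1.300
```

The 162-game batter in the text (200 hits, 100 walks, 500 at bats, 400 total bases) produces the identical 1.300, which is exactly the point: the rate is the same, the amount of contribution is not.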

To cite a specific instance in which Production and Batting Runs differ, take George Brett’s remarkable 1980 season in which he batted .390, had 298 total bases, 75 bases through walks or HBP, and 118 RBIs, all in only 117 games played. In the table of all-time single-season leaders in production, the Kansas City third baseman ranks 44th when his PRO of 1.124 is normalized to the league average and adjusted for home-park effects. Yet in the table of park adjusted Batting Runs, Brett’s season ranks out of the top 100 because he missed 45 games, in which his team derived no benefit from his high rate of performance. (Had Brett played 162 games and continued to perform at the same level, his Batting Runs would have been not 64.8 but 89.7, the 19th best mark in history.)
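The proration in the parenthetical is plain arithmetic: scale the 117-game total to a 162-game schedule.

```python
brett_1980 = 64.8                       # park-adjusted Batting Runs in 117 games
full_season = brett_1980 * 162 / 117    # projected over a full schedule
print(round(full_season, 1))            # 89.7, matching the figure in the text
```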

Because PRO is not expressed in runs, it is less versatile than Batting Runs. For just as runs are proportional to the events that form them, so are they proportional to wins and losses. This statement, a truism today, was a novelty in 1954 when Rickey and Roth first stated the correlation between run differentials and team standings. But they did not take the next step, to recognize that not only a team’s standing but even its won-lost record could be predicted from the run totals.

“The initial published attempt on this subject,” Pete wrote in the 1982 issue of the SABR annual The National Pastime, “was Earnshaw Cook’s Percentage Baseball, in 1964. Examining major-league results from 1950 through 1960 he found winning percentage equal to .484 times runs scored divided by runs allowed. . . . Arnold Soolman, in an unpublished paper which received some media attention, looked at results from 1901 through 1970 and came up with winning percentage equal to .102 times runs scored per game minus .103 times runs allowed per game plus .505. . . . Bill James, in the Baseball Abstract, developed winning percentage equal to runs scored raised to the power x, divided by the sum of runs scored and runs allowed each raised to the power x. Originally, x was equal to two but then better results were obtained when a value of 1.83 was used. . . .
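The James estimate quoted above is easy to reproduce; a sketch using the 1.83 exponent he settled on:

```python
def pythagorean_pct(runs_scored, runs_allowed, exponent=1.83):
    """Bill James's winning-percentage estimate from runs scored and allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)

# A team outscoring its opponents 800-700 projects to a winning record,
# roughly a .56 percentage
pct = pythagorean_pct(800, 700)
```

The 800-700 team is hypothetical; feed in any club's actual season totals to see how closely its won-lost record tracked its run differential.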

“My work showed that as a rough rule of thumb, each additional ten runs scored (or ten less runs allowed) produced one extra win, essentially the same as the Soolman study. However, breaking the teams into groups showed that high-scoring teams needed more runs to produce a win. This runs-per-win factor I determined to be ten times the square root of the average number of runs scored per inning by both teams. Thus in normal play, when 4.5 runs per game are scored by each club, each team scores .5 runs per inning-totaling one run, the square root of which is one, times ten.”

Note that when Palmer refers to the need for approximately ten additional runs scored (or ten fewer allowed) to provide a team with an additional win, he does not mean that it takes ten runs to win any given game. Obviously, in a specific case, a one-run margin is all that is required; but statistics are designed for the long haul, not the short.

What does this have to do with Batting Runs? Remembering that Batting Runs are expressed not simply in runs but in beyond-average runs, the conversion from a batter’s Linear Weights runs to his wins is a snap: simply divide Batting Runs by the number of runs it takes to gain an extra win in a given year. Taking the exploits of Babe Ruth in 1927, we see that through batting alone he contributed 100.7 runs, or 9.56 wins, since in the American League in 1927 it took 10.53 runs to produce an additional win. If every other player on the Yankees had performed at the league average, the New York record should have been 87-67; if each of the seven other batters had performed only half as well as Ruth and had added five extra wins (discounting reserves, pitchers, fielders, and stealers, whom we shall presume for this discussion to have been average), the Yankees would have gained another 35 wins (7 X 5) to finish with a won-lost mark of 122-32.
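Palmer's runs-per-win factor and the Ruth conversion can be sketched together; the 10.53 runs-per-win figure for the 1927 American League is the one given in the text.

```python
import math

def runs_per_win(runs_per_game_each_team):
    """Ten times the square root of runs scored per inning by both teams."""
    runs_per_inning_both = 2 * runs_per_game_each_team / 9
    return 10 * math.sqrt(runs_per_inning_both)

# Normal play: 4.5 runs per game per club -> 1 run per inning -> 10 runs per win
rpw_normal = runs_per_win(4.5)  # 10.0

# Ruth, 1927: 100.7 Batting Runs at 10.53 runs per win in that year's AL
ruth_wins = 100.7 / 10.53       # about 9.56 wins beyond average
```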

Stolen Base Runs

The Linear Weights formula for batters contains a factor for base stealers, expressed in runs. How do you judge the effectiveness of a base stealer? Conventional baseball statistics will lead you to the conclusion that whoever has the most steals is the best thief; that is the sole criterion for The Sporting News annual “Golden Shoe Award” in each league. How often the man with the most steals may have been thrown out is of no concern.

An article in the 1981 Baseball Research Journal by Bob Davids offered something more sophisticated yet utterly simple: a stolen base percentage, which is simply stolen bases divided by attempts. The best stolen base percentage of all time, insofar as we know and based on a minimum of 30 attempts, is Max Carey’s in 1922, when he stole 51 bases in 53 attempts. The record for most times caught stealing in a season was Ty Cobb’s 38 in 1915, until Rickey Henderson was nabbed 42 times in 1982. But the best method yet devised, and one that is pleasingly simple, is to apply the Linear Weights values to get Stolen Base Runs: multiply the steals by their run value of .30, the failed attempts by -.60, and add the two products. The implication for such men as Ty Cobb, Rickey Henderson, and Vince Coleman is clear: it takes a fabulous stealing performance to produce as much as one extra win for the team.

In 1915, when Ty Cobb established the modern stolen base record of 96, his steals can be seen to have contributed 28.8 runs to his team, while his 38 foiled larcenies cost 22.8. Thus Cobb, for all his whirling-dervish activity, accounted for only 6 non-par runs, not even a single win. Whoa! You mean that not a single one of Cobb’s steals produced a victory? That is not what is being said: the fact is that while the gain from the stolen base is entirely visible (an extra base which may be followed by a hit that would otherwise not have produced a run), the cost of the caught stealing is entirely invisible, or conjectural, except with the aid of statistics. How many big innings did Cobb run his team out of? How many batters reached base in ensuing innings who might, in an earlier inning, have had their contributions count for runs? What Stolen Base Runs indicate is that, on balance, not on a specific-case basis, the stolen base is at best a dubious method of increasing a team’s run production.

Now let’s take a look at what Henderson did. His record 130 stolen bases in 1982 produced 39 runs for his team. His 42 failed attempts took away 25.2 possible runs. Net effect: approximately 14 runs, or one and a half wins, a performance nearly three times as good as Cobb’s. In 1983, stealing 22 fewer bases, he was even better, accounting for 21.0 runs. However, the all-time best stealing record is that of Maury Wills in 1962, when he stole 104 bases and was caught only 13 times. Wills’s 104 stolen bases produced 31.2 runs; his 13 failed attempts cost only 7.8. So, his baserunning contribution was 23.4, or a little over two wins.
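Stolen Base Runs is the simplest of these measures to compute, and the seasons discussed above serve as a check on the arithmetic:

```python
def stolen_base_runs(sb, cs):
    """Net runs from base stealing: +.30 per steal, -.60 per caught stealing."""
    return 0.30 * sb - 0.60 * cs

cobb_1915 = stolen_base_runs(96, 38)        # 6.0 runs
henderson_1982 = stolen_base_runs(130, 42)  # 13.8 runs, "approximately 14"
wills_1962 = stolen_base_runs(104, 13)      # 23.4 runs
```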

Fielding Runs

As mentioned earlier, in 1954 when Branch Rickey and Allan Roth came up with their “efficiency formula” for run scoring and run prevention, the defensive half of the equation was divided into five segments, the last of which was fielding, to which they assigned a mathematical value of zero. “There is nothing on earth,” Rickey declared, “anyone can do with fielding.”

Since then many have tried, with mixed results, to improve upon the mere toting up of raw data-putouts, assists, errors, double plays. In the second edition of Total Baseball, we improved upon the Fielding Runs formula by calculating innings played at each position, plate appearances for all players on the team, and then rating each fielder based on his chances per inning. (Formerly we had rated each position on each team based on totals for all players on that team at that position; then we split up the total based on putouts. For more on the formula, see the Introduction to the new Fielding Register, and the Glossary.)

We now rate left fielders against left fielders, center fielders against center fielders, and right fielders against right fielders, where previously all outfield positions had been grouped together. We also thoroughly revised the formula for catchers, which retains the highest degree of subjectivity because a catcher’s primary defensive contribution comes not with the glove but through calling the pitches.


Pitching Runs

Determining the run contributions of pitchers is much easier than determining those of fielders or batters, though not quite so simple as that of base stealers. Actual runs allowed are known, as are innings pitched. Let’s assume that a pitcher is responsible only for earned runs. Then why, we hear some of you asking, is the ERA not measure enough of his ability? Because it tells only the pitcher’s rate of efficiency, not his actual benefit to the team. In a league with an ERA of 3.50, a starter who throws 300 innings with an ERA of 2.50 must be worth twice as much to his team as a starter with the same ERA who appears in only 150 innings. Through Pitching Runs, we seek to determine the number of beyond-average runs a pitcher saved-the number he prevented from scoring that an average pitcher would have allowed.

The formula for Earned Run Average is:

ERA = (Earned Runs × 9) / Innings Pitched

The number of average, or par, runs for a pitcher, which is represented by a Pitching Runs figure of zero, is equal to:

(League ERA × IP) / 9

If the league ERA is 3.79 (as the National League’s was in 1990) and a pitcher’s ERA is also 3.79, he will by definition have held batters in check at the league average no matter how many innings he pitched. If, however, his ERA was 2.67 and he hurled 249 innings (as Frank Viola did for the Mets in ’90), he will have saved a certain number of runs that an average pitcher might have allowed in his place; to find that number we employ the Pitching Runs formula:

Pitching Runs = Innings Pitched × (League ERA / 9) − ER

This represents the difference between the number of earned runs allowed at the league average for the innings pitched and the actual earned runs allowed. For the case of Viola, we get

Pitching Runs = 249 × 3.79 / 9 − 74 = 31.2

Viola was 31.2 runs better than the average National League pitcher in 1990, and had he been transported to an average NL team-that mythical entity that scores as many runs as it allows while winning 81 and losing 81-he would have made that team’s mark 84-78. An alternative way to calculate pitchers’ Linear Weights, useful for oldtimers whose ERA is known but whose earned runs allowed are not, is to subtract the pitcher’s ERA from the league’s ERA, multiply by the innings pitched, and divide by nine. In Viola’s case, this approach would look like:

(3.79 − 2.67) × 249 / 9 = 31.0

The difference of two tenths of a run arises because we are using the pitcher’s rounded ERA of 2.67 rather than the absolute figure of his earned runs allowed, 74.
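Both routes to Pitching Runs fall straight out of the formulas above; a sketch with function names of my own choosing, using a clean illustrative case rather than Viola’s (whose fractional innings and rounded ERA produce the small discrepancies just described):

```python
def pitching_runs(ip, lg_era, er):
    """Runs saved versus an average pitcher over the same innings:
    Innings Pitched x (League ERA / 9) - Earned Runs."""
    return ip * lg_era / 9 - er

def pitching_runs_from_era(ip, lg_era, era):
    """Alternative form for oldtimers whose earned-run totals are
    unknown: (League ERA - ERA) x Innings Pitched / 9."""
    return (lg_era - era) * ip / 9

# Illustration: 180 IP, league ERA 4.50, 70 ER (a 3.50 ERA).
# Either route gives 180 x 0.50 - 70 = 20 runs saved.
```

With exact earned runs in hand, the first form is preferred; the second inherits whatever rounding the recorded ERA carries.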

The two parts of performance-efficiency and durability, or how well and how long-are incorporated into all Linear Weights measures. If you are performing at a better than average clip, the more regularly you do so, the more your team will benefit and thus the higher your Linear Weights measure. If you are stealing bases nine times out of ten, your team will benefit more from sixty attempts than from forty; if you are batting at an above average clip, it’s better to play in 160 games than 110; if you’re allowing one earned run per game less than the average pitcher, your LWTS will increase with innings pitched.

A problem emerges in this regard when trying to compare the Pitching Runs of a pitcher from 1978, like Ron Guidry, with those of Hoss Radbourn in 1884. In the “efficiency” component of the formula, which may be understood as the league ERA minus the individual’s ERA, the two compare this way:

  • Guidry = 3.76 − 1.74 = 2.02
  • Radbourn = 2.98 − 1.38 = 1.60

Guidry’s differential is “unfairly” boosted by the higher league ERA of 1978; in fact, if we had compared the two by their normalized ERAs, which is logically more sound, the results would have been:

    Guidry = 3.76 / 1.74 = 2.16
    Radbourn = 2.98 / 1.38 = 2.16

    Yet because rules and playing conditions allowed Radbourn to extend his efficiency over 679 innings, while Guidry hurled “only” 274, their Pitching Runs look like this:

    Guidry = 62.0
    Radbourn = 120.6

    There is a great deal more to say on the subject of pitching and sabermetric stats: see the Introduction to the Pitching Register and the Glossary.

Linear Weights in Practice

    Having formulas for pitching, fielding, baserunning, and batting, we can assess the run-scoring contribution of every individual who has ever played the game, and thus the number of wins that he has contributed in a given season or over his career. The number of runs required to produce an additional win has varied over the years between 9 and 11 runs, with a very few league seasons outside those parameters.
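The runs-to-wins conversion described above reduces to a single division, with the divisor usually near 10 (a convenient midpoint of the 9-to-11 range; the function name and the 65-run example are mine):

```python
def wins_from_runs(lw_runs, runs_per_win=10.0):
    """Convert a Linear Weights run total into wins above average,
    assuming the usual 9-11 runs per additional win; 10 is a
    convenient midpoint."""
    return lw_runs / runs_per_win

# e.g., a player worth 65 Linear Weights runs at 10 runs per win
# contributes 6.5 wins above average.
```

The actual divisor for a given league season depends on its run-scoring level, which is why the text gives a range rather than a constant.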

    Limited by conventional baseball statistics, one might, in 1990, have uttered something like, “Barry Bonds hit .301 with 33 homers and 114 RBIs-the guy must have been worth 10 extra wins to Pittsburgh all by himself!” Or: “The White Sox are only one pitcher away from winning the division.” Or: “The Yankees are only three players away from being a contender.” Or: “Letting Darryl Strawberry get away was the worst thing the Mets ever did; they’ll be a second-division club for a decade.” With Linear Weights, these statements, or rather the concerns they reflect, can be approached with some data and with some degree of objectivity. First: Barry Bonds had a fine year in 1990, but to have contributed 10 wins by himself he would have had to account for nearly 100 Linear Weights runs, a mark that up till then had been attained by only three men in major-league history. In fact, Bonds contributed 6.5 wins in ’90, though he did post 9.0 wins in 1992.

    As to the White Sox, they finished 94-68 in 1990, while their Linear Weights projected them to finish at 81-81. The Athletics, who won the AL West at 103-59, actually projected to finish 96-66. So, the Sox management might have asked, how to close ground on the Athletics? Could one pitcher-like Bob Welch, for whom they bid in the free-agent bazaar-make the difference? To do so, he would have to contribute about 150 Pitching Runs, a feat no pitcher has ever accomplished. In 1990, pitching for Oakland-and remember, the Linear Weights formula is divorced from considerations of batter support-Welch contributed 20.7 park-adjusted Pitching Runs. So presuming that he pitched as well for the White Sox as he did for the Athletics, or even slightly better, he would not be enough to “win” Chicago the flag on paper; Chicago would need help from other quarters.

    Regarding the other statements, you get the picture: sabermetric analyses like the ones above will tend to puncture fantasies.

Park Factor

    A central issue for sabermetricians is the network of illusion created by home-park dimensions, atmospheric conditions, and visibility for batters. How many home runs would Mark McGwire hit if he played half his games in Fenway Park? Will the Boston Red Sox and Chicago Cubs keep “failing” to put together solid pitching staffs-or has their pitching been adequate all along? Why have the American League leaders in triples so often worn a Royals uniform? One’s home park has a powerful effect on a player or pitcher’s record, elevating some good players to greatness and denying the spotlight to some outstanding performers.

    It should be understood that the average player does better at home regardless of the park-familiarity breeds success, it seems. Individuals bat and pitch at a rate 10 percent higher at home, on average. But parks don’t create performance; they only affect it. For example, a lefthanded hitter at Fenway can do very well indeed, as Wade Boggs has, by learning to take the outside pitch to left field. Likewise, a righthanded batter can make the friendly Green Monster into his nemesis by trying to pull every pitch.

    For hard luck in home parks, it is tough to top the record of Dave Winfield, who has had the misfortune to call both San Diego and Yankee stadiums home before landing in the more or less neutral Big A in Anaheim. Through 1990, his lifetime Production, normalized to league average but not adjusted for park effects, was 117th best on the all-time list of those playing in 1,000 games. Had he played his home games instead in Fenway Park, his PRO would have projected to the 45th best of all time. Had he even played in an average hitters’ park-which is what PRO+ measures-his record would show itself to be the 80th best ever.

    If we desire to remove the silver spoon or the millstone that a home park can be, and measure individual ability alone, we must create a statistical balancer that diminishes the individual batting marks created in parks like Fenway and augments those created in San Diego. Pete Palmer developed an adjustment that enables us, for the first time, to measure a player’s accomplishments apart from the influence of his home park.

    Parks differ in so many ways that it may be hard to imagine how their differences can be quantified. The most obvious way in which they differ is in their dimensions, from home plate to the outfield walls, and from the base lines to the stands. The older arenas-Fenway Park, Wrigley Field, Tiger Stadium-tend to favor hitters in both regards, with reachable fences and little room to pursue a foul pop. The exception among the older parks was Chicago’s Comiskey, which, in keeping with the theories of Charles Comiskey back in 1910 and the team’s perceived strength, was built as a pitcher’s park. Two parks can have nearly equal dimensions, like Pittsburgh’s Three Rivers Stadium and Atlanta’s Fulton County Stadium, yet have highly dissimilar impacts upon hitters because of climate (balls travel farther in hot weather), elevation (travel farther as altitude increases), and playing surface (travel faster and truer on artificial turf). Yet another factor is how well batters think they see the ball; Shea Stadium is notorious as a cause of complaints.

    And perhaps more important than any of the objective park characteristics, suggested Robert Kingsley in a 1980 study of why so many homers were hit in Atlanta, is the attitude of the players, the way that the park changes their view of how the game must be played in order to win. Every team that comes into Atlanta in August knows that the ball is going to fly and, whether it is a team designed for power or not, it plays ball there as if it were the 1927 Yankees. In their own home park the Astros may peck and scratch for runs, but in Atlanta they will put the steal and hit-and-run in mothballs. Conversely, a team which comes into the Astrodome and plays for the big inning will generally get what it deserves-a loss. The successful team is one that can play its game at home-the game for which the team was constructed-yet is flexible enough to adapt when on the road. How to quantify attitude?

    Rather than try to assign a numerical value to each of the six or more variables that might go into establishing an estimator of homepark impact, Pete looked to the single measure in which all these variables are reflected-runs. After all, why would we assign one value to dimensions, another to climate, and so on, except to identify their impact on scoring? If a stadium is a “hitters’ park,” it stands to reason that more runs would be scored there than in a park perceived as neutral, just as a “pitchers’ park” could be expected to depress scoring.

    The full and lengthy explanation for the computation of the Park Factor is left to the Glossary, where hardy readers might consider taking a peek right now. For most of us, though, it will be enough to understand that the Park Factor consists mainly of the team’s home-road ratio of runs allowed, computed as it was above for the league, compared to the league’s home-road ratio.
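The core of the idea can be sketched crudely as a ratio of ratios: the team’s home-road split measured against the league’s. This is only the skeleton of the Glossary’s computation (which adjusts for innings, the absent home batting, and more), and the function name is mine; the 0.90 league ratio below is an assumed illustrative value:

```python
def simple_park_factor(team_home_runs, team_road_runs, lg_home_road_ratio):
    """Crude park factor: the team's home/road run ratio divided by the
    league's home/road ratio. For runs allowed, values below 1.00
    suggest a park friendly to pitchers."""
    return (team_home_runs / team_road_runs) / lg_home_road_ratio

# 1964 Angels, runs allowed: 226 at home, 325 on the road. If the
# league as a whole allowed about 0.90 as many runs at home as away
# (an assumed figure), the crude pitcher park factor is roughly 0.77.
print(round(simple_park_factor(226, 325, 0.90), 2))
```

Run totals are used precisely because, as argued above, they already reflect dimensions, climate, elevation, surface, and visibility all at once.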

    Just as Dave Winfield’s stats suffered for the home parks he played in until he joined the California Angels, Dean Chance, star pitcher of the Angels in the mid-1960s, benefited from playing in Chavez Ravine when it was notoriously rough on hitters. This is not to say Chance had anything but a marvelous year in 1964: 20 wins, a 1.65 ERA, and 11 shutouts are hard to argue with. Still, in 81 home games in 1964, the Angels allowed 226 runs; in 81 games on the road, they allowed 325, or 44 percent more, where a 10 to 11 percent increase would have been normal. If one is to compare Chance and, say, Bert Blyleven in his years with Minnesota fairly, we must deny one the benefit of his home park and remove from the other the onus of his. This is what Park Factor does.

    For decades, the all-time scoring squelcher was Chicago’s South Side Park, which saw service at the dawn of the American League. From 1901 through 1909, its last full year of service to the White Sox, this cavernous stadium produced home run totals like the 2 in 1904, 3 in 1906, and 4 in 1909; in two years the Sox failed to hit any homers at home, thus earning the nickname “Hitless Wonders.” In 1906, Chicago pitchers held opponents to 180 runs at South Side Park, an average of 2.28 runs per game, earned and unearned, in a decade when 4 of every 10 runs were unearned. This mark held until 1981, when the Astrodome intimidated opposing hitters to such a point that in the 51 home dates of that strike-shortened season, Astro hurlers were touched for only 106 runs, a mere 2.08 per game. The Pitcher Park Factor of .817 for the Astrodome was the lowest ever. Those who suspected that men like Joe Niekro, Don Sutton, Vern Ruhle, et al., were perhaps not world beaters after all were right: look at the ERAs the Astro starters registered that year, and what these ERAs might have been in an average park like Shea that year (BPF: 1.00) or a moderately difficult pitchers’ park like San Francisco (BPF: 1.06).


    Houston Pitchers, 1981

    Pitcher               ERA    BPF: 1.00   BPF: 1.06
    Nolan Ryan            1.69   2.07        2.19
    Joe Niekro            2.82   3.43        3.64
    Vern Ruhle            2.91   3.56        3.77
    Bob Knepper           2.18   2.66        2.82
    Don Sutton            2.60   3.17        3.36
    HOUSTON (all)         2.66   3.24        3.44
    SAN FRANCISCO (all)   3.28   3.09        3.28

    Some observations prompted by this table: San Francisco with its team ERA of 3.28 had a better pitching staff than Houston with its 2.66; and Houston batters, regarded as a Punch-and-Judy crew by all observers, must have been a lot more effective than heretofore suspected. In fact, when Houston batters’ totals (eighth in runs scored, eighth in LWTS) are adjusted for park, the Astros emerge on ability as the best hitting team in the National League of 1981! Even without the application of Park Factor, one might have come to a similar conclusion by examining the runs scored totals for all NL clubs on the road in 1981. Houston’s total was exceeded only by those of the Dodgers and Reds.

    Proceeding from a similar hunch, we may look at the batting record of the “Hitless Wonders” of 1906, who won the pennant (and the World Series, in six games over a Cubs team which went 116-36 during the season). Baseball lore has it that a magnificent pitching staff (Ed Walsh, Doc White, Nick Altrock, and others) overcame a puny batting attack (BA of .230, 6 homers, slugging percentage of .286). In fact, the Sox scored more runs on the road than all but one AL team, and their Batting Linear Weights, when adjusted for park, was third in the league-the same rank achieved by their pitching. (How they won the pennant remains a mystery, though, for both Cleveland and New York had vastly superior teams on paper.)

Relativity

    Sabermetric statistics can be marvelous tools for cross-era comparisons, enabling us to determine if baseball’s history is truly a seamless web or if its seams are real enough, but are camouflaged by traditional statistics.

    If Batter A presented himself to you for approval with these statistics–.330 batting average, 16 home runs, 107 RBIs-what would your reaction be? You’d like to have him on your team, right? And what to make of Batter B, who presents these numbers–.257 batting average, 14 home runs, 53 RBIs? Not bad for a middle infielder with a good glove, you say, but otherwise undistinguished? In fact, the “impressive” figures of Batter A represent the average performance of a National League outfielder in 1930, while the “blah” figures of Batter B are those of the average American League outfielder of 1968: the former has more than twice the RBIs of the latter, along with a batting average 73 points higher, yet the two performed at identical levels, and an argument could be made that Batter B was superior.

    In a similar comparison involving those two years of extremes, Bill Terry led the National League in 1930 with a BA of .401, a mark surpassed by Ted Williams in 1941 but not equaled since; Carl Yastrzemski led the American League of 1968 with a performance that oldtimers held to be a disgrace, a lowly BA of .301, the worst ever to win a batting championship. Terry’s mark was achieved at a time when most pitchers had only two pitches, a fastball and a curve, and not enough confidence in the latter to throw it when behind in the count at 2-0 or 3-1. The parks were smaller; there was no night ball; the game was segregated racially; and you played 22 games with each team, none farther west than St. Louis. Moreover, 1930 was the year in which National League officials, attempting to match the popularity of the slugging American League, juiced the ball to such an extent that the entire league batted .312 (if you remove pitcher batting). In other words, the average nonpitcher in the NL of 1930 batted higher than the AL leader in 1968! When Yaz hit .301, pitchers dominated the game and the average American League nonpitcher hit .238. How to compare Terry and Yaz, who played under such different conditions thirty-eight years apart?

    You could view Terry’s .401 in relation to his league’s BA of .312, concluding that Memphis Bill was a better hitter (by BA alone, which despite its previously cited deficiencies remains the most comfortable stat by which to introduce this technique) by 28.5 percent. You could compare Yaz’s .301 to his league’s BA of .238 and conclude that he was a better than average hitter by 26.5 percent. A mere 2 percentage points separate the men-had they both played in the National League of 1983, when the league average was .255, the Terry of 1930 might have hit .328, the Yaz of 1968, .323. (A further refinement of this method would be to delete Terry’s at bats and hits from his league’s, and those of Yastrzemski from his league’s, so that the batters are not in effect compared with themselves. This, however, necessitates the use of at bats and hits rather than simply the averages and does not significantly alter the results.)
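The Terry-Yastrzemski comparison reduces to two ratios; a minimal sketch of the technique, with a function name of my own:

```python
def relative_batting_average(player_ba, league_ba):
    """Player BA divided by league BA; a ratio of 1.285 means
    28.5 percent better than the league."""
    return player_ba / league_ba

terry = relative_batting_average(0.401, 0.312)  # 1930 NL
yaz = relative_batting_average(0.301, 0.238)    # 1968 AL
print(round((terry - 1) * 100, 1))  # percent better than league: 28.5
print(round((yaz - 1) * 100, 1))    # percent better than league: 26.5
# Projected onto the 1983 NL (.255): .255 x each ratio gives
# roughly .328 for Terry and .323 for Yaz, as in the text.
```

The refinement mentioned above, deleting each batter’s own at bats and hits from his league’s totals before forming the ratio, would change these figures only slightly.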

    Why do we need relative measures? Basically, for the same reason we need statistics altogether, to compare, to interpret, and to comprehend, but in a more reasonable and accurate manner when the disparity of the data sources makes the use of absolute, unadjusted numbers illogical. If the analysis involves data produced under widely varying conditions, such as a sample including performances 20, 50, or 100 years apart, any comparison will be meaningless without dragging in a series of rather complex historical understandings to modify the analysis-and in a highly subjective, unreliable manner. To compare Terry’s .401 with Yastrzemski’s .301 with no recognition of the context in which these marks were achieved, that is, to infer that Terry was 100 points better than Yaz, is equivalent to comparing Babe Ruth’s salary of $80,000 in 1930 with Pete Rose’s $806,250 of fifty years later and concluding that Rose was $726,250 richer. To understand those figures we must place them within a context which includes such factors as I.R.S. regulations and inflation: we might think to re-express the two salaries in terms of their purchasing power, multiplying each by the Consumer Price Index of its time as expressed in 1967 dollars; doing this would be to compute a “relative salary” for Ruth and Rose, just as we computed a Relative Batting Average for Terry and Yaz. (And just as we discovered there was little difference between the BAs of the latter couple, we would discover there is little difference between the salaries of the former pair.)

    Few are the fans who could cite the context of Ross Barnes’s .429 batting average of 1876, let alone evaluate its ingredients (these include considerations of equipment, schedule, travel, physiology, racial exclusion, daytime games, rules variations, attitudes, and customs). A statistic removed from its historical context can be as deceptive as a quotation pulled out of context. How, then, to compare Barnes’s .429 with, say, Bill Madlock’s league-leading figure of .339 a century later? Should we discount Barnes’s average 10 percent because in his day batters could demand a pitch above the waist or below? Or should we augment it 17 percent because a pitcher could throw eight “balls” before allowing a walk?

    We are confronted with a similar problem in trying to quantify the various differences between home parks; our solution there was to look at the single measure which reflected all the variables-runs-and from that measure we proceeded to devise a formula for Park Factor. Similarly, the many variables that supply the context for Barnes in 1876 supplied an identical context for every other batter in that year-and the context in which Bill Madlock hit .339 prevailed for every other National League batter in 1976 (except for home park, of course). Accordingly, if we form a ratio of Barnes’s .429 to his league’s average (.265) and another of Madlock’s to his league’s average (.263) we obtain figures (1.62 for Barnes, 1.28 for Madlock-stated for convenience in Total Baseball as 162 and 128), which may reasonably be compared with each other: Barnes was 62 percent better than his league in BA, while Madlock was 28 percent better than his; these become the comparables, not the .429 and .339. The method will not become a time machine-putting Barnes on a modern club and Madlock on an old-time one-any more than Park Factor is a place machine, switching Joe DiMaggio to Beantown and Ted Williams to the Bronx. However, the relativist approach offers suggestive truths and does measure precisely the extent to which Barnes’s and Madlock’s BAs dominated those of their contemporaries.

    Until the 1970s, when David Shoebotham (“Relative Batting Averages,” Baseball Research Journal, 1976) and Merritt Clifton (“Relative Baseball,” Samisdat, 1979) introduced the relativist approach, all baseball stats were absolute. And for cross-era comparison, that favorite Hot Stove League activity, absolute stats were absolutely useless, generating plenty of heat and precious little light. What the theory of relativity, baseball-style, does beautifully is to eliminate the need for bringing historical baggage to statistical analysis. The normalized or relative versions of any statistic-batting average, Production, ERA, slugging average, you name it; even homers or strikeouts, though there are problems with these-will be greater than 1.00 for all above-average performers (1.41, for example, means 41 percent better than average in the given category) while relative statistics less than 1.00 will indicate a below average level of play (0.88 means 12 percent below the norm).

    It is as simple as can be. So Early Wynn had a 3.20 ERA in 1950? What does that mean? Well, the league ERA was 4.58, so Wynn did very well indeed. His normalized ERA thus was 143, a mark better than that earned by Tom Seaver in 1968, when he had an absolute ERA a full run lower at 2.20.
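For pitchers the ratio is inverted, league over individual, so that better-than-average stays above 100; Wynn’s 143 works out as follows (function name mine):

```python
def normalized_era(era, lg_era):
    """League ERA divided by pitcher ERA, times 100, so that higher is
    better: 143 means 43 percent better than the league."""
    return 100 * lg_era / era

# Early Wynn, 1950: 3.20 ERA against a 4.58 league ERA -> 143.
print(round(normalized_era(3.20, 4.58)))
```

The same inversion is what made the Guidry-Radbourn comparison come out even at 2.16 apiece earlier in the chapter.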

    We cannot employ a Relative Won-Lost record, for the league average is every year the same: .500. (A logical corollary is that one cannot fruitfully use relative measures of any sort for a single season’s analysis, as all like figures will be compared to the same league average. The numbers may be changed into normalized form, but the players’ rankings will be unchanged: the top ten in batting average in 1990, for example, will retain their ranks in Relative Batting Average.)

    Relativism in baseball echoes not only Einstein but also Shakespeare, whose words in Hamlet might be modified to read “There is nothing either good or bad, but context makes it so.” No longer must we accept arbitrary assessments of performance or regard with awe such old-time figures as Hugh Duffy’s BA of .438 in 1894 (not the accomplishment that Rod Carew’s .388 was in 1977) or George Sisler’s .407 in 1920 (not as good as Roberto Clemente’s .357 in 1967). Conversely, a “mediocre” performance of recent years, such as Bobby Murcer’s .292 of 1972, for instance, stacks up as the equal of Eddie Collins’ .360 in 1923, while Charlie Grimm’s seemingly solid .298 in 1929 compares unfavorably to Mike Cubbage’s .260 in 1976.

    Relativism redefines our understanding not only of particular accomplishments but also of baseball history itself. We see that the men who batted .400 with numbing regularity in the 1890s and 1920s were not supermen (would you swap Wade Boggs for Tuck Turner? George Brett for Harry Heilmann?) any more than the sub-2.00 ERA pitchers of the late 1960s were (Gary Peters, Bob Bolin, Dave McNally, et al.). Absolute figures lie. Are hitters today worse because none has hit .400 since 1941? Or are they superior because a Dave Kingman can average nearly 30 homers a year while Cap Anson only averaged 4? Are infielders better today because they make fewer errors than their counterparts of 50, 75, or 100 years ago? Do modern outfielders have limp-noodle arms because their assist totals pale before those registered in the early decades of the 1900s? Is baseball improving or declining, and has its rise or fall been steady? One can spit absolute stats on the hot stove all winter long and get no closer to the answer, but with relative statistics, the issues are clarified.

    In the May 1983 issue of The Coffin Corner, the newsletter of the Professional Football Researchers Association, Bob Carroll offered a witty and perceptive dissection of the relative approach to football statistics. It was based upon a comparison of two great running backs, Tuffy Leemans of the New York Giants of the late 1930s and early ’40s and George Rogers, then with the New Orleans Saints. “I’ve always liked the story,” Carroll wrote, “of the little old lady who scornfully toured a Picasso exhibit and then sniffed, ‘If Rembrandt were alive today, he wouldn’t paint this way!’ To which a bystander replied, ‘Ah, but if Rembrandt were alive today, he wouldn’t be Rembrandt.’”

    There are things that relative baseball stats won’t do, questions they won’t answer. What would Ty Cobb bat if he were playing today? Lefty O’Doul was asked this question by a fan at an offseason baseball banquet in 1960. “Maybe .340,” O’Doul answered. “Then why do you say Cobb was so great,” the fan remarked, “if he could only hit .340 with the lively ball today?” “Well,” O’Doul said, “you have to take into consideration that the man is now 74 years old.” Relative Batting Average cannot tell with certainty what Cobb would hit today, for as Carroll wrote of Tuffy Leemans, if Cobb were playing today he wouldn’t be the same Cobb; he would be bigger, stronger, and faster, and he might choose to steal less and go for the long ball more.

Relief Pitching

    Absent from the chapter to this point has been the relief pitcher, a modern specialist who, because of his still-evolving role in baseball, presents a variety of sabermetric problems and opportunities. The nature of the job is such that his won-lost record is not meaningful (even less so today than ten or fifteen years ago, with the ace in most bullpens being called upon-in highly dubious wisdom-only when his team has a lead in the eighth or ninth inning). A reliever may pick up a win with as little as a third of an inning’s work, if he is lucky, while a starter must go five innings; a reliever may also pick up a loss more easily, for if he allows a run there may be little or no opportunity for his teammates to get it back, as they can for a starter. Earned run average is meaningful for the reliever, but it must be .15 to .25 lower to equate with that of a starter of comparable ability: a reliever frequently begins his work with a man or two already out, and thus can put men on base and strand them without having to register three outs.

    Ratios of hits to innings, strikeouts to innings, strikeouts to walks-all of these have their interest, but none is sufficient by itself to measure relief-pitcher effectiveness. Relievers may also have an edge in these ratios because they generally face each batter only once in a game, thus leading to fewer hits and more strikeouts per inning. Before discussing the modern alternatives of saves or Relief Points, and our own Relief Ranking, let’s review briefly the rise of the relief pitcher from the role of a mere hanger-on to, some would say, the most indispensable part of a winning team.

    Relief pitching before 1891 was limited, with rare exceptions, to the starting pitcher exchanging places with one of the fielders, who was known as the “change pitcher.” Substitutions from the bench were not permitted except in case of injury until 1889, when a tenth man became entitled to designation as a substitute for all positions; free substitution came in two years later, but no relief specialists emerged until Claude Elliott, Cecil Ferguson, and Otis Crandall in the first decade of this century.

    The next decade’s best relievers were starters doing double duty-notably Ed Walsh, Chief Bender, and Three Finger Brown. The 1920s, and up to the end of World War II, brought the first firemen to be employed in the modern way, although they tended to work more innings and fewer games than today. These were men such as Firpo Marberry, Johnny Murphy, Ace Adams, and several other worthies.

    When you think of a relief pitcher in the modern-day sense-that is, a man who can appear in 50 or more ballgames a year, all or nearly all in relief, and win/save 30 or more-you begin with Joe Page of the 1947-49 Yankees and Jim Konstanty of the 1950 Phils, though Marberry had one such season in 1926. None of the three, however, ever heard of a “save” in his playing days-this term wasn’t introduced until 1960, the year after Larry Sherry’s heroic World Series in which he finished all four Dodger victories, garnering two for himself and saving the others; 1959 was also the year fireman Roy Face went 18-1, not losing until September 11.

    Before Jerry Holtzman of the Chicago Sun-Times devised the save, baseball people were looking at really only one figure to measure a reliever’s work, and that was the number of games in which he appeared; any other appreciation of his efforts was expressed impressionistically. A reliever did not work enough innings to qualify for an ERA title (Hoyt Wilhelm in 1952 being the exception), nor could he expect to win 20 games. The introduction of a specialized statistic for the fireman was acknowledgement of his specialized employment and conferred upon it a status it had never enjoyed, not even after the exploits of Konstanty, Page, Wilhelm, and Face. Only when the save came into being did the majority of relievers take pride in their work and stop regarding their time in the bullpen as an extended audition for a starting role.

    When The Sporting News, spurred by Holtzman, began recording saves in its weekly record of the 1960 season, the save was defined in a way different from today. Then, upon entering the game, a reliever had to confront the tying or winning run on base or at the plate, and of course finish the game with the lead. This definition was later eased, so that simply finishing a game would get the reliever a save; a memorably absurd result of the new ruling was that the Mets’ Ron Taylor gained a save in 1969 by pitching the final inning of a 20-6 win over Atlanta. This outraged sportswriters and fans alike, so in 1973 the definition was changed yet again: a reliever had to work three innings or come in with the tying or winning run on base or at bat. This definition was relaxed yet again in 1975 so that the tying run could be on deck, thus giving the relief pitcher license to allow a baserunner. It was a good thing for statisticians when Dan Quisenberry surpassed John Hiller’s 1973 record of 38 saves by a decisive margin of 7. Today, of course, Bobby Thigpen’s 1990 mark of 57 saves seems beyond challenge . . . but back in 1919 so did Babe Ruth’s 29 homers.

    There was a blip in the relievers’ trend of rising importance when the American League introduced the designated hitter in 1973. The predicted outcome, based on the first few years’ experience of the DH, was: increased offensive production, no more need to pinch-hit for the pitcher, and thus a greater number of complete games and fewer saves. All those things did happen in 1973-76, although not quite to the degree expected, and soon the American League’s use of relief pitchers became as extensive as it had been in the early 1970s. In 1982, despite the DH, American League starters completed only 19.6 percent of their games, an all-time league low (though still substantially higher than the National League, where complete games had dropped below 15 percent in recent years). In 1990 the AL and NL each logged complete games at about a 16 percent rate.

    Relief Points, in all of its various incarnations, including the one that provides a penalty for a blown save as well as for a defeat, is an improvement over saves.

    Some folks still long for a measure of middle-relief effectiveness, that statistical no-man’s-land. In April 1981 Sports Illustrated came up with an incredibly complicated series of tabulations to address these final injustices, and they were dazzling. However, the SI method dazzled in the same way that the Mills brothers’ Player Win Average did: it was ingenious and well conceived, but involved too much work. Not only did it require play-by-play analysis, but it also reminded one (queasily) of the National Football League’s quarterback-rating system. Quarterbacks are rated in four categories, variously weighted, to arrive at a number of “rating points.” Not one fan in a thousand could tell you how the rating points are derived, and the same holds for the SI relievers’ formulas.

    The final relief statistic to be discussed is the one we think is the best: Relief Ranking, which is a weighted variant of park-adjusted Relief Runs. (In the first edition of Total Baseball, we applied the measure to all pitchers who averaged less than three innings per appearance, and this resulted in some needless inclusions of pitchers who were primarily starters; this time around we have broken out all pitchers’ relief innings.) Relief Ranking tends to favor closers, while Relief Runs provides a good measure for middle-relief outings.

    The Future

    The most exciting frontier for sabermetrics is in situational stats, the type employed by the Elias Sports Bureau, the Baseball Workshop, and Stats, Inc. As the years go by and their databases grow, the sample sizes of the data will enlarge, and their figures for day vs. night, turf vs. grass, and so on, will be statistically meaningful as well as statistically correct. Cross-era comparison remains a subject of intense interest, and the debate over average-player skill rages on. Fielding and relieving, as discussed, also provide fertile ground for invention.

    Fantasy baseball aficionados seem caught up in the competition and deal-making (as well as player evaluation), but some of the newsletters, such as John Benson’s, provide sound analysis and trend-spotting tips. It would not be surprising if Rotisserie-type leagues, rather than SABR, furnish the best sabermetricians of the 1990s.
