Pythagorean expectation is a formula invented by Bill James to estimate how many games a baseball team "should" have won based on the number of runs it scored and allowed. Comparing a team's actual winning percentage with its Pythagorean winning percentage gives a way to evaluate how lucky that team was, by examining the gap between the two. The name comes from the formula's resemblance to the Pythagorean theorem.
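The basic formula is:

$$\text{Win Ratio} = \frac{\text{runs scored}^2}{\text{runs scored}^2 + \text{runs allowed}^2} = \frac{1}{1 + (\text{runs allowed}/\text{runs scored})^2}$$
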
where Win Ratio is the winning ratio generated by the formula. The expected number of wins would be the expected winning ratio multiplied by the number of games played.
Empirically, this formula correlates fairly well with how baseball teams actually perform. However, statisticians have since found that it carries a fairly routine error, generally about three games off. For example, in 2002, the New York Yankees scored 897 runs and allowed 697 runs. According to James' original formula, the Yankees should have won 62.35% of their games.
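Written out with the Yankees' totals, and assuming the original exponent of 2, the arithmetic is roughly:

$$\text{Win Ratio} = \frac{897^2}{897^2 + 697^2} = \frac{804609}{1290418} \approx 0.6235$$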
Based on a 162-game season, the Yankees should have won 101.01 games. The 2002 Yankees actually went 103–58.
In an effort to reduce this error, statisticians have performed numerous searches to find the ideal exponent.
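Among single-number exponents, 1.83 is the most accurate, and it is the one used by baseball-reference.com. The updated formula therefore reads as follows:

$$\text{Win Ratio} = \frac{\text{runs scored}^{1.83}}{\text{runs scored}^{1.83} + \text{runs allowed}^{1.83}}$$
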
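As a minimal sketch of how the choice of exponent changes the estimate, the Python below evaluates the same ratio for the 2002 Yankees with both exponents. The function name `pythagorean_win_ratio` and its parameters are illustrative, not taken from any published implementation.

```python
def pythagorean_win_ratio(runs_scored: float, runs_allowed: float,
                          exponent: float = 2.0) -> float:
    """Expected winning ratio; exponent 2 is James' original choice."""
    if runs_scored < 0 or runs_allowed < 0 or runs_scored + runs_allowed == 0:
        raise ValueError("run totals must be non-negative and not both zero")
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)


if __name__ == "__main__":
    # 2002 New York Yankees totals quoted in the text, over a 162-game season.
    for exponent in (2.0, 1.83):
        ratio = pythagorean_win_ratio(897, 697, exponent)
        print(f"exponent {exponent}: ratio {ratio:.4f}, "
              f"expected wins {ratio * 162:.2f}")
```

With the exponent left at 2, the function reproduces the 62.35% and 101.01 wins given for the Yankees.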
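The most widely known variable-exponent version is the Pythagenport formula developed by Clay Davenport of Baseball Prospectus, which computes the exponent X as:

$$X = 1.50 \log_{10}\left(\frac{R + RA}{G}\right) + 0.45$$
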
He concluded that the exponent should be calculated for each team from its runs scored (R), runs allowed (RA), and games played (G). By not reducing the exponent to a single number for all teams in any season, Davenport was able to report a root-mean-square error of 3.9911, as opposed to 4.126 for an exponent of 2.
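A team-specific exponent of this kind could be computed along the following lines. This is a sketch under assumptions: the constants 1.50 and 0.45 and the base-10 logarithm follow the commonly published form of Pythagenport and should be checked against Davenport's own write-up, and the function names are hypothetical.

```python
import math


def pythagenport_exponent(runs_scored: float, runs_allowed: float,
                          games: int) -> float:
    """Exponent X from total runs per game (assumed constants 1.50 and 0.45)."""
    runs_per_game = (runs_scored + runs_allowed) / games
    return 1.50 * math.log10(runs_per_game) + 0.45


def win_ratio(runs_scored: float, runs_allowed: float, exponent: float) -> float:
    """Pythagorean ratio with an arbitrary exponent."""
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)


if __name__ == "__main__":
    # 2002 Yankees: 897 runs scored, 697 allowed, 103 wins and 58 losses.
    games = 103 + 58
    x = pythagenport_exponent(897, 697, games)
    print(f"exponent {x:.3f}, expected ratio {win_ratio(897, 697, x):.4f}")
```

Because the exponent depends only on total runs per game, two teams in very different run environments get different exponents even when their ratio of runs scored to runs allowed is the same.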
Less well known but equally (if not more) effective is the Pythagenpat formula, developed by David Smyth.
Davenport expressed his support for this formula, saying:
After further review, I (Clay) have come to the conclusion that the so-called Smyth/Patriot method, aka Pythagenpat, is a better fit. In that, X = ((rs + ra)/g)^0.285, although there is some wiggle room for disagreement in the exponent. Anyway, that equation is simpler, more elegant, and gets the better answer over a wider range of runs scored than Pythagenport, including the mandatory value of 1 at 1 rpg.
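The exponent in Davenport's quote can be sketched in a few lines of Python. The power 0.285 is the value he gives, exposed here as a parameter because he notes there is room for disagreement about it. The function name is illustrative.

```python
def pythagenpat_exponent(runs_scored: float, runs_allowed: float,
                         games: int, power: float = 0.285) -> float:
    """Exponent X = ((rs + ra) / g) ** power, with power as quoted by Davenport."""
    runs_per_game = (runs_scored + runs_allowed) / games
    return runs_per_game ** power


if __name__ == "__main__":
    # At exactly 1 run per game the exponent is 1, the "mandatory value" in the quote.
    print(pythagenpat_exponent(1, 0, 1))

    # 2002 Yankees: 897 runs scored, 697 allowed, 103 wins and 58 losses.
    x = pythagenpat_exponent(897, 697, 103 + 58)
    ratio = 897 ** x / (897 ** x + 697 ** x)
    print(f"exponent {x:.3f}, expected ratio {ratio:.4f}")
```

Any positive power gives an exponent of 1 at 1 run per game, since 1 raised to any power is 1, so that property does not depend on settling the exact value of the power.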