How Can This Be?
Mathematical paradoxes are always interesting, especially when they seem more than paradoxical... that is, just plain wrong! A prime example is Simpson's Paradox.
Consider these baseball batting averages for two players, broken down by their performance against right-handed and left-handed pitchers:
Notice how Player A has a better overall batting average than Player B (i.e., r > R), yet Player B has the better batting average against both left-handed and right-handed pitchers (i.e., r1 < R1 and r2 < R2). How can this be?
Rather than spoil the paradox for you, I will only note that nothing is wrong with either the data or the mathematics. If you need a resolution, you can find one by searching on-line (and see Tan's article below for a great geometric explanation of this same data).
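For readers who want to convince themselves that the arithmetic really can work out this way, here is a minimal Python sketch. The hit and at-bat counts are hypothetical numbers invented purely for illustration (they are not the data discussed above, nor any real player's record), and the names `player_a`, `player_b`, `split_1`, `split_2`, and `shows_reversal` are my own assumptions. The sketch only checks the three inequalities; it does not explain them.

```python
from fractions import Fraction

# Hypothetical (hits, at-bats) pairs, invented for illustration only.
# "split_1" and "split_2" stand for the two kinds of pitcher faced.
player_a = {"split_1": (90, 300), "split_2": (20, 100)}
player_b = {"split_1": (31, 100), "split_2": (63, 300)}


def average(hits, at_bats):
    """Batting average as an exact fraction."""
    return Fraction(hits, at_bats)


def overall(splits):
    """Overall average: total hits divided by total at-bats."""
    total_hits = sum(hits for hits, _ in splits.values())
    total_at_bats = sum(at_bats for _, at_bats in splits.values())
    return Fraction(total_hits, total_at_bats)


def shows_reversal(a, b):
    """True if b leads in every split while a leads overall."""
    b_leads_each_split = all(average(*a[k]) < average(*b[k]) for k in a)
    return b_leads_each_split and overall(a) > overall(b)


for name, splits in (("A", player_a), ("B", player_b)):
    parts = ", ".join(
        f"{key}: {float(average(*value)):.3f}" for key, value in splits.items()
    )
    print(f"Player {name}: {parts}, overall: {float(overall(splits)):.3f}")

print("Reversal present:", shows_reversal(player_a, player_b))
```

Exact fractions are used for the comparisons so that rounding cannot affect the result; the decimals are only for display.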
The paradox was "first" described by mathematician Edward Simpson in a 1951 paper. He was not aware that statisticians Karl Pearson et al. and Udny Yule had already described the paradox in 1899 and 1903, respectively.
Though the data above are fictitious, real-life examples exist. In his book A Mathematician at the Ballpark, Ken Ross illustrates the paradox using the batting averages of Derek Jeter and David Justice during the 1995 and 1996 baseball seasons.
The February 1982 issue of The American Statistician provides multiple real-life examples:
- The overall subscription rate for the journal American History Illustrated increased from January 1979 to February 1979, yet the rate decreased for every category of subscriber.
- The overall federal income tax rate increased from 1974 to 1978 but decreased for every income tax bracket. (Sounds like something we need today!)
- The overall death rate from tuberculosis in 1910 was greater in Richmond (VA) than in New York City, yet it was lower in Richmond for both whites and non-whites.
Many other examples exist: patient survival rates in two different hospitals, voting patterns, misleading bias in college admissions, deceptive class sizes for different teachers, etc.
If you want to learn more about Simpson's Paradox and its resolution, I suggest these articles, in addition to the wealth of information available on-line:
- J. Mitchem's "Paradoxes in Averages," Mathematics Teacher, April 1989, 250-253
- T. Knapp's "Instances of Simpson's Paradox," College Math Journal, June 1985, 209-210 (source of above fictitious baseball data)
- A. Tan's "A Geometric Interpretation of Simpson's Paradox," College Math Journal, September 1986, 340-341
- M. Calzada & S. Scariano's "Simpson's Paradox and Matrix Determinants," Mathematics and Computer Education, Fall 2000, 237-244
- Z. Usiskin's "Reader Reflections: Simpson's Paradox," Mathematics Teacher, April 1985, 240