Simpson's Paradox

Simpson's paradox is a statistical paradox named after the British statistician Edward H. Simpson. It says that when two subjects are compared, one can be more successful than the other in every individual observation, and yet the other can be more successful once the observations are added together.
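
For the case of two observations, the statement can be sketched in symbols. The letters are notation introduced here for illustration and do not come from the original text: the first subject succeeds in a_i of b_i attempts in observation i, and the second subject in c_i of d_i attempts.

```latex
\frac{a_1}{b_1} > \frac{c_1}{d_1}, \qquad
\frac{a_2}{b_2} > \frac{c_2}{d_2}, \qquad
\text{and yet} \qquad
\frac{a_1 + a_2}{b_1 + b_2} < \frac{c_1 + c_2}{d_1 + d_2}.
```

The reversal is only possible when the denominators differ between the subjects, because the pooled fraction weights each observation by its number of attempts.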

Example

We have two students at two different schools studying two different majors. Let's call them Jana and Martin. Each of them takes two tests per semester in their subject. Jana has a success rate of 30% on the first test and 100% on the second. Martin has a success rate of 25% on the first test and 75% on the second.

Jana seems to be the more successful student, since she has the higher success rate on both tests. However, if we add up the number of questions answered correctly, this may no longer be the case. The crux of the problem is that Jana and Martin went to different schools and therefore took different tests, with different numbers of questions.

For instance, Jana could have answered 3 of 10 questions correctly in the first test (30%) and 2 of 2 questions in the second (100%). In total, she answered 5 of 12 questions correctly. Martin could have answered 1 of 4 questions correctly in the first test (25%) and 6 of 8 questions in the second (75%). In total, he answered 7 of 12 questions correctly. From this point of view, Martin is the more successful student.
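
The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the numbers are the ones from the example, while the names `results`, `per_test_rates`, and `pooled_rate` are made up here for illustration.

```python
from fractions import Fraction

# (correct answers, number of questions) for each test, taken from the example
results = {
    "Jana": [(3, 10), (2, 2)],
    "Martin": [(1, 4), (6, 8)],
}


def per_test_rates(tests):
    """Success rate on each test separately, as exact fractions."""
    return [Fraction(correct, total) for correct, total in tests]


def pooled_rate(tests):
    """Success rate over all questions from all tests added together."""
    correct = sum(c for c, _ in tests)
    total = sum(t for _, t in tests)
    return Fraction(correct, total)


for name, tests in results.items():
    rates = ", ".join(f"{float(r):.0%}" for r in per_test_rates(tests))
    overall = float(pooled_rate(tests))
    print(f"{name}: per test {rates}; overall {overall:.1%}")

# Prints:
# Jana: per test 30%, 100%; overall 41.7%
# Martin: per test 25%, 75%; overall 58.3%
```

Jana leads on each test, but her perfect score comes from a test with only 2 questions, while most of her questions (10 of 12) fall in the test where she scored 30%. Martin's questions are concentrated in the test where he scored 75%, which is why his pooled rate ends up higher.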

Simpson's paradox is quite common in practice, and there is nothing mysterious about it once the unequal group sizes are taken into account. It is named after Edward H. Simpson, who was the first to describe the phenomenon properly, although instances of it had of course been observed before him.