The Addition Law of Probability, in words

When I was learning the basics of probability for the first time, I found that I needed to justify the Addition Law of Probability for myself beyond the standard Venn diagram illustration of overlapping sets. This was unusual for me, because I’m normally a highly visual learner, but for whatever reason I wanted a more explicit demonstration that the Law had to take the form it does, rather than merely happening to fit the illustration.

The Addition Law concerns the probabilities of outcomes which fall within certain sets of interest. These sets can be thought of as conditions which are met (or not) by elements of the sample space (i.e. potential outcomes). We can think of “event” A as identifying the case that an outcome x satisfies condition A. The Law states that the probability of the union of two events (A ∪ B) is equal to the sum of the individual probabilities of the events minus the probability of the intersection of the events (A ∩ B):

Pr(A ∪ B) = Pr(A) + Pr(B) - Pr(A ∩ B)
In words, this means that the probability of an outcome satisfying at least one of (i.e., either or both of) conditions A and B is equal to the probability that the outcome will satisfy condition A, plus the probability that the outcome will satisfy condition B, minus the probability that the outcome will satisfy both conditions A and B.
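A concrete instance can make the statement easier to check. The sketch below assumes a single roll of a fair six-sided die, with the hypothetical conditions A = “the roll is even” and B = “the roll is greater than 3”. It computes each probability exactly with Python’s `fractions.Fraction` and compares the two sides of the Law.

```python
from fractions import Fraction

# Hypothetical example: one roll of a fair six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}

def pr(event):
    """Probability of an event (a subset of the sample space) under a uniform distribution."""
    return Fraction(len(event), len(sample_space))

A = {x for x in sample_space if x % 2 == 0}  # condition A: the roll is even -> {2, 4, 6}
B = {x for x in sample_space if x > 3}       # condition B: the roll is greater than 3 -> {4, 5, 6}

lhs = pr(A | B)                  # Pr(A ∪ B): outcomes meeting at least one condition
rhs = pr(A) + pr(B) - pr(A & B)  # Pr(A) + Pr(B) - Pr(A ∩ B)
print(lhs, rhs)  # 2/3 2/3
```

Here Pr(A) = Pr(B) = 1/2 and Pr(A ∩ B) = 1/3, so the right-hand side is 1/2 + 1/2 - 1/3 = 2/3, matching the four favourable outcomes {2, 4, 5, 6} out of six.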

Something about this never struck me as intuitive, at least not as presented above. Of course, if the two conditions are mutually exclusive (i.e. the sets do not overlap), Pr(A ∩ B) = 0, so the union probability is simply the sum of the individual condition probabilities—that’s all easy to see.
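As a quick check of the mutually exclusive case, the sketch below assumes a fair die and the hypothetical non-overlapping conditions A = “the roll is at most 2” and B = “the roll is at least 5”. With no shared outcomes, the intersection has probability 0 and the plain sum already gives the union probability.

```python
from fractions import Fraction

# Hypothetical example: one roll of a fair six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}

def pr(event):
    """Probability of an event under a uniform distribution on the sample space."""
    return Fraction(len(event), len(sample_space))

A = {1, 2}  # condition A: the roll is at most 2
B = {5, 6}  # condition B: the roll is at least 5

assert not (A & B)                # no outcome meets both conditions
print(pr(A & B))                  # 0
print(pr(A | B), pr(A) + pr(B))   # 2/3 2/3
```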

The usual explanation for the subtracted term is that the individual condition probabilities double-count outcomes which satisfy both conditions, so the intersection probability must be subtracted from the total to correct for this. But this felt (to me, at least) unnatural: I couldn’t “see” it from the equation. To convince myself, I derived the Law while expressing each step in words, which makes clear both why the correction is necessary and why it works.
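The double counting can be made visible by tallying, for each outcome, how many times the sum Pr(A) + Pr(B) includes it: once for each condition it meets. The sketch below assumes a single roll of a fair die with the hypothetical conditions A = “the roll is even” and B = “the roll is greater than 3”; the helper `group` is illustrative only.

```python
# Hypothetical example: one roll of a fair six-sided die.
sample_space = [1, 2, 3, 4, 5, 6]
A = {2, 4, 6}  # condition A: the roll is even
B = {4, 5, 6}  # condition B: the roll is greater than 3

def group(x):
    """Name the group an outcome falls into, based on the conditions it meets."""
    if x in A and x in B:
        return "both"
    if x in A:
        return "A only"
    if x in B:
        return "B only"
    return "neither"

# Pr(A) + Pr(B) includes an outcome once for each condition it satisfies.
times_counted = {x: int(x in A) + int(x in B) for x in sample_space}

for x in sample_space:
    print(x, group(x), times_counted[x])
# 1 neither 0
# 2 A only 1
# 3 neither 0
# 4 both 2
# 5 B only 1
# 6 both 2
```

The outcomes meeting both conditions (4 and 6) are counted twice, and every other outcome in A ∪ B once, so subtracting Pr(A ∩ B) a single time is exactly the correction needed.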

Beginning with the (much more intuitive) equation for the case of mutually exclusive condition sets, we have:

Pr(A ∪ B) = Pr(A) + Pr(B)
As noted above, this is valid because there are no outcomes satisfying both conditions (the sets do not overlap). If we wish to extend a formal definition of the union probability to a case of non-mutually exclusive conditions, we must modify this equation somehow. We can find the way forward by parsing the condition probabilities.

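If some outcomes (with nonzero probability) satisfy both conditions, then each condition probability in the equation can be parsed into two parts: the probability of outcomes satisfying only that condition, and the probability of outcomes satisfying both that condition and the other one. Because the overlapping outcomes are now included under both condition probabilities, the two sides no longer agree, so the parsed equation is written with “≠”. In words, it is: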

Pr(A ∪ B) ≠ Pr(A only) + Pr(A and B) + Pr(B only) + Pr(B and A)
The two “both” terms are written in opposite orders above to show which term of the original equation each was parsed from, but of course the order in which the two met conditions are stated makes no difference. Explicitly:

Pr(A and B) = Pr(B and A)
Thus, we can simplify the parsed equation to:

Pr(A ∪ B) ≠ Pr(A only) + Pr(B only) + 2 Pr(A and B)
This expression splits the outcomes in A ∪ B into non-overlapping groups according to the conditions they meet, and there are only three such groups: x meets A only, B only, or both. Yet, as written above, the equation counts the last group twice, which is once too many. The correction for the non-mutually exclusive case is therefore to subtract this term once (note that this restores the equality sign):

Pr(A ∪ B) = Pr(A only) + Pr(B only) + 2 Pr(A and B) - Pr(A and B)
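We move from this equation to the standard statement of the Addition Law by recombining our parsed terms. Assigning one copy of Pr(A and B) to each condition gives Pr(A only) + Pr(A and B) = Pr(A) and Pr(B only) + Pr(A and B) = Pr(B), so: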

Pr(A ∪ B) = Pr(A) + Pr(B) - Pr(A and B)
The event “A and B” is exactly the intersection A ∩ B, so in standard notation the equation becomes:

Pr(A ∪ B) = Pr(A) + Pr(B) - Pr(A ∩ B)
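Because the derivation never assumes that outcomes are equally likely, each step should hold for any probability assignment and any pair of events. The sketch below checks this exhaustively on one finite example: a hypothetical loaded die with arbitrarily chosen weights, using exact `fractions.Fraction` arithmetic so that no rounding error can hide a discrepancy. It walks through every pair of the 64 possible events and asserts the parsing step, the overcount, and the final form. This is a numerical sanity check on one sample space, not a proof.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical loaded die: arbitrary weights that sum to 1.
weights = {1: Fraction(1, 12), 2: Fraction(1, 12), 3: Fraction(1, 6),
           4: Fraction(1, 6), 5: Fraction(1, 4), 6: Fraction(1, 4)}
assert sum(weights.values()) == 1

def pr(event):
    """Probability of an event: the total weight of the outcomes it contains."""
    return sum((weights[x] for x in event), Fraction(0))

def all_events(outcomes):
    """Every subset of the sample space (2**6 = 64 events for a die)."""
    items = list(outcomes)
    return [set(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

events = all_events(weights)
checked = 0
for A in events:
    for B in events:
        a_only, b_only, both = A - B, B - A, A & B
        # Parsing: each condition splits into its "only" part and the shared part.
        assert pr(a_only) + pr(both) == pr(A)
        assert pr(b_only) + pr(both) == pr(B)
        # The parsed sum overshoots the union by exactly one copy of Pr(A and B).
        parsed_sum = pr(a_only) + pr(b_only) + 2 * pr(both)
        assert parsed_sum - pr(A | B) == pr(both)
        # Subtracting that copy once gives the Addition Law.
        assert pr(A | B) == pr(A) + pr(B) - pr(A & B)
        checked += 1

print(checked)  # 4096
```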