A calculator for medical testing

April 24th, 2020

At the bottom of this page is a calculator that I made to calculate one very specific thing: the probability that you have a disease given that a test says that you have the disease. Between here and there I’ll make precise what I mean by these things; but, although I will use the language of disease and medical testing, I hope it’s clear that these ideas apply to, and that the calculator can also be used for, any test that gives a yes or no answer (drug testing, for example).

If you are tested for a disease and are informed that the test came out positive, there is one thing you want to know now: do you actually have the disease? Or, more precisely (since, as James Clerk Maxwell said, “The true logic of the world is in the calculus of probabilities”), what is the *chance*, or probability, that you have the disease? In the symbolic language of conditional probability, what we would like to know is

\[P(\hbox{sick} | \hbox{positive})\]

or the probability that we are indeed sick, given that we got a positive result back from the test.

To calculate this number, this probability, we need to have some other numbers, and we usually do. First, there is what the doctor tells us: this test is 99% (or whatever it may be) accurate. There are several ways in which a test can be “accurate”; but, if this is all we are told, we can interpret this number as a particular conditional probability: the probability that the test will give a positive result if in fact we do have the disease:

\[P(\hbox{positive} | \hbox{sick})\]

This is sometimes called the *sensitivity* of the test.

There is another measure of a test’s accuracy, sometimes called the *specificity*: the probability of a negative result given that you are healthy, that is, of a true negative result:

\[P(\hbox{negative} | \hbox{healthy})\]

We are often just told one number for the “accuracy” of the test, in which case we have no choice but to use this number for both the sensitivity and the specificity.
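To make the two measures concrete, here is a minimal sketch of how they could be estimated from the results of a validation study in which the true status of every person is known. The counts are invented for illustration, and the function names are my own, not part of any library.

```python
def sensitivity(true_pos, false_neg):
    """Estimate P(positive | sick): the fraction of sick people the test flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Estimate P(negative | healthy): the fraction of healthy people the test clears."""
    return true_neg / (true_neg + false_pos)


# Hypothetical study: 200 sick people and 1000 healthy people were tested.
sens = sensitivity(true_pos=198, false_neg=2)    # 198 of the 200 sick tested positive
spec = specificity(true_neg=950, false_pos=50)   # 950 of the 1000 healthy tested negative

print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Note that the two numbers come from two different groups of people, which is why a test can be very good by one measure and mediocre by the other.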

Here is where people who have not studied probability theory, including many doctors, make a serious mistake: they believe that the person with a positive result has the disease, if not with a probability equal to the accuracy (99%, or whatever it may be claimed to be), then with a great deal of certainty. This error is so common (and can have such tragic consequences) that it has taken its place among the pantheon of the named fallacies, alongside the gambler’s fallacy, the prosecutor’s fallacy, and the many other forms of confused reasoning: it’s called the *base rate fallacy*, and we are about to see why.

To answer the question that’s on our minds, we need one more number: the *base rate*, or the rate at which the disease occurs in the population of which we are a member. We make the usual inference and interpret this as the *a priori* probability that we have the disease:

\[P(\hbox{sick})\]

Now notice that what we want to know, but don’t yet know (the first conditional probability up above), is the “reverse” of the second conditional probability, the one that expresses the accuracy of the test. And we know that our one trick for relating a conditional probability to its reverse is Bayes’ theorem:

\[P(\hbox{sick} | \hbox{positive}) = \frac{P(\hbox{positive} | \hbox{sick})\cdot P(\hbox{sick})}{P(\hbox{positive})}\]

We know the two factors in the numerator, but we don’t, offhand, know what number to put in for the denominator. However, we can figure that out.

We need \(P(\hbox{positive})\), or the total probability of a positive result. There are two ways to get a positive result; they exhaust all the possibilities, and there is no overlap between them: either you are sick and tested positive, or you are healthy and tested positive. In symbols, and remembering the formula for the probability of the conjunction of two events, we can write

\[\begin{aligned}P(\hbox{positive}) &= P(\hbox{sick AND positive}) + P(\hbox{healthy AND positive}) \\ &= P(\hbox{sick})\cdot P(\hbox{positive} | \hbox{sick}) + P(\hbox{healthy})\cdot P(\hbox{positive} | \hbox{healthy})\end{aligned}\]

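In the above, for \(P(\hbox{healthy})\) we can put \(1 - P(\hbox{sick})\), because you are either sick or healthy. Likewise, \(P(\hbox{positive} | \hbox{healthy})\) is one minus the specificity, \(1 - P(\hbox{negative} | \hbox{healthy})\), because a healthy person’s test comes back either positive or negative; and, if we are not given the specificity, we use the single accuracy figure for both and substitute \(1 - P(\hbox{positive} | \hbox{sick})\) for \(P(\hbox{positive} | \hbox{healthy})\). Putting it all together, we have what we need: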

\[P(\hbox{sick} | \hbox{positive}) = \frac{P(\hbox{positive} | \hbox{sick})\cdot P(\hbox{sick})}{P(\hbox{sick})\cdot P(\hbox{positive} | \hbox{sick}) + (1 - P(\hbox{sick}))\cdot P(\hbox{positive} | \hbox{healthy})}\]
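As a check on the formula, here is a minimal Python sketch of the same arithmetic. The function name and its parameters are my own choices for illustration, and this is not necessarily how the calculator on this page is implemented. If no specificity is supplied, the sketch assumes, as discussed above, that the single accuracy figure serves as both sensitivity and specificity.

```python
def prob_sick_given_positive(base_rate, sensitivity, specificity=None):
    """Return P(sick | positive) using Bayes' theorem.

    base_rate   -- P(sick), the prior probability of having the disease
    sensitivity -- P(positive | sick)
    specificity -- P(negative | healthy); if omitted, assumed equal to sensitivity
    """
    if specificity is None:
        specificity = sensitivity  # assumption: one "accuracy" number covers both

    for name, p in (("base_rate", base_rate),
                    ("sensitivity", sensitivity),
                    ("specificity", specificity)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be a probability between 0 and 1")

    # P(positive | healthy) is the complement of the specificity.
    false_positive_rate = 1.0 - specificity

    # Total probability of a positive result: sick-and-positive plus healthy-and-positive.
    p_positive = base_rate * sensitivity + (1.0 - base_rate) * false_positive_rate
    if p_positive == 0.0:
        raise ValueError("a positive result is impossible with these inputs")

    return base_rate * sensitivity / p_positive


# A disease that affects 1% of the population, and a test that is "99% accurate".
posterior = prob_sick_given_positive(base_rate=0.01, sensitivity=0.99)
print(f"{posterior:.3f}")  # prints 0.500
```

With these illustrative numbers, a positive result from a 99% accurate test leaves you with only even odds of being sick, because the rare true positives are matched by an equal number of false positives among the much larger healthy population. This is the base rate fallacy in action.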

To use the calculator below, just enter the three numbers, and the result will appear immediately. The calculator simply substitutes the numbers you give it into the above formula. I hope it is convenient: doing the arithmetic by hand is tedious and error-prone, and important decisions can depend on calculating this number correctly.
