A calculator for medical testing

Lee Phillips

April 24th, 2020

October 12th, 2021: I updated the wording to refer to infection rather than disease, following a suggestion offered by David Whitcombe.

At the bottom of this page is a calculator that I made to calculate one very specific thing: the probability that you are infected with a pathogen given that a test says that you are infected. Between here and there I’ll make precise what I mean by these things; but, although I will use the language of disease and medical testing, I hope it’s clear that these ideas apply to, and that the calculator can also be used for, any test that gives a yes or no answer (for example, drug testing).

If you are tested for an infection and are informed that the test came out positive, there is one thing you want to know now: are you actually infected? Or, more precisely (since, as James Clerk Maxwell said, “The true logic of the world is in the calculus of probabilities”), what is the chance, or probability, that you are infected? In the symbolic language of conditional probability, what we would like to know is

\[P(\hbox{infected} | \hbox{positive})\]

or the probability that we are indeed infected, given that we got a positive result back from the test.

To calculate this number, this probability, we need to have some other numbers, and we usually do. First, there is what the doctor tells us: this test is 99% (or whatever it may be) accurate. There are several ways in which a test can be “accurate”; but, if this is all we are told, we can interpret this number as a particular conditional probability: the probability that the test will give a positive result if in fact we are infected.

\[P(\hbox{positive} | \hbox{infected})\]

This is sometimes called the sensitivity of the test.

There is another measure of a test’s accuracy, which is sometimes called the specificity: the probability of a negative result given that you are uninfected, that is, of a true negative result:

\[P(\hbox{negative} | \hbox{uninfected})\]

We are often just told one number for the “accuracy” of the test, in which case we have no choice but to use this number for both the sensitivity and the specificity.
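As a rough illustration of where these two numbers come from, here is a minimal Python sketch that estimates them from the counts in a validation study. The function name and all of the counts are hypothetical, invented for this example; they do not describe any real test.

```python
def sensitivity_and_specificity(true_pos, false_neg, true_neg, false_pos):
    """Estimate P(positive | infected) and P(negative | uninfected) from counts."""
    # Fraction of infected people for whom the test came out positive.
    sensitivity = true_pos / (true_pos + false_neg)
    # Fraction of uninfected people for whom the test came out negative.
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical validation study: 200 infected and 1000 uninfected people.
sens, spec = sensitivity_and_specificity(true_pos=190, false_neg=10,
                                         true_neg=970, false_pos=30)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
# prints: sensitivity = 0.95, specificity = 0.97
```

In this made-up study the two numbers differ, which is why a single “accuracy” figure can hide something.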

Here is where people who have not studied probability theory, including many doctors, make a serious mistake: they believe that the person with a positive result is infected, if not with a probability equal to the accuracy (99%, or whatever it may be claimed to be), then with a great deal of certainty. This error is so common (and can have such tragic consequences) that it has taken its place among the pantheon of the named fallacies, alongside the gambler’s fallacy, the prosecutor’s fallacy, and the many other forms of confused reasoning: it’s called the base rate fallacy, and we are about to see why.

To answer the question that’s on our minds, we need one more number: the base rate, or the rate at which infection occurs in the population of which we are a member. We make the usual inference and interpret this as the a priori probability that we are infected:

\[P(\hbox{infected})\]

Now notice that what we want to know but don’t yet know, the first conditional probability up above, is the “reverse” of the second conditional probability, the one that expresses the accuracy of the test. And we know that our one trick for relating a conditional probability to its reverse is Bayes’ theorem:

\[P(\hbox{infected} | \hbox{positive}) = \frac{P(\hbox{positive} | \hbox{infected})\cdot P(\hbox{infected})}{P(\hbox{positive})}\]

We know the two factors in the numerator, but we don’t, offhand, know what number to put in for the denominator. However, we can figure that out.

We need \(P(\hbox{positive})\), the total probability of a positive result. There are two ways to get a positive result; they exhaust all the possibilities, and there is no overlap between them: either you tested positive and you were infected, or you tested positive and you were uninfected. In symbols, and remembering the formula for the probability of the conjunction of two events, we can write

\[P(\hbox{positive}) = P(\hbox{infected AND positive}) + P(\hbox{uninfected AND positive}) = \\ P(\hbox{infected})\cdot P(\hbox{positive} | \hbox{infected}) + P(\hbox{uninfected})\cdot P(\hbox{positive} | \hbox{uninfected})\]
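The sum over the two disjoint cases can be sketched in a few lines of Python. This is only an illustration: the function and parameter names are my own, `false_positive_rate` stands for \(P(\hbox{positive} | \hbox{uninfected})\), and the numbers are chosen for convenience rather than taken from any real test.

```python
def prob_positive(base_rate, sensitivity, false_positive_rate):
    """Total probability of a positive result, summed over two disjoint cases."""
    # Case 1: infected AND positive.
    infected_and_positive = base_rate * sensitivity
    # Case 2: uninfected AND positive; P(uninfected) is 1 - P(infected).
    uninfected_and_positive = (1 - base_rate) * false_positive_rate
    return infected_and_positive + uninfected_and_positive


# Illustrative numbers only: 1% base rate, 99% sensitivity, 1% false positives.
print(f"{prob_positive(0.01, 0.99, 0.01):.4f}")
# prints: 0.0198
```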

In the above, for \(P(\hbox{uninfected})\) we can put \(1 - P(\hbox{infected})\), because you are either infected or not. Likewise, \(P(\hbox{positive} | \hbox{uninfected})\) is one minus the specificity, \(1 - P(\hbox{negative} | \hbox{uninfected})\); if we are not given the specificity, we assume that it equals the sensitivity and just substitute \(1 - P(\hbox{positive} | \hbox{infected})\) instead. Putting it all together, we have what we need:

\[P(\hbox{infected} | \hbox{positive}) = \] \[\frac{P(\hbox{positive} | \hbox{infected})\cdot P(\hbox{infected})}{P(\hbox{infected})\cdot P(\hbox{positive} | \hbox{infected}) + (1 - P(\hbox{infected}))\cdot P(\hbox{positive} | \hbox{uninfected})}\]
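As a worked example, with numbers chosen purely for illustration: suppose the sensitivity and the specificity are both 99%, and the base rate is 1%. Then \(P(\hbox{positive} | \hbox{uninfected}) = 1 - 0.99 = 0.01\), and the formula gives

\[P(\hbox{infected} | \hbox{positive}) = \frac{0.99\cdot 0.01}{0.01\cdot 0.99 + (1 - 0.01)\cdot 0.01} = \frac{0.0099}{0.0198} = 0.5\]

So with these assumed numbers, a positive result from a “99% accurate” test leaves you with only an even chance of being infected. That gap between 99% and 50% is the base rate fallacy at work.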

To use the calculator below, just enter the three numbers that the formula requires, and the result will appear immediately. The calculator simply substitutes the numbers you give it into the above formula. I hope it is convenient: doing the arithmetic by hand is tedious and error-prone, and important decisions can depend on the correct calculation of this number.
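If you would rather do the substitution in code, the following Python function is a minimal sketch of the same formula. It is not the calculator's actual source; the function name, the parameter names, and the choice to fall back on the sensitivity when no specificity is given are my assumptions, and the inputs in the usage lines are invented.

```python
def prob_infected_given_positive(sensitivity, base_rate, specificity=None):
    """Bayes' theorem for P(infected | positive).

    All arguments are probabilities between 0 and 1. If specificity is
    omitted, it is assumed to equal the sensitivity (a single "accuracy").
    """
    if specificity is None:
        specificity = sensitivity
    for name, value in (("sensitivity", sensitivity),
                        ("base_rate", base_rate),
                        ("specificity", specificity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")

    # P(positive | uninfected) is one minus the specificity.
    false_positive_rate = 1.0 - specificity
    numerator = sensitivity * base_rate
    # Total probability of a positive result (the denominator of the formula).
    denominator = numerator + (1.0 - base_rate) * false_positive_rate
    if denominator == 0.0:
        raise ValueError("a positive result is impossible with these inputs")
    return numerator / denominator


# A single "95% accurate" figure used for both numbers, with a 2% base rate.
print(f"{prob_infected_given_positive(0.95, 0.02):.3f}")         # prints: 0.279
# Separate sensitivity (99%) and specificity (95%), with a 0.1% base rate.
print(f"{prob_infected_given_positive(0.99, 0.001, 0.95):.3f}")  # prints: 0.019
```

The check on the denominator is there because some inputs (for example, a base rate of zero with a perfect specificity) make a positive result impossible, and the conditional probability is then undefined.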

