Bayes' Law in plain English
David Dufty

Bayes' law is a mathematical equation. Let's get it out of the way right now. The formula is

Pr(A|B) = Pr(B|A) * Pr(A) / Pr(B)

where Pr is probability, A is an event, B is another event, and A|B means "event A, if event B happens (or is true)."

That formula might not mean very much to you, but as we explore it, you'll see that it has very interesting properties.

Sometimes, in mathematics, words have slightly different meanings than they do in everyday life. Here, an event doesn't necessarily mean an "event" as in a "historical event"; it can be a situation or a set of circumstances. You'll get more of a feel for that over time.

So in plain English, Bayes' law is a formula for calculating the probability that something (called "A") is true or will be true, given a certain set of circumstances (called "B").

The left hand side of the equation is Pr(A|B). This translates into English as "the probability of A, if B is true." A more precise wording is "the probability of A given that B is true." This is called a conditional probability.

After the equals sign, the right hand side of the equation tells us how to find that conditional probability. We simply need to know the numbers for three things: Pr(B|A), the probability of B given that A is true; Pr(A), the probability of A on its own; and Pr(B), the probability of B on its own.
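As a minimal sketch, the formula can be written as a small Python function that takes Pr(B|A), Pr(A), and Pr(B) and returns Pr(A|B). The function name `bayes` and the numbers in the example call are made up for illustration; they are not from any real data.

```python
def bayes(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Return Pr(A|B) = Pr(B|A) * Pr(A) / Pr(B)."""
    if p_b <= 0:
        # Pr(A|B) is undefined when B can never happen.
        raise ValueError("Pr(B) must be greater than zero")
    return p_b_given_a * p_a / p_b


# Illustrative inputs only: Pr(B|A) = 0.8, Pr(A) = 0.1, Pr(B) = 0.2
print(round(bayes(0.8, 0.1, 0.2), 4))
```

With these made-up inputs the result is 0.4: B is twice as likely as A overall, and B shows up in 80 percent of the cases where A is true.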
This is all very abstract. Let's look at a concrete example: a situation where someone might want to know a conditional probability.

Doctors deal with this situation all the time. A patient walks in who has a fever and chills. The doctor wonders, "What is the chance that this patient has tuberculosis, given the symptoms I am seeing?" That's a conditional probability: what's the chance of A given B, where A = TB and B = fever and chills.

Well, the doctor needs to know the following:

Pr(B|A), the probability of fever and chills, given TB. It turns out that half of all TB sufferers exhibit this symptom at any point in time.

Pr(A), the probability of TB. TB is a rare disease, affecting .01 percent of Americans (a probability of .0001).

Pr(B), the probability of fever. Fever is a common symptom, generated by over 600 diseases. Perhaps as many as 3 percent of Americans have fever in any given year (a probability of .03).

Therefore, the conditional probability of TB = .5 times .0001 divided by .03. Working this out gives about .0017. So there is roughly a 17 in ten thousand chance (about 1 in 600) that the person has TB, if they have fever and chills.

Here's another situation, where a doctor gets a test back from the pathology lab. It's for a patient who had a blood test for lupus. The test came back positive. This is a concern, but the doctor doesn't jump to any conclusions, because the test can sometimes be wrong. Specifically, the test comes out positive on 99 percent of lupus cases, and it gives a false positive 2 percent of the time. The doctor's problem is again a conditional probability, but instead of symptoms, it's a test result: "What's the chance that the patient has the disease, given this positive test result?"
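Before working through the test result, the TB arithmetic from the fever-and-chills example is easy to check in a few lines of Python. This is only a sketch that plugs in the three figures quoted in the text; the variable names are my own.

```python
# Figures quoted in the TB example
p_symptoms_given_tb = 0.5   # Pr(B|A): fever and chills, given TB
p_tb = 0.0001               # Pr(A): TB in the population
p_symptoms = 0.03           # Pr(B): fever in the population

# Bayes' law: Pr(A|B) = Pr(B|A) * Pr(A) / Pr(B)
p_tb_given_symptoms = p_symptoms_given_tb * p_tb / p_symptoms

print(f"Pr(TB | fever and chills) = {p_tb_given_symptoms:.4f}")
print(f"about {p_tb_given_symptoms * 10_000:.0f} in ten thousand")
```

The result is about 0.0017, or roughly 17 in ten thousand. The symptoms make TB more than sixteen times likelier than the .0001 base rate, yet it remains a long shot.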
So you need to know three things:

The chance of a positive test result, given the disease. That's .99, because if the disease is there, the test works 99% of the time.

The probability of the disease. It's in half a percent of the population, so that's a probability of .005.

The probability of a positive result in general. The 2% false positive rate dominates here, so we'll use roughly .02.

Probability of lupus = .99 * .005 / .02 = .25

So there is roughly a 25 percent chance that the patient has lupus.

This confuses a lot of people. Doesn't the test have 99% accuracy? Well yes, but lupus is very, very rare, and the test produces a lot of false positives. If you ran it on a random selection of 100 people, probably nobody in the group would have lupus, but your test would find two cases of lupus! In other words, even though the test is quite accurate, there are more false positives than there are actual people with lupus.

The reason that Bayes' law is so important is that we humans are constantly trying to calculate conditional probabilities, but our brains find Bayes' law very difficult to apply.

Here are some situations where Bayes' law would be useful: What's the probability of a thunderstorm, given clouds? What's the chance of getting audited by the IRS if I make a mistake on my tax return? And so on. Any "what's the chance of that if...?" question is something that Bayes' law applies to.

Here's an interesting one. What's the chance that the defendant is guilty, given a DNA match? Well, the chance of a DNA match is, let's say, one in a thousand. In probability-talk, that's .001. But this doesn't translate directly into guilt or innocence. What's the chance of a DNA match when the defendant is guilty, Pr(B|A), versus a DNA match against a random member of the public, Pr(B)? What's the base probability of guilt, Pr(A)? Without these things, a number like that is meaningless. In other words, we need to know the false positive rate as well as the miss rate.
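Returning to the lupus test: the calculation above uses .02 for Pr(B), the overall chance of a positive result. A fuller version of Pr(B) also counts the true positives. The sketch below compares the two, assuming (my reading, not stated in the text) that the 2% false positive rate applies to people who do not have lupus. The variable names are illustrative.

```python
sensitivity = 0.99          # Pr(positive | lupus)
false_positive_rate = 0.02  # Pr(positive | no lupus), assumed reading
prevalence = 0.005          # Pr(lupus)

# Pr(positive) = true positives + false positives
p_positive = (sensitivity * prevalence
              + false_positive_rate * (1 - prevalence))

# Bayes' law with the fuller Pr(B)
p_lupus_given_positive = sensitivity * prevalence / p_positive

# The shortcut used in the text: Pr(B) taken as .02
shortcut = sensitivity * prevalence / false_positive_rate

print(f"Pr(positive)            = {p_positive:.5f}")
print(f"Pr(lupus | positive)    = {p_lupus_given_positive:.3f}")
print(f"shortcut with Pr(B)=.02 = {shortcut:.3f}")
```

Under that assumption the fuller figure comes out near 20 percent rather than 25 percent. The conclusion is the same either way: most positive results are false alarms.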
A well-known case of mistaken reasoning by prosecutors, and a jury, is the tragic tale of Sally Clark, who was convicted of killing her two children, on the grounds that the chance of them both dying by Sudden Infant Death Syndrome was very small. Her conviction was eventually overturned on appeal.

One of the things that makes Bayes' law so hard to understand is that any kind of test has two error rates: the miss rate and the false alarm rate. This can be represented in a table as follows (I'll use the disease example again):

                          Test says "disease"    Test says "no disease"
Patient has disease       Correct                Miss
Patient has no disease    False alarm            Correct
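To see how the four cells of a table like this fill up, here is a rough sketch that computes expected counts for the lupus test described earlier. The population of 100,000 is a hypothetical round number chosen for illustration, and the 2% false alarm rate is assumed to apply to people without the disease.

```python
population = 100_000       # hypothetical group size, for illustration
prevalence = 0.005         # share of people with lupus
hit_rate = 0.99            # test says "disease" when disease is present
false_alarm_rate = 0.02    # test says "disease" when disease is absent

sick = round(population * prevalence)
healthy = population - sick

hits = round(sick * hit_rate)
misses = sick - hits
false_alarms = round(healthy * false_alarm_rate)
correct_rejections = healthy - false_alarms

# Print the 2x2 table of expected counts
print(f"{'':<12}{'says disease':>14}{'says no disease':>18}")
print(f"{'disease':<12}{hits:>14}{misses:>18}")
print(f"{'no disease':<12}{false_alarms:>14}{correct_rejections:>18}")
```

With these inputs the false alarms (1990) outnumber the correct detections (495) by about four to one, even though the test misses only 5 of the 500 real cases.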
For a tornado siren, sometimes the siren will go off when there is no tornado. In that case, the table looks like this:

              Siren sounds     Siren stays silent
Tornado       Correct          Miss
No tornado    False alarm      Correct
Imagine you can set the sirens to different sensitivities: anything from "hair-trigger" (it will go off if there is a strong gust of wind) to completely insensitive (the tower that the siren is mounted on actually has to get ripped out of the ground before the siren will sound). Obviously, a more sensitive siren will yield more false alarms, but fewer misses. A less sensitive siren will yield more misses but fewer false alarms. Politicians and administrators would rather have a siren that's too sensitive (and has many false alarms) than one that is not sensitive enough (and has many misses).

The bias of the measurement towards false alarms or misses depends on the task. For tornadoes and medical tests, more false alarms are a small price to pay in comparison to a miss.

© Copyright 2007 David Dufty