How to detect a false positive?

Question

When applying statistical hypothesis testing, a type I error (false positive) can occur. Usually we never learn whether one has occurred. But are there cases where the truth becomes known after the test has been carried out?

For example, suppose I want to know whether women live longer than men. I set up a hypothesis test on age at death for the two genders: H0 is that age at death is equal, and H1 is that women's age at death is larger. Assume the result is significant, so I reject the null. Also assume that later scientific research shows women do not live longer than men, and that new data give a non-significant result. The original rejection would then be a type I error, and one that became known only after the test.

Where can I find cases like this, in which a type I error is later revealed by other measurements?

Lily · Answer 1 · 2023-03-20T18:16:56.213

1

If you would like an example that involves repeated formal hypothesis testing, this could be one. Suppose you are testing whether males earn more than females. You draw a random sample from the population, reject the null hypothesis, and conclude that males earn more than females. Then you draw another random sample from the same population, but this time you cannot reject the null. Or, on the same random sample as the first test, you use a better income measure (say, one that includes more income sources, or official income data from tax agencies rather than self-reported income) and fail to reject the null. Inconsistent results across the tests can be a flag for a possible false positive in the first test. The reason for the possible false positive is sampling variation (each random sample from the same population can be different) in the first scenario, and measurement error in income in the second.
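To make the sampling-variation point concrete, here is a minimal simulation sketch. It assumes a made-up population in which both groups have exactly the same income distribution (so the null is true by construction), and it uses a large-sample one-sided z-test as a stand-in for whatever test you would actually run. The distribution parameters, sample sizes, and function names are all hypothetical.

```python
import random
from statistics import NormalDist, mean, variance


def one_sided_z_test(x, y):
    """Approximate p-value for H1: mean(x) > mean(y), large-sample z-test."""
    se = (variance(x) / len(x) + variance(y) / len(y)) ** 0.5
    z = (mean(x) - mean(y)) / se
    return 1.0 - NormalDist().cdf(z)


def draw_sample(rng, n):
    # Hypothetical income distribution, identical for both groups (H0 is true).
    return [rng.gauss(50_000, 10_000) for _ in range(n)]


def replication_counts(n_studies=2000, n=200, alpha=0.05, seed=0):
    """Count first-sample rejections and how many survive a fresh sample."""
    rng = random.Random(seed)
    first_rejections = 0
    replicated = 0
    for _ in range(n_studies):
        p_first = one_sided_z_test(draw_sample(rng, n), draw_sample(rng, n))
        if p_first < alpha:
            first_rejections += 1
            # Draw a second, independent sample from the same population.
            p_second = one_sided_z_test(draw_sample(rng, n), draw_sample(rng, n))
            if p_second < alpha:
                replicated += 1
    return first_rejections, replicated


first, again = replication_counts()
print(f"rejections on the first sample: {first}")
print(f"of those, also rejected on a fresh sample: {again}")
```

Because the null is true here, every first-sample rejection is a false positive, and only a small fraction of them should be rejected again on a fresh sample. A failed replication is still only a flag, not proof, since a true effect can also fail to replicate when power is low.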

I would not suggest checking for a false positive by rerunning the same hypothesis test on a sample drawn from the population at a later time. The underlying population distribution may itself have changed over time, and this would contaminate the conclusion.

If you are looking for an example of "first perform statistical hypothesis testing and claim positive, and later the ground truth is given", here is one. At first, we can only test females' and males' longevity using a random sample drawn from the population. Later, say, the national health policy department releases the average longevity of females and males, calculated from the entire population our sample was drawn from. If our sample led us to reject the null hypothesis, but the population figures released by the officials show that the null is actually true, we can confidently conclude that our hypothesis test committed a Type 1 error (false positive).
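A rough sketch of this "sample first, census later" situation follows. It assumes a hypothetical finite population in which both groups hold exactly the same set of ages at death, so the population means are equal and the null is true. Each simulated study tests a random sample, and the "census" check afterwards tells us which rejections were false positives. All numbers and names are made up for illustration.

```python
import random
from statistics import NormalDist, mean, variance


def one_sided_p_value(x, y):
    """Approximate p-value for H1: mean(x) > mean(y), large-sample z-test."""
    se = (variance(x) / len(x) + variance(y) / len(y)) ** 0.5
    return 1.0 - NormalDist().cdf((mean(x) - mean(y)) / se)


rng = random.Random(42)

# Hypothetical finite population: men's ages are a shuffled copy of women's,
# so both groups contain the same values and have the same population mean.
women = [rng.gauss(80, 12) for _ in range(20_000)]
men = women[:]
rng.shuffle(men)

# The "census": ground truth computed from the entire population.
census_says_null_true = mean(women) == mean(men)

alpha, n, n_studies = 0.05, 100, 1000
false_positives = 0
for _ in range(n_studies):
    # Each study only sees a random sample from the population.
    p = one_sided_p_value(rng.sample(women, n), rng.sample(men, n))
    if p < alpha and census_says_null_true:
        false_positives += 1

print(f"null true in the population: {census_says_null_true}")
print(f"share of studies with a confirmed false positive: {false_positives / n_studies:.3f}")
```

The share of confirmed false positives should land near the chosen significance level, which is what the Type 1 error rate means. In a single real study you would see only one of these draws, and the census would tell you afterwards whether your rejection was one of the unlucky ones.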

edited Mar 20 '23 at 18:16

answered Mar 17 '23 at 22:59

Lily


@Sean, hope you can click to accept this answer if you find it helpful! Happy to discuss more as well. – Lily Mar 17 '23 at 22:59
Thanks a lot for the explanation, Lily! I would not look for repeated testing either. My point is that the decision behind a positive claim can sometimes be proved untrue, in which case the hypothesis test has made a type I error. I am looking for data/cases for that scenario. We don't have to use repeated testing techniques to find the ground truth. In short, we first perform statistical hypothesis testing and claim a positive, later the ground truth is given, and we find that it was a false positive. – Sean Mar 18 '23 at 13:38
Repeated testing doesn't satisfy me, and I will wait for more answers. If no better input, I will take yours as the answer. Thanks a lot, though! – Sean Mar 18 '23 at 13:39
Hello @Sean, I see, thanks for the clarification! If you are looking for an example of "first perform statistical hypothesis testing and claim positive, and later the ground truth is given", then I would say these are rare cases. The reasons are as follows. Hypothesis testing uses random samples drawn from a population to infer the parameters of that population. In the case you mentioned about the longevity of females and males, the ground truth would be the actual average lifetime among females and males in the entire population. – Lily Mar 20 '23 at 17:49
And a Type 1 error (false positive) happens because of sampling variation (e.g. each time you draw a random sample from the same population, the sample will be different due to randomness). This means that a conclusion about the population parameter drawn from a hypothesis test on the sample may be wrong. – Lily Mar 20 '23 at 17:55
So the only way to confirm with 100% certainty that the first hypothesis test on the sample is a false positive is to obtain the true underlying population parameters and check. One example that fits your requirement could be the following. – Lily Mar 20 '23 at 18:12
At first, we can only test females' and males' longevity using a random sample drawn from the population. Then, say, the national health policy department releases the average longevity of females and males, calculated from the entire population our sample was drawn from. If our sample led us to reject the null hypothesis, but the released population figures show that the null is actually true, we can confidently conclude that our hypothesis test committed a Type 1 error (false positive). – Lily Mar 20 '23 at 18:13
Hi @Sean, I also added the example to the answer above. Let me know if you have questions, happy to discuss more :) – Lily Mar 20 '23 at 18:17
Hi @Lily, I appreciate your input! It all makes sense, but what I am looking for is a real case. Are the national data you mentioned actually available? On further thought, even if they are available, they may not help me find a type I error, because we know, and the population data should also show, that males live shorter lives than females. I am sorry that my example is a lousy one; I used it only for illustration. This is also why I am here looking for a really good example. But anyway, I enjoyed discussing type I errors with you. I appreciate your help! – Sean Mar 21 '23 at 15:24

score 0 · Answer 2 · answered Mar 17 '23 at 06:49

0

One example could be Covid testing, where the null hypothesis is that the individual does not have Covid, and the alternative hypothesis is that the individual has Covid.

When Covid test schemes are developed in labs, we usually know beforehand whether the individuals have Covid or not (through X-ray or other methods), so we can assess the test's probability of a Type 1 error by comparing the test results with the actual status.
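As a small illustration of that comparison, the sketch below estimates the false positive rate from paired ground-truth and test results. The two lists are invented data, and `false_positive_rate` is just a hypothetical helper name, not part of any testing protocol.

```python
def false_positive_rate(truth, predicted):
    """Share of truly negative cases that the test flagged as positive."""
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)
    if fp + tn == 0:
        raise ValueError("no truly negative cases, rate is undefined")
    return fp / (fp + tn)


# Invented lab data: True means "has the disease" / "test came back positive".
truth = [False] * 8 + [True] * 4
predicted = [False, False, True, False, False, False, False, True,
             True, True, False, True]

fpr = false_positive_rate(truth, predicted)
print(fpr)  # prints 0.25 (2 false positives among 8 true negatives)
```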

When applying developed Covid test schemes in practice, we can also detect false positives through repeated sampling/testing of the individuals concerned, checking whether the test results are consistent throughout. Here is an example (https://medicine.missouri.edu/news/researchers-identify-technique-detect-false-positive-covid-19-results), where individuals who tested positive went through a quality-control protocol of repeat testing to reduce false positives.

answered Mar 17 '23 at 06:49

Lily


Thanks a lot for sharing this study, Lily! I skimmed the paper. It seems it's about a biological test, but not a statistical one. It doesn't perform statistical testing either. Do I understand it correctly? – Sean Mar 17 '23 at 17:04
Hello @Sean, sure, the concept of false positives is broader than performing repeated formal hypothesis tests on population distributions, and false positives can happen at the individual level too. For example, if a medical test reports that an individual has a certain disease but the individual does not have it, that is a false positive. If a machine learning algorithm predicts that a credit card owner will default within the next month but that turns out not to be the case, the prediction is also a false positive. – Lily Mar 17 '23 at 22:35
So if we are comparing the prediction with the ground truth to detect a false positive at the individual level, we do not need to compute test statistics or do formal hypothesis testing. As for the paper, it is about using another testing measure on a subsample of patients to establish the ground truth and comparing it with the previous test results, so there is no need to compute test statistics to detect false positives. – Lily Mar 17 '23 at 22:42
Thanks a lot for the clarification! I was looking for cases suitable for statistical testing. I am sorry I didn't make it clear enough in my post! – Sean Mar 18 '23 at 13:30
