By the end of today you will
library(tidyverse)
library(knitr)
Let A and B be events.
The global coronavirus pandemic illustrates the need for accurate testing for COVID-19, as its extreme infectivity poses a significant public health threat. Because the situation was time-sensitive, the FDA granted emergency authorization to a number of serological tests for COVID-19 in 2020. Full details of these tests may be found on the FDA's website.
We will define the following events:
The Abbott Alinity test has an estimated sensitivity of 100%, P(Pos | Covid) = 1, and specificity of 99%, P(Neg | No Covid) = 0.99.
Suppose the prevalence of COVID-19 in the general population is about 2%, P(Covid) = 0.02.
Exercise 1: Use the Hypothetical 10,000 to calculate the probability a person has COVID given they get a positive test result, i.e. P(Covid | Pos).
| | Covid | No Covid | Total |
|---|---|---|---|
| Pos | | | |
| Neg | | | |
| Total | | | 10000 |
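One way to check a hand-filled table is to build the same counts in R. The sketch below uses only the sensitivity, specificity, and prevalence stated above; the variable names are illustrative and not part of the course materials.

```r
# Inputs stated in the text
n_total     <- 10000
prevalence  <- 0.02   # P(Covid)
sensitivity <- 1.00   # P(Pos | Covid)
specificity <- 0.99   # P(Neg | No Covid)

# Column totals: how many of the 10,000 have / do not have COVID
n_covid    <- n_total * prevalence
n_no_covid <- n_total - n_covid

# Cell counts implied by sensitivity and specificity
pos_covid    <- n_covid * sensitivity        # true positives
neg_covid    <- n_covid - pos_covid          # false negatives
neg_no_covid <- n_no_covid * specificity     # true negatives
pos_no_covid <- n_no_covid - neg_no_covid    # false positives

# Assemble the 2 x 2 table and append row/column totals (labeled "Sum")
hypothetical <- matrix(
  c(pos_covid, pos_no_covid,
    neg_covid, neg_no_covid),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Pos", "Neg"), c("Covid", "No Covid"))
)
print(addmargins(hypothetical))

# P(Covid | Pos): positives who have COVID divided by all positives
p_covid_given_pos <- pos_covid / (pos_covid + pos_no_covid)
print(p_covid_given_pos)
```

Reading the probability off the "Pos" row (true positives over all positives) is the whole point of the Hypothetical 10,000: conditioning on a positive result means restricting attention to that row.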
Exercise 2: Use Bayes’ Theorem to calculate P(Covid | Pos).
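As a starting point, Bayes' Theorem for this setting can be written with the law of total probability in the denominator. The complements used here follow from the values given above: P(Pos | No Covid) = 1 − 0.99 = 0.01 and P(No Covid) = 1 − 0.02 = 0.98.

$$
P(\text{Covid} \mid \text{Pos})
= \frac{P(\text{Pos} \mid \text{Covid})\,P(\text{Covid})}
       {P(\text{Pos} \mid \text{Covid})\,P(\text{Covid}) + P(\text{Pos} \mid \text{No Covid})\,P(\text{No Covid})}
= \frac{(1)(0.02)}{(1)(0.02) + (0.01)(0.98)}
$$

Evaluating the final fraction should give the same value as the Hypothetical 10,000 table in Exercise 1.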
This example comes from “Confounding and Simpson’s paradox” [1] by Julious and Mullee.
The data set covers 901 individuals with diabetes and includes the following variables:
- `insulin_dep`: whether the patient has insulin-dependent or non-insulin-dependent diabetes
- `less_than_40`: whether the individual is less than 40 years old
- `survival`: whether the individual survived the length of the study

diabetes = read_csv("data/diabetes.csv")
One might be interested in the mortality associated with each type of diabetes.
# code here
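One possible approach to the aggregate comparison is to count survival outcomes within each diabetes type and convert the counts to proportions. The helper below is a hypothetical sketch, not the course solution; it assumes only the column names `insulin_dep` and `survival` listed above and makes no assumption about how their values are coded.

```r
library(dplyr)  # part of the tidyverse loaded at the top of the notes

# Hypothetical helper: proportion of each survival outcome
# within each type of diabetes.
survival_by_type <- function(data) {
  data %>%
    count(insulin_dep, survival) %>%   # one row per (type, outcome) pair
    group_by(insulin_dep) %>%
    mutate(prop = n / sum(n)) %>%      # share of outcomes within each type
    ungroup()
}
```

With the data loaded as above, `survival_by_type(diabetes)` returns one row per combination of diabetes type and survival outcome, with `prop` summing to 1 within each type.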
Is the aggregate reported above misleading, and if so, why?
# code here
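To see whether age changes the picture, the same proportions can be computed separately within each age group. This is again a hypothetical sketch that assumes only the column names `less_than_40`, `insulin_dep`, and `survival` from the variable list; the function name is illustrative.

```r
library(dplyr)  # part of the tidyverse loaded at the top of the notes

# Hypothetical helper: proportion of each survival outcome within
# each combination of age group and diabetes type.
survival_by_type_and_age <- function(data) {
  data %>%
    count(less_than_40, insulin_dep, survival) %>%
    group_by(less_than_40, insulin_dep) %>%
    mutate(prop = n / sum(n)) %>%      # share of outcomes within each stratum
    ungroup()
}
```

Calling `survival_by_type_and_age(diabetes)` and comparing the result with the unstratified proportions shows whether the direction of the association between diabetes type and survival holds within each age group, which is the comparison Simpson's paradox turns on.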
[1] Julious, S. A., and M. A. Mullee. “Confounding and Simpson’s paradox.” BMJ (Clinical research ed.) vol. 309, no. 6967 (1994): 1480-1. doi:10.1136/bmj.309.6967.1480