Bulletin

Today

By the end of today you will

library(tidyverse)
library(knitr)

Definitions

Let A and B be events.

  • Marginal probability: The probability an event occurs regardless of values of the other event
    • P(A)
    • Example: What’s the probability a student in STA199 favors dogs?
  • Joint probability: The probability two or more events simultaneously occur
    • Example: What’s the probability a student is a junior and favors dogs?
    • P(A and B)
  • Conditional probability: The probability an event occurs given the other has occurred
    • P(A|B) or P(B|A)
    • Eample: What is the probability a student is a junior given they favor dogs?
  • Independent events: Knowing one event has occurred does not lead to any change in the probability we assign to another event.
    • P(A|B) = P(A) or P(B|A) = P(B)
    • Example: P(Junior | dogs) = P(junior)

Bayes’ Theorem

The global coronavirus pandemic illustrates the need for accurate testing of COVID-19, as its extreme infectivity poses a significant public health threat. Due to the time-sensitive nature of the situation, the FDA enacted emergency authorization of a number of serological tests for COVID-19 in 2020. Full details of these tests may be found on its website here.

We will define the following events:

  • Pos: The event the Alinity test returns positive.
  • Neg: The event the Alinity test returns negative.
  • Covid: The event a person has COVID
  • No Covid: The event a person does not have COVID

The Abbott Alinity test has an estimated sensitivity of 100%, P(Pos | Covid) = 1, and specificity of 99%, P(Neg | No Covid) = 0.99.

Suppose the prevalence of COVID-19 in the general population is about 2%, P(Covid) = 0.02.

Exercise 1: Use the Hypothetical 10,000 to calculate the probability a person has COVID given they get a positive test result, i.e. P(Covid | Pos).

Covid No Covid Total
Pos
Neg
Total 10000

Exercise 2 Use Bayes’ Theorem to calculate P(Covid|Pos).

Simpson’s paradox

This example comes from Confounding and Simpson’s paradox1 by Julious and Mullee.

The data examines 901 individuals with diabetes and includes the following variables

  • insulin_dep: whether or not the patient has insulin dependent or non-insulin dependent diabetes
  • less_than_40: whether or not the individual is less than 40 years old
  • survival: whether or not the individual survived the length of the study
diabetes = read_csv("data/diabetes.csv")

One might be interested in the mortality associated with each type of diabetes.

# code here

Is the aggregate reported above misleading and if so, why?

# code here

  1. Julious, S A, and M A Mullee. “Confounding and Simpson’s paradox.” BMJ (Clinical research ed.) vol. 309,6967 (1994): 1480-1. doi:10.1136/bmj.309.6967.1480↩︎