Bulletin

Today

By the end of today you will

library(tidyverse)

What is the Central Limit Theorem?

The central limit theorem is a statement about the distribution of the sample mean, \(\bar{x}\)

The central limit theorem guarantees that, when certain criteria are satisfied, the sample mean (\(\bar{x}\)) is normally distributed.

Specifically,

If

  1. Observations in the sample are independent. Two rules of thumb to check this:

    • completely random sampling
    • if sampling without replacement, sample should be less than 10% of the population size

and

  1. The sample is large enough. The required size varies in different contexts, but some good rules of thumb are:

    • if the population itself is normal, sample size does not matter.
    • if numerical require, >30 observations
    • if binary outcome, at least 10 successes and 10 failures.

then

\(\bar{x} \sim N(\mu, \sigma / \sqrt{n})\)

i.e. \(\bar{x}\) is normally distributed (unimodal and symmetric with bell shape) with mean \(\mu\) and standard deviation \(\sigma / \sqrt{n}\).

Note the standard deviation depends on the number of samples, \(n\).

Practice using CLT & Normal distribution

Suppose the bone density for 65-year-old women is normally distributed with mean \(809 mg/cm^3\) and standard deviation of \(140 mg/cm^3\).

Let \(x\) be the bone density of 65-year-old women. We can write this distribution of \(x\) in mathematical notation as

\[x \sim N(809, 140)\]

Visualize the population distribution

ggplot(data = data.frame(x = c(809 - 140*3, 809 + 140*3)), aes(x = x)) +
  stat_function(fun = dnorm, args = list(mean = 809, sd = 140),
                color = "black") +
  stat_function(fun = dnorm, args = list(mean = 809, sd = 140/sqrt(10)),
                color = "red",lty = 2) + theme_bw() +
  labs(title = "Black solid line = population dist., Red dotted line = sampling dist.")

Exercise 1

Before typing any code, based on what you know about the normal distribution, what do you expect the median bone density to be?

What bone densities correspond to \(Q_1\) (25th percentile), \(Q_2\) (50th percentile), and \(Q_3\) (the 75th percentile) of this distribution? Use the qnorm() function to calculate these values.

Exercise 2

The densities of three woods are below:

  • Plywood: 540 mg/cubic centimeter

  • Pine: 600 mg/cubic centimeter

  • Mahogany: 710 mg/cubic centimeter

  • What is the probability that a randomly selected 65-year-old woman has bones less dense than Pine?

  • Would you be surprised if a randomly selected 65-year-old woman had bone density less than Mahogany? What if she had bone density less than Plywood? Use the respective probabilities to support your response.

Exercise 3

Suppose you want to analyze the mean bone density for a group of 10 randomly selected 65-year-old women.

  • Are the conditions met use the Central Limit Theorem to define the distribution of \(\bar{x}\), the mean density of 10 randomly selected 65-year-old women?

    • Independence?
    • Sample size/distribution?
  • What is the shape, center, and spread of the distribution of \(\bar{x}\), the mean bone density for a group of 10 randomly selected 65-year-old women?

  • Write the distribution of \(\bar{x}\) using mathematical notation.

Exercise 4

  • What is the probability that the mean bone density for the group of 10 randomly-selected 65-year-old women is less dense than Pine?

  • Would you be surprised if a group of 10 randomly-selected 65-year old women had a mean bone density less than Mahogany? What the group had a mean bone density less than Plywood? Use the respective probabilities to support your response.

Exercise 5

Explain how your answers differ in Exercises 2 and 4.