This week we looked at sampling data and estimates over normal distributions.
A. The first section looked at a record of ice cream purchases during an academic year for each of five housemates (8, 14, 16, 10, 11).
a. First I was to calculate the mean of the population. R code:
pop <- c(8, 14, 16, 10, 11)
mean(pop)
The mean is 11.8.
b. Next I took a random sample of 2 of the values from the population.
subpop <- sample(pop, 2)
subpop
I ended up with (8, 16).
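Because sample() draws at random, rerunning the line above will usually give a different pair. As a minimal sketch of how the draw could be made repeatable, a seed can be set first. The seed value 42 is an arbitrary choice for illustration and is not the one that produced (8, 16).

```r
pop <- c(8, 14, 16, 10, 11)

# Fix the random number generator state so the draw can be repeated.
# The seed value is arbitrary (an assumption, not from the assignment).
set.seed(42)

# Draw 2 values without replacement (the default for sample()).
subpop <- sample(pop, 2)
subpop
```

Running the block again with the same seed should return the same two values in the same R version.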
c. Next, the mean and standard deviation of my random sample.
mean(subpop)
sd(subpop)
mean: 12
standard deviation: 5.657
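To see where those two numbers come from, here is a sketch that works through the arithmetic by hand for the sample (8, 16). The variable names m, ss, and s are introduced here for illustration only.

```r
subpop <- c(8, 16)
n <- length(subpop)

# Mean: (8 + 16) / 2 = 12
m <- sum(subpop) / n

# Sum of squared deviations: (8 - 12)^2 + (16 - 12)^2 = 32
ss <- sum((subpop - m)^2)

# Sample standard deviation divides by n - 1, so sqrt(32 / 1), about 5.657
s <- sqrt(ss / (n - 1))

m
s
```

The hand-computed value agrees with what sd(subpop) returns.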
d. For contrast, the same stats for the original population:
popmean <- mean(pop)
popsd <- sd(pop)
mean: 11.8
standard deviation: 3.194
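One caveat about the 3.194 figure: R's sd() always uses the sample formula, which divides by n - 1. If the five housemates are treated as the entire population, the population formula divides by n instead and gives a smaller value. A minimal sketch of the difference, where pop_sd_n is a name made up for this example:

```r
pop <- c(8, 14, 16, 10, 11)
n <- length(pop)

# sd() divides the sum of squared deviations by n - 1.
sd(pop)        # about 3.194

# Population version: divide by n instead (pop_sd_n is a hypothetical name).
pop_sd_n <- sqrt(sum((pop - mean(pop))^2) / n)
pop_sd_n       # about 2.857
```

Which figure is wanted depends on how the assignment defines the population standard deviation.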
The mean of my sample (12) was very close to the population mean (11.8), but the sample's standard deviation (5.657) was much larger than the population's (3.194).
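With only five values, it is possible to list every sample of size 2 and see how much the statistics can vary. The sketch below uses combn() to do that, assuming sampling without replacement (the default for sample()). The names sample_means and sample_sds are introduced here for illustration.

```r
pop <- c(8, 14, 16, 10, 11)

# Apply mean() and sd() to each of the choose(5, 2) = 10 possible pairs.
sample_means <- combn(pop, 2, FUN = mean)
sample_sds <- combn(pop, 2, FUN = sd)

length(sample_means)   # 10 possible samples
mean(sample_means)     # 11.8, the same as the population mean
range(sample_means)    # 9 to 15
range(sample_sds)      # about 0.707 to about 5.657
```

The pair (8, 16) holds the smallest and largest values in the population, so it gives the largest standard deviation any sample of size 2 could produce here, which helps explain the gap.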
B. The second section concerned a population with a size of 100 and a proportion of 0.95.
a. To determine whether the sample proportion can be treated as approximately normally distributed, you multiply the sample size n by both p (the proportion) and q (1 - p). The normal approximation is considered reasonable if both n*p and n*q are greater than 5.
To get the values, I used this code:
p <- 0.95
q <- 1 - p
n <- 100
n * p
n * q
n*p was 95 and n*q was 5.
This sample, then, comes close to meeting the condition, but not quite. Both n*p and n*q have to be greater than 5, and with n*q equal to exactly 5 we don't reach that threshold.
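The check can be wrapped in a small function. This is only a sketch: normal_approx_ok is a hypothetical helper written for this post, and the threshold of 5 is the rule as stated above (some textbooks use 10). The rounding step matters because 1 - 0.95 is stored as a value slightly above 0.05, so a direct comparison would wrongly report 100 * q as greater than 5.

```r
# Hypothetical helper: TRUE when both n*p and n*q exceed the threshold.
normal_approx_ok <- function(n, p, threshold = 5) {
  q <- 1 - p
  # Round away floating-point noise before comparing with the threshold.
  np <- round(n * p, 8)
  nq <- round(n * q, 8)
  np > threshold && nq > threshold
}

normal_approx_ok(100, 0.95)   # FALSE, because n*q is exactly 5
normal_approx_ok(101, 0.95)   # TRUE, because n*q is 5.05
```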
b. The smallest sample size that meets the condition with p = 0.95 is 101, the first whole number for which n*q is greater than 5 (specifically, 5.05).
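The same answer can be found by testing a range of sample sizes. In this sketch the upper limit of 200 is an arbitrary search bound, and the rounding again guards against floating-point noise in 1 - 0.95.

```r
p <- 0.95
q <- 1 - p

# Candidate sample sizes; 200 is an arbitrary upper bound for the search.
n <- 1:200

# Element-wise check of the rule: both n*p and n*q must be greater than 5.
ok <- round(n * p, 8) > 5 & round(n * q, 8) > 5

smallest_n <- min(n[ok])
smallest_n          # 101
smallest_n * q      # about 5.05
```

Algebraically, n*q > 5 means n > 5 / 0.05 = 100, so the first whole number that qualifies is 101.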