Module #6 Sampling & Confidence Interval Estimation

This week we looked at sampling from a population and at when a sample proportion can be treated as approximately normally distributed.

A. The first section looked at a record of ice cream purchases during an academic year for each of five housemates (8, 14, 16, 10, 11).

a. First I was to calculate the mean of the population. R code:

pop <- c(8, 14, 16, 10, 11)
mean(pop)

The mean is 11.8.

b. Next I took a random sample of 2 of the values from the population.

subpop <- sample(pop, 2)  # store the draw so it can be reused below
subpop

I ended up with (8, 16).
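Because sample() draws at random, rerunning that line will usually give a different pair. As a small sketch (the seed value 42 is an arbitrary choice of mine, not part of the assignment), fixing the seed with set.seed() makes the draw repeatable:

```r
pop <- c(8, 14, 16, 10, 11)

# Fix the random number generator state so the draw can be repeated.
# The seed value itself is arbitrary.
set.seed(42)
subpop <- sample(pop, 2)   # 2 values, drawn without replacement by default

# Resetting the same seed reproduces the same draw.
set.seed(42)
again <- sample(pop, 2)
identical(subpop, again)   # TRUE
```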

c. Next, the mean and standard deviation of my random sample.

mean(subpop)
sd(subpop)

mean: 12
standard deviation: 5.657

d. For contrast, the same stats for the original population:

popmean <- mean(pop)
popsd <- sd(pop)
popmean  # print the stored values
popsd

mean: 11.8
standard deviation: 3.194
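
One detail worth keeping in mind here: R's sd() divides by n - 1 (the sample standard deviation), so 3.194 treats the five housemates as a sample. If they are taken to be the whole population, the version that divides by n is somewhat smaller. A minimal sketch (pop_sd is a helper name I made up for illustration):

```r
pop <- c(8, 14, 16, 10, 11)

# Hypothetical helper: standard deviation with an n denominator.
pop_sd <- function(x) sqrt(mean((x - mean(x))^2))

sd(pop)       # n - 1 denominator: sqrt(40.8 / 4), about 3.194
pop_sd(pop)   # n denominator:     sqrt(40.8 / 5), about 2.857
```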

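The mean of my sample (12) was close to the population mean (11.8), but the standard deviation was very different (5.657 vs. 3.194). That is not too surprising for a sample of only two values, especially since mine happened to contain both the smallest and the largest value in the population.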
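
To see how much of that difference is just luck of the draw, one option is to list every possible sample of 2 from the five values (there are 10) and compute the same stats for each. This is only a sketch using base R's combn(); the variable names are my own.

```r
pop <- c(8, 14, 16, 10, 11)

# Each column of `pairs` is one of the choose(5, 2) = 10 possible samples.
pairs <- combn(pop, 2)

sample_means <- apply(pairs, 2, mean)
sample_sds   <- apply(pairs, 2, sd)

mean(sample_means)   # averages out to the population mean, 11.8
range(sample_sds)    # smallest and largest possible sample standard deviation
```

For a sample of two values the standard deviation is just the absolute difference divided by sqrt(2), so the pair (8, 16) gives the largest value that range can show.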

B. The second section concerned a population with a size of 100 and a proportion of 0.95.

a. To check whether the sampling distribution of the sample proportion can be treated as approximately normal, you multiply the sample size n by both p (the proportion) and q (1 - p). The normal approximation is considered reasonable if both n*p and n*q are greater than 5.

To get the values, I used this code:

p <- 0.95
q <- 1 - p
n <- 100
n * p
n * q

n*p was 95 and n*q was 5.

This sample, then, falls just short of the rule. Both n*p and n*q have to be greater than 5, and n*q is exactly 5, so the threshold is not met.
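
A caveat if you turn this check into a comparison in R: 1 - 0.95 is not stored as exactly 0.05, so 100 * q comes out a hair above 5 and a plain n * q > 5 can report TRUE. The sketch below adds a small tolerance; normal_approx_ok and tol are names I chose for illustration, not anything from the course.

```r
# Hypothetical helper: TRUE only when both n*p and n*q clearly exceed the threshold.
normal_approx_ok <- function(n, p, threshold = 5, tol = 1e-9) {
  q <- 1 - p
  # tol absorbs floating-point noise in q so a borderline value is not counted as a pass
  (n * p > threshold + tol) && (n * q > threshold + tol)
}

normal_approx_ok(100, 0.95)   # FALSE: n*q is 5, not greater than 5
normal_approx_ok(101, 0.95)   # TRUE:  n*q is 5.05
```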

b. The smallest sample size that satisfies the rule for p = 0.95 is 101, the first size for which n*q is greater than 5 (specifically, 5.05).
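
As a cross-check on 101, here is a brute-force sketch that tests every sample size from 1 to 200 (an arbitrary upper limit I picked) and keeps the smallest one that passes. The small tolerance is my own addition, there to keep floating-point rounding in q from letting n = 100 slip through.

```r
p <- 0.95
q <- 1 - p
tol <- 1e-9          # guards against floating-point noise in q

n <- 1:200           # candidate sample sizes; the upper limit is arbitrary
ok <- (n * p > 5 + tol) & (n * q > 5 + tol)

min(n[ok])           # 101
(n * q)[n == 101]    # 5.05
```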
