Module #11 Chi Squared Test

This week we were asked to evaluate hotel guest satisfaction data using the Chi Squared test. The survey recorded whether each guest would return to the hotel, and the responses are separated by the guest’s recreational preference (beachcomber or windsurfer). We were to summarize our findings and determine whether the test has 4 or more degrees of freedom.

I ran the following code to add the data to R and run the Chi Squared test on it.

library(ggplot2)
library(tidyr)
# Build a data frame containing all responses
again <- c(rep("yes", 163), rep("no", 64), rep("yes", 154), rep("no", 108))
guest_type <- c(rep("beachcomber", 227), rep("windsurfer", 262))
survey <- data.frame(again, guest_type, stringsAsFactors = TRUE)
# Run the Chi Squared Test on the data
c2test <- chisq.test(survey$again, survey$guest_type)
c2test

That gave me the result:

	Pearson's Chi-squared test with Yates' continuity correction

data:  survey$again and survey$guest_type
X-squared = 8.4903, df = 1, p-value = 0.00357

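The test has 1 degree of freedom, so there are not 4 or more: a table with 2 rows and 2 columns gives (2 - 1) x (2 - 1) = 1. The p-value (0.00357) is less than 0.05, so we reject the null hypothesis that the two variables are independent. Whether a guest would return is associated with recreational preference: 163 of 227 beachcombers (about 72%) said yes, compared with 154 of 262 windsurfers (about 59%). The test establishes an association, not that the preference itself caused the answers.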
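As a check on the reported values, the statistic can be rebuilt by hand from the 2 x 2 table of counts. This is only a sketch: it assumes the same counts as the survey data above, and the names counts, expected, df, x2 and p are illustrative rather than part of the original analysis. If the arithmetic is right, it should agree with the df, X-squared and p-value that chisq.test() printed.

```r
# Observed counts (rows: answer, columns: guest type); filled column by column
counts <- matrix(c(64, 163, 108, 154), nrow = 2,
                 dimnames = list(again = c("no", "yes"),
                                 guest_type = c("beachcomber", "windsurfer")))

# Degrees of freedom for an r x c table: (r - 1) * (c - 1)
df <- (nrow(counts) - 1) * (ncol(counts) - 1)

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Pearson statistic with Yates' continuity correction
# (chisq.test() applies this correction by default to 2 x 2 tables)
x2 <- sum((abs(counts - expected) - 0.5)^2 / expected)

# Upper-tail p-value from the chi-squared distribution
p <- pchisq(x2, df = df, lower.tail = FALSE)

round(c(df = df, x2 = x2, p = p), 4)
```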

To illustrate this finding, I created bar graphs contrasting the observed results with the results that would be expected if the null hypothesis were true. I retrieved both sets of counts from the object returned by chisq.test() (c2test$observed and c2test$expected).

# For the chart, build a new dataframe that includes the observed and expected totals
c2temp <- data.frame(c2test$observed, stringsAsFactors = TRUE)
c2temp$expected <- c(c2test$expected)
colnames(c2temp) <- c("again", "guest_type", "observed", "expected")
# Use gather() to move results into a single column and create a "data_type" column to use for faceting
c2compare <- gather(c2temp, observed, expected, key="data_type", value="responses", factor_key=TRUE)
# Build a faceted bar chart putting the observed and expected data next to each other
ggplot(c2compare, aes(guest_type, responses, fill=again)) +
  facet_wrap(~data_type) +
  geom_bar(stat="identity") +
  xlab("Guest Type") +
  ylab("Responses") +
  labs(fill="Would return", title="Hotel Guest Satisfaction Chi Squared")

The graph is below.

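The graph shows that the observed survey results (left panel) differ from the counts expected under independence (right panel). More beachcombers said they would return than expected (163 observed versus about 147 expected), and fewer windsurfers did (154 observed versus about 170 expected).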
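To put numbers on the difference the chart shows, the observed and expected counts can be listed side by side along with the Pearson residuals, (observed - expected) / sqrt(expected). This is a sketch rather than part of the original analysis: it rebuilds the test from a 2 x 2 matrix of the same counts so that it runs on its own, and counts and comparison are illustrative names.

```r
# Same counts as the survey data (rows: answer, columns: guest type)
counts <- matrix(c(64, 163, 108, 154), nrow = 2,
                 dimnames = list(again = c("no", "yes"),
                                 guest_type = c("beachcomber", "windsurfer")))
c2test <- chisq.test(counts)

# One row per cell of the table, with the observed count in its own column
comparison <- as.data.frame(as.table(c2test$observed), responseName = "observed")

# c() flattens each matrix column by column, matching the row order above
comparison$expected <- round(c(c2test$expected), 1)
comparison$residual <- round(c(c2test$residuals), 2)

comparison
```

A positive residual marks a cell with more responses than independence would predict, and a negative residual marks one with fewer.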
