Module #10 Introduction to ANOVA

This week’s assignment was to apply ANOVA to a dataset of patient pain ratings at different levels of stress. The patients rated their pain while on the test drug on a scale from 1 to 10 (with 10 being the most pain), and each patient's stress state was recorded as high, moderate, or low.

The null hypothesis is that there is no difference between the means of the pain ratings at different stress levels.

To test this, I copied the table of data from the assignment website. Then I cleaned the dataset up a bit using the gather() function from the tidyr package to convert the data to two columns: one with the stress level and one with the pain rating. Then I used aov() to run the ANOVA and TukeyHSD() to compare the individual pairs of stress groups.
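To make the reshaping step concrete, here is a minimal sketch of the same wide-to-long conversion on a tiny made-up table. The numbers are hypothetical and are not the assignment data, and it uses base R's stack() as a stand-in for gather() so that it runs without any extra packages.

```r
# Hypothetical wide table: one column per stress level (values are made up).
wide <- data.frame(
  high_stress     = c(9, 8),
  moderate_stress = c(7, 8),
  low_stress      = c(3, 4)
)

# stack() returns two columns: `values` (the measurements) and
# `ind` (a factor holding the name of the column each value came from).
long <- stack(wide)
names(long) <- c("pain", "stress")

# Reorder so the layout matches the key/value order used with gather().
long <- long[, c("stress", "pain")]
print(long)
```

Each row of the result is one pain rating labelled with its stress group, which is the shape aov() expects.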

My code follows.

library("tidyr")
# Read the table and then convert it to two columns detailing stress and pain
migraine <- read.table("G:/week10data.txt", header=TRUE)
migraine_clean <- gather(migraine, stress, pain, factor_key=TRUE)

# Get ANOVA information for the data
migraine_aov <- aov(migraine_clean$pain ~ migraine_clean$stress)

# Print summary information from ANOVA
summary(migraine_aov)
# Print pairwise comparisons between groups to determine which means differ
TukeyHSD(migraine_aov)

The ANOVA summary output was:

                      Df Sum Sq Mean Sq F value   Pr(>F)
migraine_clean$stress  2  82.11   41.06   21.36 4.08e-05 ***
Residuals             15  28.83    1.92
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The p-value (4.08e-05) is far below 0.05 and the F value is large, so the null hypothesis can be rejected. Mean pain differs between at least two of the stress groups.
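The F value in the summary table is the ratio of the two mean squares, so it can be checked by hand. The sketch below recomputes it, and the matching p-value, from the rounded sums of squares printed in the table. The variable names are my own, and because the inputs are rounded the results only agree with the table to about the printed precision.

```r
# Sums of squares and degrees of freedom copied from the ANOVA table.
ss_stress <- 82.11
df_stress <- 2
ss_resid  <- 28.83
df_resid  <- 15

# Mean square = sum of squares / degrees of freedom.
ms_stress <- ss_stress / df_stress
ms_resid  <- ss_resid / df_resid

# F is the between-group mean square over the residual mean square.
f_value <- ms_stress / ms_resid

# Upper-tail probability of the F distribution gives Pr(>F).
p_value <- pf(f_value, df_stress, df_resid, lower.tail = FALSE)

c(F = f_value, p = p_value)
```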

(To confirm that F was large enough to reject the null hypothesis, I ran qf() with 2 and 15 degrees of freedom. At the 95% level the critical F value is 3.682, so the observed F of 21.36 does exceed it.)
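For reference, the critical-value check can be written roughly as follows. The variable names are only illustrative, and the observed F is typed in from the summary table rather than extracted from the aov object.

```r
# Observed F statistic, copied from the ANOVA summary table.
f_observed <- 21.36

# Critical F value at the 95% level for 2 and 15 degrees of freedom.
f_critical <- qf(0.95, df1 = 2, df2 = 15)

f_critical               # about 3.682
f_observed > f_critical  # TRUE means the null hypothesis is rejected
```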

To find which groups differed, I used TukeyHSD(). The output of the TukeyHSD() function was:

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = migraine_clean$pain ~ migraine_clean$stress)

$`migraine_clean$stress`
                                 diff       lwr        upr     p adj
moderate_stress-high_stress -1.166667 -3.245845  0.9125117 0.3382642
low_stress-high_stress      -5.000000 -7.079178 -2.9208216 0.0000440
low_stress-moderate_stress  -3.833333 -5.912512 -1.7541550 0.0006586

The difference in mean pain between moderate and high stress was not significant (adjusted p = 0.338), but both comparisons involving low stress were significant (adjusted p < 0.001). It therefore appears that pain ratings are significantly lower at low stress than at moderate or high stress, while there is no significant difference between moderate and high stress.
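As a rough check on where the Tukey intervals come from, the sketch below rebuilds the half-width of the confidence interval from the residual mean square. It assumes a balanced design with 6 patients per group. That figure is inferred from the 17 total degrees of freedom and the identical interval widths, and is not stated in the assignment. The inputs are the rounded values printed in the output, so the result is approximate.

```r
# Values read off the ANOVA table and the Tukey output.
ss_resid <- 28.83   # residual sum of squares
df_resid <- 15      # residual degrees of freedom
k        <- 3       # number of stress groups
n        <- 6       # ASSUMPTION: patients per group (18 total, balanced)

mse <- ss_resid / df_resid

# Half-width = studentized-range quantile / sqrt(2), times the
# standard error of a difference between two group means.
q_crit     <- qtukey(0.95, nmeans = k, df = df_resid)
half_width <- q_crit / sqrt(2) * sqrt(mse * (1 / n + 1 / n))

# Rebuild the interval for moderate_stress - high_stress.
diff_mod_high <- -1.166667
c(lwr = diff_mod_high - half_width, upr = diff_mod_high + half_width)
```

Under these assumptions the interval should land close to the reported lwr and upr for that comparison. It includes zero, which is why the moderate versus high difference is not significant.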
