Module 3 – Data Frames
The assignment was to take a set of data and perform operations on it as a data frame, per an example document.
The variables with the initial data were:
Name <- c("Jeb", "Donald", "Ted", "Marco", "Carly", "Hillary", "Berine")
ABC <- c(4, 62, 51, 21, 2, 14, 15)
CBS <- c(12, 75, 43, 19, 1, 21, 19)
I tried turning that into a matrix…
candidates.m <- cbind(Name, ABC, CBS)
…and got a matrix full of strings.
     Name      ABC  CBS
[1,] "Jeb"     "4"  "12"
[2,] "Donald"  "62" "75"
[3,] "Ted"     "51" "43"
[4,] "Marco"   "21" "19"
[5,] "Carly"   "2"  "1"
[6,] "Hillary" "14" "21"
[7,] "Berine"  "15" "19"
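The strings appear because a matrix can hold only one type, so cbind coerces the numbers to character to match Name. A minimal sketch of that check, using the same vectors as above (abc.numeric is just an illustrative name):

```r
# Same vectors as in the post
Name <- c("Jeb", "Donald", "Ted", "Marco", "Carly", "Hillary", "Berine")
ABC <- c(4, 62, 51, 21, 2, 14, 15)
CBS <- c(12, 75, 43, 19, 1, 21, 19)

candidates.m <- cbind(Name, ABC, CBS)

# A matrix stores a single type, so every cell is now a string
typeof(candidates.m)                # "character"
is.numeric(candidates.m[, "ABC"])   # FALSE

# Getting numbers back out requires an explicit conversion
abc.numeric <- as.numeric(candidates.m[, "ABC"])
mean(abc.numeric)                   # 24.14286
```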
So I made a data frame from the data.
candidates.df <- data.frame(Name, ABC, CBS)
The resulting data frame:
     Name ABC CBS
1     Jeb   4  12
2  Donald  62  75
3     Ted  51  43
4   Marco  21  19
5   Carly   2   1
6 Hillary  14  21
7  Berine  15  19
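Unlike the matrix, the data frame keeps a separate type for each column. A quick way to confirm that is sketched here. Whether Name comes out as character or as a factor depends on the R version's stringsAsFactors default, so the code makes no assumption about it.

```r
Name <- c("Jeb", "Donald", "Ted", "Marco", "Carly", "Hillary", "Berine")
ABC <- c(4, 62, 51, 21, 2, 14, 15)
CBS <- c(12, 75, 43, 19, 1, 21, 19)

candidates.df <- data.frame(Name, ABC, CBS)

# Class of each column: ABC and CBS stay numeric,
# Name is character or factor depending on the R version
sapply(candidates.df, class)

# Compact overview of the structure
str(candidates.df)
```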
I tried to run the mean command on the data frame, but it didn’t work out in RStudio the way it did in the example. The example showed an error for the text column and then means for the numeric columns, but all I got was NA and a warning.
> mean(candidates.df)
[1] NA
Warning message:
In mean.default(candidates.df) :
  argument is not numeric or logical: returning NA
I got the same warning when specifying the two numeric columns, since that selection is still a data frame.
mean(candidates.df[,2:3])
But I could get the mean of a specific column.
> mean(candidates.df[,2])
[1] 24.14286
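The difference comes down to what the subset returns: selecting several columns gives back a data frame, while selecting a single column with [, 2] drops down to a plain numeric vector, which mean can handle. A small sketch of that, assuming the same data as above:

```r
Name <- c("Jeb", "Donald", "Ted", "Marco", "Carly", "Hillary", "Berine")
ABC <- c(4, 62, 51, 21, 2, 14, 15)
CBS <- c(12, 75, 43, 19, 1, 21, 19)
candidates.df <- data.frame(Name, ABC, CBS)

# Two columns: still a data frame, so mean() cannot average it
class(candidates.df[, 2:3])   # "data.frame"

# One column: simplified to a numeric vector
class(candidates.df[, 2])     # "numeric"

# Equivalent ways to pull out a single column by name
mean(candidates.df$ABC)       # 24.14286
mean(candidates.df[["ABC"]])  # 24.14286
```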
To get the means of the columns I had to use colMeans on the numeric columns specifically; using colMeans on the full data frame resulted in an error that “‘x’ must be numeric”.
colMeans(candidates.df[2:3])
Result:
     ABC      CBS
24.14286 27.14286
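Hard-coding columns 2:3 works here, but the numeric columns can also be picked out programmatically. This is one possible sketch, not the method from the example document. The names numeric.cols and col.means are only illustrative.

```r
Name <- c("Jeb", "Donald", "Ted", "Marco", "Carly", "Hillary", "Berine")
ABC <- c(4, 62, 51, 21, 2, 14, 15)
CBS <- c(12, 75, 43, 19, 1, 21, 19)
candidates.df <- data.frame(Name, ABC, CBS)

# Logical vector marking which columns are numeric
numeric.cols <- sapply(candidates.df, is.numeric)

# Column means over just those columns
col.means <- colMeans(candidates.df[numeric.cols])
col.means

# Same result by applying mean() to each numeric column
sapply(candidates.df[numeric.cols], mean)
```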
I could also use rowMeans to get means by row.
> rowMeans(candidates.df[2:3])
[1]  8.0 68.5 47.0 20.0  1.5 17.5 17.0
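Because the row means line up with the rows of the data frame, they can be stored next to each candidate as a new column. A short sketch, where the column name Mean is an arbitrary choice:

```r
Name <- c("Jeb", "Donald", "Ted", "Marco", "Carly", "Hillary", "Berine")
ABC <- c(4, 62, 51, 21, 2, 14, 15)
CBS <- c(12, 75, 43, 19, 1, 21, 19)
candidates.df <- data.frame(Name, ABC, CBS)

# Average of the two polls for each candidate, added as a column
candidates.df$Mean <- rowMeans(candidates.df[c("ABC", "CBS")])

candidates.df
```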
Using as.matrix(candidates.df) gave a matrix of strings like the one I listed at the top of this post.
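The conversion to strings only happens because the Name column is included. A sketch of one workaround is to convert just the numeric columns and keep the names as row labels. polls.m is an illustrative name, not something from the example document.

```r
Name <- c("Jeb", "Donald", "Ted", "Marco", "Carly", "Hillary", "Berine")
ABC <- c(4, 62, 51, 21, 2, 14, 15)
CBS <- c(12, 75, 43, 19, 1, 21, 19)
candidates.df <- data.frame(Name, ABC, CBS)

# Only numeric columns go into the matrix, so it stays numeric
polls.m <- as.matrix(candidates.df[c("ABC", "CBS")])

# Keep the candidate names as row names instead of as a column
rownames(polls.m) <- as.character(candidates.df$Name)

is.numeric(polls.m)    # TRUE
polls.m["Ted", "CBS"]  # 43
colMeans(polls.m)
```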
Data frames are interesting as a means of storing mixed data in a matrix-like structure. They have their caveats when the columns have different modes, but those can be worked around with the right code.