Module 3 – Data Frames

The assignment was to take a set of data and perform operations on it as a data frame, per an example document.

The variables with the initial data were:

Name <- c("Jeb", "Donald", "Ted", "Marco", "Carly", "Hillary", "Berine")
ABC <- c(4, 62, 51, 21, 2, 14, 15)
CBS <- c(12, 75, 43, 19, 1, 21, 19)

I tried turning that into a matrix…

candidates.m <- cbind(Name, ABC, CBS)

…and got a matrix full of strings, because a matrix can hold only one type and cbind coerced the numbers to character.

     Name      ABC  CBS
[1,] "Jeb"     "4"  "12"
[2,] "Donald"  "62" "75"
[3,] "Ted"     "51" "43"
[4,] "Marco"   "21" "19"
[5,] "Carly"   "2"  "1"
[6,] "Hillary" "14" "21"
[7,] "Berine"  "15" "19"
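
As a quick check of that coercion, here is a minimal sketch that inspects the matrix and converts one column back to numbers. The name abc_numeric is only illustrative and was not part of the assignment.

```r
Name <- c("Jeb", "Donald", "Ted", "Marco", "Carly", "Hillary", "Berine")
ABC <- c(4, 62, 51, 21, 2, 14, 15)
CBS <- c(12, 75, 43, 19, 1, 21, 19)

candidates.m <- cbind(Name, ABC, CBS)

# A matrix stores a single type, so the numeric vectors became character.
typeof(candidates.m)   # "character"

# Converting one column back to numeric makes arithmetic possible again.
abc_numeric <- as.numeric(candidates.m[, "ABC"])   # illustrative name
mean(abc_numeric)
```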

So I made a data frame from the data.

candidates.df <- data.frame(Name, ABC, CBS)

The resulting data frame:

     Name ABC CBS
1     Jeb   4  12
2  Donald  62  75
3     Ted  51  43
4   Marco  21  19
5   Carly   2   1
6 Hillary  14  21
7  Berine  15  19
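
Unlike the matrix, the data frame keeps each column's own type. The sketch below checks that. I pass stringsAsFactors = FALSE explicitly, which is an assumption on my part: before R 4.0.0 the default was TRUE, and Name would have been stored as a factor instead of character.

```r
Name <- c("Jeb", "Donald", "Ted", "Marco", "Carly", "Hillary", "Berine")
ABC <- c(4, 62, 51, 21, 2, 14, 15)
CBS <- c(12, 75, 43, 19, 1, 21, 19)

# stringsAsFactors = FALSE keeps Name as character in every R version.
candidates.df <- data.frame(Name, ABC, CBS, stringsAsFactors = FALSE)

# Class of each column: Name is "character", ABC and CBS are "numeric".
column_classes <- sapply(candidates.df, class)
column_classes
```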

I tried to run mean() on the data frame, but it didn’t behave in RStudio the way it did in the example, probably because the example was made with an older version of R (mean() for data frames was removed in R 3.0.0). The example reported a problem with the text column and then gave means for the numeric columns; all I got was NA and a warning.

> mean(candidates.df)
[1] NA
Warning message:
In mean.default(candidates.df) :
  argument is not numeric or logical: returning NA

I got the same warning when passing just the two numeric columns.

mean(candidates.df[,2:3])

But I could get the mean of a specific column.

> mean(candidates.df[,2])
[1] 24.14286
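
The difference seems to come down to what the subset returns: two columns are still a data frame, while a single column selected with [,2] is a plain numeric vector. Here is a rough sketch of that, plus one way to apply mean to each column in turn with sapply (the variable name means is just for illustration).

```r
Name <- c("Jeb", "Donald", "Ted", "Marco", "Carly", "Hillary", "Berine")
ABC <- c(4, 62, 51, 21, 2, 14, 15)
CBS <- c(12, 75, 43, 19, 1, 21, 19)
candidates.df <- data.frame(Name, ABC, CBS)

# Two columns: still a data frame, which mean() does not know how to handle.
is.data.frame(candidates.df[, 2:3])   # TRUE

# One column: a plain numeric vector, which mean() accepts.
is.numeric(candidates.df[, 2])        # TRUE

# sapply calls mean() on each column separately and returns a named vector.
means <- sapply(candidates.df[2:3], mean)
means
```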

To get the means of the columns I had to use colMeans on the numeric columns specifically – using colMeans on the full data frame resulted in the error “‘x’ must be numeric”.

colMeans(candidates.df[2:3])

Result:

     ABC      CBS
24.14286 27.14286
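
Hard-coding columns 2:3 works here, but the numeric columns can also be picked out by type. This is a sketch of that idea rather than anything from the assignment; numeric_cols and col_means are names I made up.

```r
Name <- c("Jeb", "Donald", "Ted", "Marco", "Carly", "Hillary", "Berine")
ABC <- c(4, 62, 51, 21, 2, 14, 15)
CBS <- c(12, 75, 43, 19, 1, 21, 19)
candidates.df <- data.frame(Name, ABC, CBS)

# Logical vector marking which columns are numeric (Name is not).
numeric_cols <- sapply(candidates.df, is.numeric)

# Keep only those columns, then take the column means.
col_means <- colMeans(candidates.df[numeric_cols])
col_means
```

This keeps working if more text or numeric columns are added later, since nothing depends on column positions.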

I could also use rowMeans to get means by row.

> rowMeans(candidates.df[2:3])
[1]  8.0 68.5 47.0 20.0  1.5 17.5 17.0
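
The row means are easier to read with the candidate names attached. A small sketch, assuming the same data frame as above (row_means is an illustrative name):

```r
Name <- c("Jeb", "Donald", "Ted", "Marco", "Carly", "Hillary", "Berine")
ABC <- c(4, 62, 51, 21, 2, 14, 15)
CBS <- c(12, 75, 43, 19, 1, 21, 19)
candidates.df <- data.frame(Name, ABC, CBS)

# Average the two polls per candidate and label each value with the name.
row_means <- setNames(rowMeans(candidates.df[2:3]), candidates.df$Name)
row_means
```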

Using as.matrix(candidates.df) gave essentially the same character matrix as the one at the top of this post, since the mixed columns again have to be coerced to a single type.
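
If a numeric matrix is what's wanted, one option is to convert only the numeric columns and carry the names along as row names. This is a sketch under that assumption; scores.m is a name I chose for illustration.

```r
Name <- c("Jeb", "Donald", "Ted", "Marco", "Carly", "Hillary", "Berine")
ABC <- c(4, 62, 51, 21, 2, 14, 15)
CBS <- c(12, 75, 43, 19, 1, 21, 19)
candidates.df <- data.frame(Name, ABC, CBS)

# Only numeric columns go into the matrix, so nothing is coerced to text.
scores.m <- as.matrix(candidates.df[2:3])
rownames(scores.m) <- candidates.df$Name

typeof(scores.m)     # "double"
colMeans(scores.m)   # works directly on the numeric matrix
```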

Data frames are interesting as a means of storing mixed data in a matrix-like structure. They have their caveats when the columns have different modes, but those can be worked around with the right code.
