I'm new to R, and I'm experimenting with strsplit on a column of a data frame. My data frame is built as follows:

Student <- c("John Davis","Angela Williams","Bullwinkle Moose","David   Jones",
"Janice Markhammer","Cheryl Cushing","Reuven Ytzrhak","Greg Knox","Joel England",
"Mary Rayburn")
Math <- c(502,600,412,358,495,512,410,625,573,522)
Science <- c(95,99,80,82,75,85,80,95,89,86)
English <- c(25,22,18,15,20,28,15,30,27,18)
student.exam.data <- data.frame(Student,Math,Science,English)

I then try to split the Student column with the following:

student.exam.data$Student <- strsplit(student.exam.data$Student, " ", fixed = TRUE)

which produces the following error:

Error in strsplit(student.exam.data$Student, " ", fixed = TRUE) : non-character argument

The only way I've found to split my Student column is to first replace the space with a period, using `student.exam.data$Student <- sub("\\s", ".", student.exam.data$Student)`, followed by `student.exam.data$Student <- strsplit(student.exam.data$Student, ".", fixed = TRUE)`.

Why does this work this way, and how can I use strsplit on whitespace?

Yehuda
  • You have a factor, and you need to pass a character vector to `strsplit`. Wrap the `x` argument in `as.character`. – Rich Scriven Jan 19 '17 at 19:16
  • (Or set `stringsAsFactors = F` when you create the `data.frame`.) – Gregor Thomas Jan 19 '17 at 19:20
  • Is there any way to split this outside of the `strsplit` function? I ran `as.character(student.exam.data$Student)`, but `str(student.exam.data$Student)` still returns `Factor w/ 10 levels "Angela Williams",..: 8 1 2 4 6 3 10 5 7 9`, and `strsplit(student.exam.data$Student, " ", fixed = TRUE)` then returns the `Error in strsplit(student.exam.data$Student, " ", fixed = TRUE) : non-character argument` error. It only seems to work as `strsplit(as.character(student.exam.data$Student), " ", fixed = TRUE)`. Do you know why this would be? – Yehuda Jan 19 '17 at 19:30

1 Answer

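The error occurs because data.frame converts your character vector into a factor (the default behaviour in R versions before 4.0.0), and strsplit only accepts character vectors: as its documentation says, other inputs, including a factor, give an error.
You can either convert the column to character when you split it: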

student.exam.data$Student <- strsplit(as.character(student.exam.data$Student), " ", fixed = TRUE)

Or create the data frame with stringsAsFactors = FALSE, so the column stays character from the start:

student.exam.data <- data.frame(Student,Math,Science,English, stringsAsFactors = FALSE)
student.exam.data$Student <- strsplit(student.exam.data$Student, " ", fixed = TRUE)
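As for splitting on whitespace in general, here is a minimal sketch rather than a definitive recipe. It uses a small toy data frame (`df` and `parts` are names made up for this illustration, not part of the question's code) and assumes a name may contain a run of several spaces, as "David   Jones" appears to in the question's data. It also shows that `as.character` returns a new vector, so the column only changes if the result is assigned back.

```r
# Toy data; `df` is an illustrative name, not the question's data frame.
# stringsAsFactors = TRUE reproduces the default of R versions before 4.0.0.
df <- data.frame(Student = c("John Davis", "David   Jones"),
                 stringsAsFactors = TRUE)
class(df$Student)   # "factor"

# as.character() does not modify the column in place;
# the result has to be assigned back for the change to stick.
df$Student <- as.character(df$Student)

# With fixed = TRUE, " " splits on every single space, so a run of
# spaces leaves empty strings in the result.
strsplit(df$Student, " ", fixed = TRUE)[[2]]   # "David" "" "" "Jones"

# The regular expression "\\s+" matches one or more whitespace
# characters, so a run of spaces is treated as one separator.
parts <- strsplit(df$Student, "\\s+")
parts[[2]]                                     # "David" "Jones"
```

Note that `strsplit` returns a list, so assigning the result back to a data frame column produces a list column, one character vector per row.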
FlorianGD
  • I'd forgotten about that coercion and the `stringsAsFactors` argument. Thanks--that'd be the more elegant solution. – Yehuda Jan 19 '17 at 20:32