
I have a data frame with multiple character variables of different lengths, and I would like to convert each variable to a list in which each element holds the words of one string, split on spaces.

Say my data looks like this:

char <- c("This is a string of text", "So is this")
char2 <- c("Text is pretty sweet", "Bet you wish you had text like this")

df <- data.frame(char, char2)

# Convert factors to character
df <- lapply(df, as.character)

> df
$char
[1] "This is a string of text" "So is this"              

$char2
[1] "Text is pretty sweet"                "Bet you wish you had text like this"

Now I can use strsplit() to split each column individually by word:

df <- transform(df, "char" = strsplit(df[, "char"], " "))
> df$char
[[1]]
[1] "This"   "is"     "a"      "string" "of"     "text"  

[[2]]
[1] "So"   "is"   "this"

What I would like to do is create a loop or function which would allow me to do this for both columns at once, something like:

for (i in colnames(df)) {
    df <- transform(df, i = strsplit(df[, i], " "))
}

This, however, produces the error:

Error in data.frame(list(char = c("This is a string of text", "So is this",  : 
  arguments imply differing number of rows: 6, 8 

I have also tried:

splitter <- function(colname) {
    df <- transform(df, colname = strsplit(df[, colname], " "))
}

splitter(colnames(df))

Which tells me:

Error in strsplit(df[, colname], " ") : non-character argument

I am confused as to why the call to transform works for an individual column but does not when applied within a loop or function. Any help would be much appreciated!

    It's not clear what you are trying to do here. To keep your strings as strings, simply do `df <- data.frame(char, char2, stringsAsFactors = FALSE)`. Moreover, do you realize that `lapply(df, as.character)` returns a list rather than a data frame? `transform` works on data frames, not on lists. Finally, what is your desired result? Do you want a `data.frame` or a `list`? This question is very confusing. – David Arenburg Apr 23 '15 at 19:43

1 Answer


I got the desired output without transform(), starting from the same setup:

char <- c("This is a string of text", "So is this")
char2 <- c("Text is pretty sweet", "Bet you wish you had text like this")
df <- data.frame(char, char2)
# Convert factors to character
df <- lapply(df, as.character)

Then I ran

lapply(df, strsplit, split= " ")

which gives

$char
$char[[1]]
[1] "This"   "is"     "a"      "string" "of"     "text"  

$char[[2]]
[1] "So"   "is"   "this"


$char2
$char2[[1]]
[1] "Text"   "is"     "pretty" "sweet" 

$char2[[2]]
[1] "Bet"  "you"  "wish" "you"  "had"  "text" "like" "this"

And as Alex mentioned, the first lapply in your code, `df <- lapply(df, as.character)`, can be eliminated by changing `df <- data.frame(char, char2)` to `df <- data.frame(char, char2, stringsAsFactors = FALSE)`.
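
Putting both suggestions together, here is a minimal sketch (not necessarily the only or best way). The names `words`, `df_split`, and `col` are just illustrative and do not come from the question. The second option assumes you would rather keep a data frame with list-columns than end up with a plain list. It also hints at why the loop in the question misbehaves: in `transform(df, i = ...)` the `i` is taken as a literal column name, not as the value of the loop variable.

```r
char  <- c("This is a string of text", "So is this")
char2 <- c("Text is pretty sweet", "Bet you wish you had text like this")

# stringsAsFactors = FALSE keeps the columns as character vectors,
# so no as.character() conversion is needed afterwards
df <- data.frame(char, char2, stringsAsFactors = FALSE)

# Option 1: a named list holding, for each column, a list of word vectors
words <- lapply(df, strsplit, split = " ")

# Option 2: keep the data frame and replace each column in place.
# `[[` looks up the column named by the *value* of `col`, whereas
# transform(df, col = ...) would create a column literally called "col".
df_split <- df
for (col in colnames(df_split)) {
  df_split[[col]] <- strsplit(df_split[[col]], " ")
}
```

After this, `words$char[[2]]` and `df_split$char[[2]]` should both hold the words of the second string in `char`.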

Pierre L