I have a data frame with several character columns whose strings vary in length, and I would like to convert each column to a list in which each element holds the words of the corresponding string, split on spaces.
Say my data looks like this:
char <- c("This is a string of text", "So is this")
char2 <- c("Text is pretty sweet", "Bet you wish you had text like this")
df <- data.frame(char, char2)
# Convert factors to character
df <- lapply(df, as.character)
> df
$char
[1] "This is a string of text" "So is this"
$char2
[1] "Text is pretty sweet" "Bet you wish you had text like this"
Now I can use strsplit() to split each column individually by word:
df <- transform(df, "char" = strsplit(df[, "char"], " "))
> df$char
[[1]]
[1] "This" "is" "a" "string" "of" "text"
[[2]]
[1] "So" "is" "this"
What I would like to do is create a loop or function which would allow me to do this for both columns at once, something like:
for (i in colnames(df)) {
df <- transform(df, i = strsplit(df[, i], " "))
}
This, however, produces the error:
Error in data.frame(list(char = c("This is a string of text", "So is this", :
arguments imply differing number of rows: 6, 8
I have also tried:
splitter <- function(colname) {
df <- transform(df, colname = strsplit(df[, colname], " "))
}
splitter(colnames(df))
Which tells me:
Error in strsplit(df[, colname], " ") : non-character argument
I am confused as to why the call to transform works for an individual column but does not when applied within a loop or function. Any help would be much appreciated!
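For reference, here is a minimal sketch of one way to split every column at once. It assumes `df` is still a data frame with character columns; `stringsAsFactors = FALSE` is used here instead of the factor conversion above, and the names `words` and `col` are only illustrative. The likely cause of the first error is that in `transform(df, i = ...)` the tag `i` is taken literally as a column name rather than as the value of the loop variable, so `transform()` tries to add a new column instead of replacing an existing one. In the function attempt, `colnames(df)` passes both names at once, so `df[, colname]` is a data frame rather than a character vector.

```r
# Rebuild the example data; stringsAsFactors = FALSE keeps the columns as
# character vectors, so no factor conversion is needed afterwards.
df <- data.frame(
  char  = c("This is a string of text", "So is this"),
  char2 = c("Text is pretty sweet", "Bet you wish you had text like this"),
  stringsAsFactors = FALSE
)

# Option 1: keep the result as a separate named list. strsplit() returns one
# character vector per input string, so each column becomes a list of
# word vectors, one per row.
words <- lapply(df, strsplit, split = " ")

# Option 2: overwrite each column in place. df[[col]] uses the value stored
# in `col` as the column name, which `transform(df, i = ...)` does not do.
for (col in names(df)) {
  df[[col]] <- strsplit(df[[col]], " ")
}
```

After this runs, `df$char` and `df$char2` are list columns, and `words$char[[1]]` holds the six words of the first string. Indexing with `[[` avoids `transform()` altogether, which sidesteps the question of how its argument names are evaluated.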