I am trying to process some character strings for an input file. First I split each string into a vector of words, which gives a list, and then I reduce each list element to its unique values.
Next I would like to convert the words in each list element into a single string with a separator of ':1 '.
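To make the target format concrete, here is a minimal sketch for a single element. The vector `words` is made up for illustration and is not part of my data.

```r
# Hypothetical unique words from one list element (illustration only)
words <- c("this", "string", "is", "a")

# Desired result: every word followed by ":1 ", joined into one string
target <- paste0(words, ":1 ", collapse = "")
target
# [1] "this:1 string:1 is:1 a:1 "
```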
I can get the function to work on a single list element, but when I try to use ldply from plyr to do it for the whole list, I only get the last word in each list element.
Here's the code:
library(plyr)
df1 <- data.frame(id = seq(1,5,1), string1 = NA)
head(df1)
df1$string1[1] <- "This string is a string."
df1$string1[2] <- "This string is a slightly longer string."
df1$string1[3] <- "This string is an even longer string."
df1$string1[4] <- "This string is a slightly shorter string."
df1$string1[5] <- "This string is the longest string of all the other strings."
df1$string1 <- tolower(as.character(df1$string1))
df1$string1 <- gsub('[[:punct:]]',' ',df1$string1)
df1$string1 <- gsub('[[:digit:]]',' ',df1$string1)
df1$string1 <- gsub("\\s+"," ",df1$string1)
fdList1 <- strsplit(df1$string1, " ")
fdList2 <- lapply(fdList1, unique)
toString1 <- function(x){
  string2 <- c()
  #print(length(x[1][1]))
  #print(x)
  #print(class(x))
  for(i in length(x)){
    string2 <- paste0(string2, x[[i]], ":1 ", collapse="")
  }
  string2
}
df2 <- ldply(fdList2, toString1)
df2
v1 <- toString1(fdList2[2])
v1
df2 is wrong; I would like a vector similar to v1 for each list element.
Any suggestions?