I have a list of character strings where there are repeats in some of the strings. For example:

   [[1]]
   [1] "gr gal gr gal"

   [[2]]
   [1] "gr gal"

   [[3]]
   [1] "gr gal ir ol"

   [[4]]
   [1] "gr gal gr gal"

   [[5]]
   [1] "gr gal"

My desired output is:

   [[1]]
   [1] "gr gal"

   [[2]]
   [1] "gr gal"

   [[3]]
   [1] "gr gal ir ol"

   [[4]]
   [1] "gr gal"

   [[5]]
   [1] "gr gal"

Where the repeats are removed from the string.

My plan is to call strsplit(x, split = " ") and then call the unique function on the split object. If I do it choosing 1 member of the list, my code works fine:

  > strsplit(pathmd1[[76]], split = " ")
  [[1]]
  [1] "gr" "gal" "gr" "gal"

  > splittest <- strsplit(pathmd1[[76]], split = " ")
  > unique(unlist(splittest))
  [1] "gr" "gal"

However, when I combine these functions with lapply, an error is thrown:

    pathmd2 <- lapply(1:length(pathmd1), function(i) strsplit(pathmd1[[i]], 
               split = " "))
    pathmd <- lapply(1:length(pathmd2), function(i) unique(pathmd2[[i]])

    unexpected symbol
    77: pathmd2 <- lapply(1:length(pathmd1), function(i) 
        strsplit(pathmd1[[i]], split = " ")
    78: pathmd
        ^

Why isn't the function working with lapply?

MeeraWhy
  • I think you are getting the error shown because you forgot a ")" at the end of your `strsplit` call in that example – Mike H. Oct 13 '17 at 20:06
  • The missing trailing paren is at the end of the `pathmd <- ...` command, not the `strsplit` command. – r2evans Oct 13 '17 at 20:36
  • strsplit gives a list output, even if passing a single string. Consider recasting your list into a character vector. The output from strsplit will be in the expected format. Then the call is simply: `lapply(strsplit(charVec, ' '), unique)`. – AdamO Oct 13 '17 at 20:38
  • Thanks everyone. All these tips helped! – MeeraWhy Oct 13 '17 at 21:11
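The two suggestions in the comments can be sketched as follows. This is only an illustration: the sample list stands in for the asker's `pathmd1`, and `char_vec` and `pathmd_vec` are hypothetical names. Note that once the parenthesis is restored, `unique()` on its own still would not remove the repeats, because each element of `pathmd2` is a one-element list rather than a character vector, so it is unlisted first.

```r
# Hypothetical sample data standing in for the asker's `pathmd1`
pathmd1 <- list("gr gal gr gal", "gr gal", "gr gal ir ol",
                "gr gal gr gal", "gr gal")

# The original two-step approach with the closing parenthesis restored
pathmd2 <- lapply(seq_along(pathmd1),
                  function(i) strsplit(pathmd1[[i]], split = " "))
# Each pathmd2[[i]] is a list of length 1, so unlist it before unique()
pathmd <- lapply(seq_along(pathmd2),
                 function(i) unique(unlist(pathmd2[[i]])))

# The character-vector route: strsplit() is vectorised, so one call
# returns a list with one character vector per input string
char_vec <- unlist(pathmd1)
pathmd_vec <- lapply(strsplit(char_vec, " "), unique)
```

Both routes should give one character vector of distinct tokens per input string.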

1 Answer

You can try:

lapply(f, function(x) unique(unlist(strsplit(x, " "))))
#output
[[1]]
[1] "gr"  "gal"

[[2]]
[1] "gr"  "gal"

[[3]]
[1] "gr"  "gal" "ir"  "ol" 

[[4]]
[1] "gr"  "gal"

[[5]]
[1] "gr"  "gal"

where f is your list.

There is no need to iterate over indices as you would in a for loop; lapply applies the function to each element of the list directly.
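The output above is a list of token vectors, whereas the question asks for single strings such as "gr gal". One possible way to get that form, sketched under the assumption that the tokens are always separated by a single space, is to paste the unique tokens back together. The sample list `f` and the name `dedup` are illustrative only.

```r
# Hypothetical sample data mirroring the list in the question
f <- list("gr gal gr gal", "gr gal", "gr gal ir ol",
          "gr gal gr gal", "gr gal")

# Split each string, keep the first occurrence of every token,
# then collapse the tokens back into one space-separated string
dedup <- lapply(f, function(x) {
  tokens <- strsplit(x, " ")[[1]]
  paste(unique(tokens), collapse = " ")
})
```

Note that this drops every repeated token, not only whole repeated phrases, so a string like "gr gal gr" would also become "gr gal".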

missuse