String transformation in R | Grouping words of a string

Question

I want to group the words of a string (given below)

text="Lorem,ipsum,dolor,sit,amet,consectetuer"

like this

textNew="Lorem ipsum,ipsum dolor,dolor sit,sit amet,amet consectetuer"

Thanks.

Welcome to Stack Overflow. Questions here should show research effort or attempts. Please take the tour. – Unihedron, Aug 22 '14 at 08:14

score 5 · Accepted Answer · answered Aug 22 '14 at 08:10

Using the gsub function:

> text="Lorem,ipsum,dolor,sit,amet,consectetuer"
> f <- gsub(",([^,]*)", " \\1,\\1", text, perl=TRUE)
> result <- gsub(",[^,]*$", "", f, perl=TRUE)
> result
[1] "Lorem ipsum,ipsum dolor,dolor sit,sit amet,amet consectetuer"
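To see why two passes are used, it may help to look at the intermediate string. The following is a minimal sketch of the same two calls, with the hypothetical names `step1` and `step2` introduced only for illustration:

```r
text <- "Lorem,ipsum,dolor,sit,amet,consectetuer"

# Pass 1: every ",word" becomes " word,word", so each word after the
# first is emitted twice: once closing a pair and once opening the next.
step1 <- gsub(",([^,]*)", " \\1,\\1", text, perl = TRUE)
step1
# [1] "Lorem ipsum,ipsum dolor,dolor sit,sit amet,amet consectetuer,consectetuer"

# Pass 2: the last word was duplicated as well, so drop the trailing ",word".
step2 <- gsub(",[^,]*$", "", step1, perl = TRUE)
step2
# [1] "Lorem ipsum,ipsum dolor,dolor sit,sit amet,amet consectetuer"
```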

Avinash Raj

Which is faster, the one using gsub or the one using lapply? I need to run this on a large set of text files. – sidpat Aug 22 '14 at 08:49
@510947, I didn't benchmark, but I'm guessing the `gsub` here will be faster than the others. – talat Aug 22 '14 at 08:56
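For the many-strings case raised in the comments, note that `gsub` is vectorized over its input, so the two-pass approach can be applied to a whole character vector in one call. A minimal sketch, assuming the lines have already been read into a character vector; the function name `pair_words` and the sample vector `texts` are made up for illustration:

```r
# Hypothetical helper wrapping the two gsub passes from the answer above.
pair_words <- function(x) {
  f <- gsub(",([^,]*)", " \\1,\\1", x, perl = TRUE)
  gsub(",[^,]*$", "", f, perl = TRUE)
}

# Illustrative input: one comma-separated string per element.
texts <- c("a,b,c", "Lorem,ipsum,dolor")
pair_words(texts)
# [1] "a b,b c"                 "Lorem ipsum,ipsum dolor"
```

Whether this is actually the fastest option for a given data set would need to be measured.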

score 4 · Answer 2 · answered Aug 22 '14 at 07:54

Here's one option:

x <- strsplit(text, ",")[[1]]
paste0(sapply(1:(length(x)-1), function(z) paste(x[c(z, z+1)], collapse = " ")), collapse = ",")
# [1] "Lorem ipsum,ipsum dolor,dolor sit,sit amet,amet consectetuer"
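One thing to be aware of: `1:(length(x)-1)` evaluates to `1:0`, i.e. `c(1, 0)`, when the input contains a single word, which leads to a result containing `NA`. A possible variant using `seq_len` is sketched below. The name `pair_adjacent` is hypothetical, and returning a single-word input unchanged is an assumption about the desired behaviour, not something stated in the question:

```r
# Hypothetical helper; pairs each word with its right-hand neighbour.
pair_adjacent <- function(text) {
  x <- strsplit(text, ",")[[1]]
  # Assumption: with fewer than two words there is nothing to pair,
  # so the input is returned as is.
  if (length(x) < 2) return(text)
  idx <- seq_len(length(x) - 1)
  # paste() is vectorized, so no sapply() loop is needed here.
  paste(x[idx], x[idx + 1], collapse = ",")
}

pair_adjacent("Lorem,ipsum,dolor,sit,amet,consectetuer")
# [1] "Lorem ipsum,ipsum dolor,dolor sit,sit amet,amet consectetuer"
pair_adjacent("Lorem")
# [1] "Lorem"
```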

talat


score 2 · Answer 3 · answered Aug 22 '14 at 08:00

Ahh got something similar.

text <- "Lorem,ipsum,dolor,sit,amet,consectetuer"
text2 <- unlist(strsplit(text, ","))
textNew <- paste0(sapply(1:(length(text2) - 1), function(i, y = text2) paste(y[i], y[i + 1])), collapse = ",")

sidpat


Answer 4 · akrun · answered Aug 22 '14 at 09:59

You could also do:

library(stringr)
txt2 <- str_extract_all(text, "[^,]+")[[1]]
paste(paste(txt2[-length(txt2)], txt2[-1], sep = " "), collapse = ", ")
#[1] "Lorem ipsum, ipsum dolor, dolor sit, sit amet, amet consectetuer"

Or

library(gsubfn)
paste(strapply(text, "([^,]+),(?=([^,]+))", paste, backref = -2, perl = TRUE)[[1]], collapse = ",")
#[1] "Lorem ipsum,ipsum dolor,dolor sit,sit amet,amet consectetuer"

score 2 · Answer 5 · answered Aug 23 '14 at 22:13

You can use these functions from the stringi package:

require(stringi)
text <- "Lorem,ipsum,dolor,sit,amet,consectetuer"
words <- stri_split_fixed(text,",")[[1]]
stri_join(words[-length(words)]," ",words[-1],collapse = ", ")
## [1] "Lorem ipsum, ipsum dolor, dolor sit, sit amet, amet consectetuer"

Some benchmarks :)

stringi <- function(){
  words <- stri_split_fixed(text,",")[[1]]
  stri_join(words[-length(words)]," ",words[-1],collapse = ", ")
}

gsubAvinash <- function(){
  f <- gsub(",([^,]*)", " \\1,\\1", text, perl=TRUE)
  result <- gsub(",[^,]*$", "", f, perl=TRUE)
  result
}

strsplitBeggineR <- function(){
  x <- strsplit(text, ",")[[1]]
  paste0(sapply(1:(length(x)-1), function(z) paste(x[c(z, z+1)], collapse = " ")), collapse = ",")
}

stringrAkrun <- function(){
  txt2 <- str_extract_all(text, "[^,]+")[[1]]
  paste(paste(txt2[-length(txt2)],txt2[-1],sep=" "), collapse=", ")
}

require(stringr)  # needed by stringrAkrun()
require(microbenchmark)
microbenchmark(stringi(), gsubAvinash(), strsplitBeggineR(), stringrAkrun())
Unit: microseconds
               expr     min       lq   median       uq     max neval
          stringi()   8.657  10.6090  16.5005  17.6730  41.058   100
      gsubAvinash()  14.506  17.1055  20.2105  22.2040  97.399   100
 strsplitBeggineR()  53.609  59.7755  64.9470  68.3105 121.767   100
     stringrAkrun() 148.036 157.4715 162.4885 168.2880 342.471   100
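The timings above are for a single six-word string, and two of the benchmarked functions join pairs with ", " rather than ",", so the outputs are not all identical. A rough sketch of how one might check agreement and timing on a longer input, using base R only, is given below. The names `gsub_pairs` and `split_pairs` and the generated test string `big` are made up for illustration, and no timings are shown because they depend on the machine:

```r
# Hypothetical test data: 10,000 comma-separated words.
set.seed(1)
big <- paste(sample(c("Lorem", "ipsum", "dolor", "sit", "amet"), 1e4, replace = TRUE),
             collapse = ",")

# Two-pass gsub approach, as in the accepted answer.
gsub_pairs <- function(x) {
  f <- gsub(",([^,]*)", " \\1,\\1", x, perl = TRUE)
  gsub(",[^,]*$", "", f, perl = TRUE)
}

# Split-and-paste approach: all words but the last, paired with all but the first.
split_pairs <- function(x) {
  w <- strsplit(x, ",", fixed = TRUE)[[1]]
  paste(head(w, -1), tail(w, -1), collapse = ",")
}

# Both should produce exactly the same string before timings are compared.
stopifnot(identical(gsub_pairs(big), split_pairs(big)))

system.time(for (i in 1:20) gsub_pairs(big))
system.time(for (i in 1:20) split_pairs(big))
```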
