conditional concatenation in R

Question

I have a vector like this:

> myarray
[1] "AA\tThis is "
[2] "\tthe "
[3] "\tbeginning."
[4] "BB\tA string of "
[5] "\tcharacters."
[6] "CC\tA short line."
[7] "DD\tThe "
[8] "\tend."

I am trying to write a function that processes the above to generate this:

> myoutput
[1] "AA\tThis is the beginning."
[2] "BB\tA string of characters."
[3] "CC\tA short line."
[4] "DD\tThe end."

This is doable by looping through the elements and using an if statement to append the current element to the previous one whenever it starts with a \t. I was wondering if there is a more efficient way to achieve the same result.
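For comparison, the loop-based approach described in the question might look roughly like the sketch below. The function name `merge_continuations_loop` is hypothetical, and the sketch assumes that continuation elements are exactly those whose first character is a tab.

```r
# Hypothetical helper: the element-by-element loop described in the question.
merge_continuations_loop <- function(x) {
  out <- character(0)
  for (el in x) {
    if (startsWith(el, "\t") && length(out) > 0) {
      # Continuation: drop the leading tab and append to the previous entry
      out[length(out)] <- paste0(out[length(out)], sub("^\t", "", el))
    } else {
      # A new entry starts here (or there is no previous entry to append to)
      out <- c(out, el)
    }
  }
  out
}

myarray <- c("AA\tThis is ", "\tthe ", "\tbeginning.", "BB\tA string of ",
             "\tcharacters.", "CC\tA short line.", "DD\tThe ", "\tend.")
merge_continuations_loop(myarray)
```

Growing `out` with `c()` inside the loop copies the vector repeatedly, which is one reason a vectorised approach tends to scale better on long inputs.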

score 3 · Accepted Answer · answered Apr 13 '18 at 18:06

# Create your example data
myarray <- c("AA\tThis is ", "\tthe ", "\tbeginning.", "BB\tA string of ",
             "\tcharacters.", "CC\tA short line.", "DD\tThe ", "\tend.")
# Find where each "sentence" starts based on detecting
# that the first character isn't \t
starts <- grepl("^[^\t]", myarray)
# Create a grouping variable
id <- cumsum(starts)
# Remove the leading \t as that seems like what your example output wants
tmp <- sub("^\t", "", myarray)
# Split into groups and paste the groups together
sapply(split(tmp, id), paste, collapse = "")

Running it gives:

> sapply(split(tmp, id), paste, collapse = "")
                            1                             2 
 "AA\tThis is the beginning." "BB\tA string of characters." 
                            3                             4 
          "CC\tA short line."                "DD\tThe end." 

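Since the question asks for a function, the steps above can be wrapped into one. This is a sketch only: the name `collapse_continuations` is hypothetical, and `vapply()` plus `unname()` are swapped in for `sapply()` so that the result is a plain, unnamed character vector like `myoutput`.

```r
# Hypothetical helper wrapping the grepl/cumsum/split approach above.
collapse_continuations <- function(x) {
  # TRUE where an element does not start with a tab, i.e. begins a new record
  starts <- grepl("^[^\t]", x)
  # Running count of starts: every continuation line shares its record's id
  id <- cumsum(starts)
  # Drop the leading tab from continuation lines
  tmp <- sub("^\t", "", x)
  # Paste each group together; unname() drops the "1", "2", ... names from split()
  unname(vapply(split(tmp, id), paste, character(1), collapse = ""))
}

myarray <- c("AA\tThis is ", "\tthe ", "\tbeginning.", "BB\tA string of ",
             "\tcharacters.", "CC\tA short line.", "DD\tThe ", "\tend.")
collapse_continuations(myarray)
```

For the example data, `starts` is `TRUE FALSE FALSE TRUE FALSE TRUE TRUE FALSE` and `id` is `1 1 1 2 2 3 4 4`, so `split()` produces four groups, one per record.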
MKR · Answer 2 · score 0 · 2018-04-15T11:46:22.447

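One option is to remove the leading \t from each element, paste everything into a single string, insert a marker such as ## before each two-letter code (AA, BB, etc.), and then split on that marker with strsplit: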

# Data
myarray <- c("AA\tThis is ", "\tthe ", "\tbeginning.", "BB\tA string of ",
             "\tcharacters.", "CC\tA short line.", "DD\tThe ", "\tend.")

strsplit(gsub("([A-Z]{2})", "##\\1",
              paste(sub("^\t", "", myarray), collapse = "")), "##")[[1]][-1]
# [1] "AA\tThis is the beginning."
# [2] "BB\tA string of characters."
# [3] "CC\tA short line."
# [4] "DD\tThe end."
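The `[A-Z]{2}` pattern assumes the two-letter codes are the only places where two capital letters appear in a row; an element such as "\tUSA." would also receive a ## and be split off. One possible way around this, not part of the original answer, is to add the marker only to elements that start a new record, before collapsing. The function name `collapse_with_marker` is hypothetical, and the sketch still assumes that ## never occurs in the data and that the first element starts a record.

```r
# Hypothetical helper: mark only the elements that begin a new record,
# so capital letters inside the text are left alone.
collapse_with_marker <- function(x, marker = "##") {
  # TRUE for elements that do not start with a tab (record starts)
  starts <- !startsWith(x, "\t")
  # Drop the leading tab from continuation lines
  tmp <- sub("^\t", "", x)
  # Prefix each record start with the marker, then glue everything together
  joined <- paste0(ifelse(starts, marker, ""), tmp, collapse = "")
  # Split on the literal marker; the first piece is the empty string
  # before the first marker, so drop it
  strsplit(joined, marker, fixed = TRUE)[[1]][-1]
}

myarray <- c("AA\tThis is ", "\tthe ", "\tbeginning.", "BB\tA string of ",
             "\tcharacters.", "CC\tA short line.", "DD\tThe ", "\tend.")
collapse_with_marker(myarray)

# Capital letters inside the text no longer trigger a split
collapse_with_marker(c("AA\tMade in the ", "\tUSA."))
# [1] "AA\tMade in the USA."
```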

edited Apr 15 '18 at 11:46

answered Apr 13 '18 at 20:17


This is not the desired output; there should be one \t per element. – C.Bruce Apr 14 '18 at 21:09
@C.Bruce Just a small tweak will fix the problem. We need to remove the leading `\t` before collapsing with `paste`. I have updated my answer to reflect it. – MKR Apr 15 '18 at 11:47
