R replace character in string if occurs after location or more than once

Question

I am trying to import interview transcriptions with textreadr, which works by separating the text into two columns at a separator character (usually a colon). In my transcriptions a colon occasionally appears in the body of the response text, which causes an error. I was hoping to replace these colons with something else (e.g. a dash or underscore), but I am not sure how to go about doing that.

I can find the location of all the colons through gregexpr(), but then how can I replace them? Would I be able to use grep or sub somehow through an if statement?

EDIT

OK, I found an inelegant solution using the stringr package:

First I replace all the colons through

dat = str_replace_all(text,":","_")

Then I reinsert only the first colon that I wanted to keep through

dat = str_replace(dat,"_",":")

Not great, but it worked....
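One caveat with the two-step replacement above: if the text before the first colon already contains an underscore (say, a speaker label like "Int_1"), the second step turns that underscore into a colon instead. A minimal base R sketch that touches only the colons after the first one, with no placeholder character, might look like this. The helper name `replace_later_colons` and its `replacement` argument are made up for illustration.

```r
# Hypothetical helper: replace every colon after the first one in each string.
replace_later_colons <- function(x, replacement = "_") {
  # Position of the first colon in each string (-1 when there is none)
  pos <- as.integer(regexpr(":", x, fixed = TRUE))
  pos[pos < 0] <- 0L                      # no colon: the whole string is the "tail"
  head <- substr(x, 1, pos)               # everything up to and including the first colon
  tail <- substr(x, pos + 1, nchar(x))    # everything after the first colon
  paste0(head, gsub(":", replacement, tail, fixed = TRUE))
}

text <- c("Int1: This is some text.",
          "Int_2: As I speak I take a long pause: for effect")
dat <- replace_later_colons(text)
```

Strings without any colon are returned unchanged, and existing underscores are left alone.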

stringi::stri_replace_first_regex – Carl Boneri, Feb 07 '17 at 23:42

score 0 · Answer 1 · answered Feb 08 '17 at 15:35

You can use strsplit and then combine all elements after the first. Something like:

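# Sample transcript lines; the second has a colon inside the response text
txn <- c("Int1: This is some text.",
         "Int2: As I speak I take a long pause: for effect",
         "Int1: This interview is over.")

# Split on every colon; the first piece of each line is the speaker
transcripts <- strsplit(txn, ":", fixed = TRUE)
interviewer <- sapply(transcripts, "[", 1)
# Rejoin everything after the first colon, then trim the leading space
scripts <- trimws(sapply(transcripts, function(x) paste(x[-1], collapse = ":")))
dat <- data.frame(interviewer, scripts)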
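
As an alternative to splitting and re-pasting, the same two columns can probably be pulled out with a single regular expression that anchors on the first colon only. This is a sketch in base R, assuming every line has the form "speaker: text"; the `pattern` variable is just an illustrative name.

```r
txn <- c("Int1: This is some text.",
         "Int2: As I speak I take a long pause: for effect",
         "Int1: This interview is over.")

# Group 1: everything before the first colon (the speaker label)
# Group 2: everything after the first colon, minus leading whitespace
pattern <- "^([^:]*):[[:space:]]*(.*)$"

interviewer <- sub(pattern, "\\1", txn)
scripts <- sub(pattern, "\\2", txn)
dat <- data.frame(interviewer, scripts, stringsAsFactors = FALSE)
```

Because `[^:]*` cannot cross a colon, any later colons stay inside the second group untouched. Lines with no colon at all do not match the pattern and would come back unchanged in both columns, so they would need separate handling.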
