Find and replace text between two strings in R

Question

I have created some R tutorials as R scripts. I need a Handout Set (HS) and a Coding Set (CS) without answers, in which students can write their own code. I need help with a regex that finds the answer sections in the HS so I can remove them from the CS.

In the HS I have beginning (#'YOUR_ANSWER) and end (#'END_ANSWER) flags before and after the answers. To create the CS I need to replace

#'YOUR_ANSWER
As_samp2 = 36
As_samp3 = 38      
#'END_ANSWER

with

"space for answer".

So if my text is in a:

a = "#'YOUR_ANSWER
       As_samp2 = 36
       As_samp3 = 38

       #'END_ANSWER"

I have tried a regex, but nothing is replaced:

b <-gsub(pattern = "YOUR_ANSWER(.*\n*)*#'END_ANSWER", a, replace="space for answer" )

If I don't use a regex, i.e. just search for "YOUR_ANSWER", the replacement works:

c <-gsub(pattern = "YOUR_ANSWER", a, replace="space for answer" )

If I use only the wildcard part of the regex, all the text is substituted, as expected:

d <- gsub(pattern = "(.*\n*)*", a, replace="space for answer" )

But the combination doesn't work. The regex should work, see:

https://regex101.com/r/USvzLF/1

So there must be some deep R magic that I'm not getting

    b <- gsub(pattern = "YOUR_ANSWER(.*\n*)*END_ANSWER", a, replace="space for answer" )
    c <- gsub(pattern = "YOUR_ANSWER", a, replace="space for answer" )
    d <- gsub(pattern = "(.*\n*)*", a, replace="space for answer" )

I expected everything between YOUR_ANSWER and END_ANSWER to be replaced with "space for answer", but nothing happens. Any ideas?

UPDATE: @r2evans has now shown me a working regex. The R script I am trying to change is https://pastebin.com/mnjpkUFk (i.e. myfile), and the code I am using to try to change it (in a separate R script) is:

    FileM <- readLines(myfile)
    FileMedit <- gsub(pattern = "YOUR_ANSWER", FileM, replace="space for answer" )
    FileMedit <- gsub(pattern = "YOUR_ANSWER.*END_ANSWER", FileM, replace="space for answer" )
    writeLines(FileMedit,file = "outputfileM.R")

I am having a really tough time understanding what your current text looks like and what you wish to transform it into. Please update your question with a simplified example of "before" and "after" transformation. — MonkeyZeus, Sep 03 '19 at 19:23
Not sure, but doesn't `gsub("#'YOUR_ANSWER.*END_ANSWER", "(space for answer)", a)` work well enough? This is effectively your `b` ... which also works for me. — r2evans, Sep 03 '19 at 19:59
@MonkeyZeus thanks for the help, but there's a problem with escape characters in R; error message is:Error: '\s' is an unrecognized escape in character string starting ""YOUR_ANSWER\s". So things that work in Regex 101 are not working in R — WickHerd, Sep 03 '19 at 20:51
Do you want to keep `#'YOUR_ANSWER` and `#'END_ANSWER` and replace the content between them? — The fourth bird, Sep 03 '19 at 20:59
@r2evans - well you have moved me on a bit as I agree the code does work in the string I presented. When I translated it into changing a file, no joy. I am going to add something to my question to clarify — WickHerd, Sep 03 '19 at 21:57
Maybe if you included a little more of your document, we might be able to provide suggestions on the concept as a whole instead of fixing this one thing. (For instance, can you just set `include=params$incl` for each chunk, where `params$incl` (parameterized r-markdown) determines the mode of the printed document? — r2evans, Sep 03 '19 at 22:11
I hope you solved the issue in time, if not, I posted the solution tested in R. — Wiktor Stribiżew, Jul 15 '20 at 11:35

score 1 · Answer 1 · answered Jul 15 '20 at 11:34

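The problem is that readLines() returns a character vector with one element per line, while your regex expects a single multiline string as input. gsub() applies the pattern to each element separately, and no single line contains both YOUR_ANSWER and END_ANSWER, so nothing matches.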

> FileM
 [1] "#'Rstudio environment"                                                             "#'==="                                                                            
 [3] " "                                                                                 "#'Top Left - scripts"                                                             
 [5] "#+"                                                                                "myfirstvariable = \"Hello R\"  #press control enter with cursor on line  "        
 [7] "myfirstvariable"                                                                   "As_samp1 = 34"                                                                    
 [9] " "                                                                                 "#'practical: create variables for arsenic concentration in 2 more samples"        
[11] "#+"                                                                                "#'YOUR_ANSWER"                                                                    
[13] "As_samp2 = 36"                                                                     "As_samp3 = 38"                                                                    
[15] " "                                                                                 "#'END_ANSWER"                                                                     
[17] "#+"                                                                                "#'Bottom Left - console"                                                          
[19] "#+"                                                                                "2+2"                                                                              
[21] " "                                                                                 "#'practical: calculate average As concentration, store result in variable As_mean"
[23] "#+"                                                                                "#'YOUR_ANSWER"                                                                    
[25] "As_mean<- (As_samp1 + As_samp2 + As_samp3)/3"                                      "#'END_ANSWER"                                                                     
[27] "#+"                                                                                "#'A word on comments"                                                             
[29] "#This is a comment"                                                                "#ignore #' and #+ <br/><br/>"
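
A minimal sketch of why the per-line vector defeats the pattern, assuming a made-up three-line input (`lines`, `per_line`, `joined`, and `replaced` are illustrative names, not taken from the original script):

```r
# Hypothetical input, shaped like what readLines() returns: one element per line
lines <- c("#'YOUR_ANSWER", "As_samp2 = 36", "#'END_ANSWER")

# gsub() is vectorised: the pattern is tried on each element on its own,
# and no single element contains both flags, so nothing is replaced
per_line <- gsub("YOUR_ANSWER.*END_ANSWER", "space for answer", lines)
identical(per_line, lines)  # TRUE

# Once the lines are joined into one string, the pattern can span the newlines
# (with R's default regex engine, "." also matches "\n")
joined <- paste(lines, collapse = "\n")
replaced <- gsub("YOUR_ANSWER.*END_ANSWER", "space for answer", joined)
replaced  # "#'space for answer"
```

The leading `#'` survives because the pattern starts at `YOUR_ANSWER`, not at the comment marker in front of it.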

Hence, you should join the lines before running the regex:

FileM <- paste(FileM, collapse="\n")

Then, use

FileMedit <- gsub("YOUR_ANSWER.*?END_ANSWER", "space for answer", FileM)

Now, cat(FileMedit) shows

#'Rstudio environment
#'===
 
#'Top Left - scripts
#+
myfirstvariable = "Hello R"  #press control enter with cursor on line  
myfirstvariable
As_samp1 = 34
 
#'practical: create variables for arsenic concentration in 2 more samples
#+
#'space for answer
#+
#'Bottom Left - console
#+
2+2
 
#'practical: calculate average As concentration, store result in variable As_mean
#+
#'space for answer
#+
#'A word on comments
#This is a comment
#ignore #' and #+ <br/><br/>

Now, save it:

cat(FileMedit, file = "outputfileM.R")
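
Put together, the steps above might look like the following sketch. It assumes a small stand-in script written to a temporary file rather than the real tutorial; `infile`, `outfile`, `txt`, `lazy`, and `greedy` are hypothetical names. It also shows why the lazy `.*?` matters once a file holds more than one answer block.

```r
# Write a small stand-in script to a temporary file (hypothetical content)
infile <- tempfile(fileext = ".R")
outfile <- tempfile(fileext = ".R")
writeLines(c(
  "As_samp1 = 34",
  "#'YOUR_ANSWER",
  "As_samp2 = 36",
  "#'END_ANSWER",
  "2+2",
  "#'YOUR_ANSWER",
  "As_mean <- 35",
  "#'END_ANSWER"
), infile)

# Read the file, join the lines into one string, replace each answer block,
# and write the result to a new file
txt <- paste(readLines(infile), collapse = "\n")
lazy <- gsub("YOUR_ANSWER.*?END_ANSWER", "space for answer", txt)
writeLines(lazy, outfile)

# For comparison: a greedy .* runs from the first YOUR_ANSWER to the last
# END_ANSWER, so it also swallows the code between the two blocks ("2+2")
greedy <- gsub("YOUR_ANSWER.*END_ANSWER", "space for answer", txt)

readLines(outfile)
```

Here writeLines() is used instead of cat() only so that the output file ends with a newline; either call writes the same text.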

The fourth bird · Answer 2 · 2019-09-03T21:16:59.657

To get a more specific match, you could match the first line, then match all following lines that don't consist of optional leading horizontal whitespace chars followed by #'END_ANSWER as the only text on the line.

Then match the last line and replace the match with space for answer

#'YOUR_ANSWER.*(?:\R(?!\h*#'END_ANSWER$).*)*\R\h*#'END_ANSWER$


For example

b <- gsub(pattern = "^#'YOUR_ANSWER.*(?:\\R(?!\\h*#'END_ANSWER$).*)*\\R\\h*#'END_ANSWER$", replacement = "space for answer", x = a, perl = TRUE)
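
With perl = TRUE, `^` and `$` anchor only to the start and end of the whole string, so the pattern as written fits a string that holds exactly one block, like `a`. A possible adaptation for a joined file with several blocks is to prefix the pattern with the `(?m)` multiline flag. That flag is an addition here, not part of the pattern above, and `txt`, `pattern_m`, and `out` are hypothetical names.

```r
# Two answer blocks in one string (hypothetical content); note the indented
# END_ANSWER flag in the first block
txt <- paste(c(
  "As_samp1 = 34",
  "#'YOUR_ANSWER",
  "As_samp2 = 36",
  "  #'END_ANSWER",
  "2+2",
  "#'YOUR_ANSWER",
  "As_mean <- 35",
  "#'END_ANSWER"
), collapse = "\n")

# (?m) lets ^ and $ match at every line start and line end (assumed addition)
pattern_m <- "(?m)^#'YOUR_ANSWER.*(?:\\R(?!\\h*#'END_ANSWER$).*)*\\R\\h*#'END_ANSWER$"

# Each block, flag lines included, is replaced as a whole
out <- gsub(pattern_m, "space for answer", txt, perl = TRUE)
cat(out)
```

Because the negative lookahead refuses to consume an END_ANSWER line, each match stops at the first closing flag, and the code between the two blocks is left alone.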

If you want to keep the two flag lines and replace only what is between YOUR_ANSWER and END_ANSWER, you could use 2 capturing groups and use those in the replacement.

^(#'YOUR_ANSWER.*)(?:\R(?!\h*#'END_ANSWER$).*)*(\R\h*#'END_ANSWER)$
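
A sketch of how the two groups could be used in R, assuming the sample string `a` from the question; the replacement string and the names `pattern_keep` and `kept` are illustrative choices, not part of the original answer.

```r
# The sample text from the question, written with explicit newlines
a <- "#'YOUR_ANSWER\n       As_samp2 = 36\n       As_samp3 = 38\n\n       #'END_ANSWER"

# Group 1 captures the opening flag line, group 2 the closing flag line
# together with the line break in front of it
pattern_keep <- "^(#'YOUR_ANSWER.*)(?:\\R(?!\\h*#'END_ANSWER$).*)*(\\R\\h*#'END_ANSWER)$"

# \\1 and \\2 put both flag lines back; only the lines between them change
kept <- gsub(pattern_keep, "\\1\nspace for answer\\2", a, perl = TRUE)
cat(kept)
```

Keeping the flags in the output means the same file can be processed again later, for example to reinsert answers.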

