Return value between two characters in a string

Question

I'm trying to extract values from a form in a Word document so that I can tabulate them. I used the antiword package to convert the .doc into a character string, and now I'd like to pull out values based on markers within the document.

For example

example <- 'CONTACT INFORMATION\r\n\r\nName:  John Smith\r\n\r\nphone:  XXX-XXX-XXXX\r\n\r\n'
Name <- grep('\nName:', example, value = TRUE)
Name

This code returns the whole string when I'd like it to just return 'John Smith'.

Is there a way to add an end marker to the grep()? I've also tried str_extract(), but I'm having trouble writing my pattern as a regex.

score 3 · Accepted Answer · answered Mar 27 '19 at 16:59

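We can use gsub to remove two substrings at once: everything up to and including Name: (plus the whitespace after it), and everything from the first \r after the name to the end of the string. Both are matched by a single alternation pattern and replaced with an empty string ("").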

gsub(".*Name:\\s+|\r.*", "", example)
#[1] "John Smith"
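To make the two halves of that alternation easier to follow, here is a minimal sketch that applies them one at a time with sub(). The names step1 and step2 are illustrative only, and the sketch assumes R's default regex engine, where . also matches line breaks (with perl = TRUE it would not).

```r
example <- 'CONTACT INFORMATION\r\n\r\nName:  John Smith\r\n\r\nphone:  XXX-XXX-XXXX\r\n\r\n'

# Half 1: drop everything up to and including "Name:" and the whitespace after it.
step1 <- sub(".*Name:\\s+", "", example)

# Half 2: drop everything from the first carriage return onward.
step2 <- sub("\r.*", "", step1)

step2
#[1] "John Smith"
```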

akrun

This works perfectly for my example code, but not when I try to apply it to the actual Word .doc I have. My understanding of regex is very limited, but I found that when I modified the code to gsub(".*\\sName:\\s+|\r.*", "", example), it worked again. – nbouc Mar 27 '19 at 17:50
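Because the greedy .* in that pattern runs to the last "Name:" in the string, a longer document can behave differently from the short example. One alternative, offered only as a sketch since the real document is not shown, is to capture the rest of the line after the first "Name:" with a capture group. It uses base R's regexec() and regmatches(); the variable names m and name are illustrative.

```r
example <- 'CONTACT INFORMATION\r\n\r\nName:  John Smith\r\n\r\nphone:  XXX-XXX-XXXX\r\n\r\n'

# Match "Name:", skip spaces or tabs, then capture everything up to the line break.
# [^\r\n]* cannot cross a line ending, so the match stays on the "Name:" line.
m <- regmatches(example, regexec("Name:[ \t]*([^\r\n]*)", example))

# m[[1]][1] is the full match; m[[1]][2] is the captured group.
name <- m[[1]][2]
name
#[1] "John Smith"
```

regexec() reports the first match in the string, so this picks the first "Name:" line rather than the last.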

score 1 · Answer 2 · answered Mar 27 '19 at 17:27

We can also use:

strsplit(stringr::str_extract_all(example, "\\\nName:.*", simplify = TRUE), ":  ")[[1]][2]
#[1] "John Smith"
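Since the goal in the question is to tabulate several form values, the same idea can be wrapped in a small helper. This is a sketch under stated assumptions: extract_field is a hypothetical function, not part of any package, and it assumes each label is followed by a colon, contains no regex metacharacters, and has its value on the same line.

```r
example <- 'CONTACT INFORMATION\r\n\r\nName:  John Smith\r\n\r\nphone:  XXX-XXX-XXXX\r\n\r\n'

# Hypothetical helper: return the text after "<label>:" on the same line,
# or NA if the label is not found.
extract_field <- function(text, label) {
  pattern <- paste0(label, ":[ \t]*([^\r\n]*)")
  m <- regmatches(text, regexec(pattern, text))[[1]]
  if (length(m) >= 2) m[2] else NA_character_
}

fields <- c("Name", "phone")
values <- vapply(fields, function(f) extract_field(example, f), character(1))

# One row per field, ready for further tabulation.
result <- data.frame(field = fields, value = unname(values))
result
```

Labels that do not appear in the text come back as NA, which keeps the table shape the same across documents.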

NelsonGon
