Matching special character in R

Question

Hi, I have the following data:

shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2", 
                   "appple+20gfree", 
                   "BELI HG MSWAT ALA +VAT T 100g BAR WR", 
                   "TOOLAIT CASSE+LSST+SSSRE 40g SAC MDC")

In a second step, I remove all whitespace from shopping_list:

library(stringr)
shopping_list_trim <- str_replace_all(shopping_list, fixed(" "), "")
print(shopping_list_trim)
[1] "applesx4" "bagofflour" "bagofsugar"             
[4] "milkx2" "appple+20gfree" "BELIHGMSWATALA+VATT100gBARWR"
[7] "TOOLAITCASSE+LSST+SSSRE40gSACMDC"
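If you would rather not load stringr, the same whitespace removal can be sketched in base R. This is a minimal alternative, not the asker's code: `gsub()` with `fixed = TRUE` treats the space as a literal string instead of a regex.

```r
shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2",
                   "appple+20gfree",
                   "BELI HG MSWAT ALA +VAT T 100g BAR WR",
                   "TOOLAIT CASSE+LSST+SSSRE 40g SAC MDC")

# Remove every literal space; fixed = TRUE skips regex interpretation
shopping_list_trim <- gsub(" ", "", shopping_list, fixed = TRUE)
print(shopping_list_trim)
```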

To extract the strings that do not contain a plus sign, I use the following code:

str_extract(shopping_list_trim, "^[^+]+$")
[1] "applesx4"   "bagofflour" "bagofsugar" "milkx2"  NA  NA NA
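A base R sketch of the same idea, assuming the trimmed vector from above. Inside a bracket expression such as `[^+]` the `+` is literal, so it needs no backslash there. Flipping to a fixed-string test (`grepl(..., fixed = TRUE)`) gives the complementary vector without any regex escaping at all; this is an illustrative alternative, not one of the posted answers.

```r
shopping_list_trim <- c("applesx4", "bagofflour", "bagofsugar", "milkx2",
                        "appple+20gfree", "BELIHGMSWATALA+VATT100gBARWR",
                        "TOOLAITCASSE+LSST+SSSRE40gSACMDC")

# The question's pattern: the whole string is one or more non-"+" characters
no_plus <- ifelse(grepl("^[^+]+$", shopping_list_trim), shopping_list_trim, NA)
print(no_plus)

# TRUE where the element contains a literal "+" (no regex involved)
has_plus <- grepl("+", shopping_list_trim, fixed = TRUE)
# Keep matching strings and put NA elsewhere, mirroring str_extract()'s shape
with_plus <- ifelse(has_plus, shopping_list_trim, NA)
print(with_plus)
```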

I would like help extracting the strings that do contain a plus sign. The desired output is:

[1] NA NA NA NA "appple+20gfree"
[6] "BELIHGMSWATALA+VATT100gBARWR" "TOOLAITCASSE+LSST+SSSRE40gSACMDC"

Does anybody have an idea how to extract only the strings that contain a plus sign?

`grepl("(?=.*\\+)", shopping_list_trim, perl = TRUE)` – rock321987 Apr 28 '16 at 10:16
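The comment's `grepl()` call returns a logical vector rather than the strings themselves. A small sketch of how that vector could be used: indexing with it keeps only the matching elements (so no `NA` placeholders, unlike the desired output). `perl = TRUE` is what makes base R accept the `(?=...)` lookahead.

```r
shopping_list_trim <- c("applesx4", "bagofflour", "bagofsugar", "milkx2",
                        "appple+20gfree", "BELIHGMSWATALA+VATT100gBARWR",
                        "TOOLAITCASSE+LSST+SSSRE40gSACMDC")

# Lookahead pattern from the comment; TRUE where a "+" occurs anywhere
has_plus <- grepl("(?=.*\\+)", shopping_list_trim, perl = TRUE)
print(has_plus)

# Logical indexing drops the non-matching elements entirely
print(shopping_list_trim[has_plus])
```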

rock321987 · Accepted Answer · 2016-04-28T10:23:54.717


This will do the trick:

> str_extract(shopping_list_trim, "^(?=.*\\+)(.+)$")
[1] NA                                
[2] NA                                
[3] NA                                
[4] NA                                
[5] "appple+20gfree"                  
[6] "BELIHGMSWATALA+VATT100gBARWR"    
[7] "TOOLAITCASSE+LSST+SSSRE40gSACMDC"

Regex Breakdown

^(?=.*\\+) #Lookahead: check that there is at least one plus sign
(.+)$      #Capture the whole string if the lookahead succeeds
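For readers not using stringr, the same lookahead pattern can be sketched in base R. Base R's default regex engine does not support lookarounds, so `perl = TRUE` is required. `regexpr()` reports -1 for non-matching elements and `regmatches()` returns only the matched substrings, so the result is rebuilt with `NA` placeholders to match the shape of `str_extract()`.

```r
shopping_list_trim <- c("applesx4", "bagofflour", "bagofsugar", "milkx2",
                        "appple+20gfree", "BELIHGMSWATALA+VATT100gBARWR",
                        "TOOLAITCASSE+LSST+SSSRE40gSACMDC")

pattern <- "^(?=.*\\+)(.+)$"
# Match start per element, or -1 where the pattern does not match
m <- regexpr(pattern, shopping_list_trim, perl = TRUE)

# Start with all NA, then fill in the matched substrings in order
result <- rep(NA_character_, length(shopping_list_trim))
result[m > 0] <- regmatches(shopping_list_trim, m)
print(result)
```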

edited Apr 28 '16 at 10:23 · answered Apr 28 '16 at 10:17 by rock321987

10,942
1
30
43

About this part of the regex, `^(?=.*\\+)`: what is the role of the `?=.` characters? – ssssss ssssss Apr 28 '16 at 10:42
@ssssssssssss It's a lookahead. Lookaheads are zero-width, meaning they don't consume any characters. So we are just checking whether a `+` sign is present in the string without moving forward in it. `(?=...)` is the lookahead syntax. **[read](http://www.regular-expressions.info/lookaround.html)** – rock321987 Apr 28 '16 at 10:44
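The zero-width behaviour can be seen directly with base R's `regexpr()`, which reports both where a match starts and how long it is. In this sketch (the sample string is an illustrative assumption), the lookahead `(?=\\+)` matches at the position of the `+` but with length 0, whereas the plain `\\+` matches the same position and consumes one character.

```r
x <- "apple+pear"

# Lookahead: succeeds just before the "+" without consuming it
m_look <- regexpr("(?=\\+)", x, perl = TRUE)
print(c(start = as.integer(m_look), length = attr(m_look, "match.length")))

# Ordinary match: same start, but the "+" itself is consumed
m_plain <- regexpr("\\+", x)
print(c(start = as.integer(m_plain), length = attr(m_plain, "match.length")))
```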
The answer given by @ClasG is much simpler if you find this one complicated to understand. – rock321987 Apr 28 '16 at 10:46

score 0 · Answer 2 · answered Apr 28 '16 at 10:21


If you can't or don't want to use lookarounds, try

^.*\+.*$

It matches anything, followed by a `+`, followed by anything. :) In an R string literal the backslash has to be doubled, i.e. `"^.*\\+.*$"`.

See it work here at regex101.
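A quick base R check, assuming the trimmed vector from the question, that this lookaround-free pattern selects exactly the same elements as the accepted answer's lookahead pattern. Only the lookahead version needs `perl = TRUE`; the plain pattern works with base R's default engine.

```r
shopping_list_trim <- c("applesx4", "bagofflour", "bagofsugar", "milkx2",
                        "appple+20gfree", "BELIHGMSWATALA+VATT100gBARWR",
                        "TOOLAITCASSE+LSST+SSSRE40gSACMDC")

# Lookaround-free pattern from this answer (backslash doubled for R)
plain <- grepl("^.*\\+.*$", shopping_list_trim)

# Lookahead pattern from the accepted answer; (?=...) requires perl = TRUE
lookahead <- grepl("^(?=.*\\+)(.+)$", shopping_list_trim, perl = TRUE)

print(plain)
print(identical(plain, lookahead))
```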

Regards

answered Apr 28 '16 at 10:21 by SamWhan

