Hi, I have the following data.

shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2", 
                   "appple+20gfree", 
                   "BELI HG MSWAT ALA +VAT T 100g BAR WR", 
                   "TOOLAIT CASSE+LSST+SSSRE 40g SAC MDC")

In my second step I remove all whitespace in shopping_list.

library(stringr)
shopping_list_trim <- str_replace_all(shopping_list, fixed(" "), "")
print(shopping_list_trim)
[1] "applesx4" "bagofflour" "bagofsugar"             
[4] "milkx2" "appple+20gfree" "BELIHGMSWATALA+VATT100gBARWR"
[7] "TOOLAITCASSE+LSST+SSSRE40gSACMDC"

To extract the strings that do not contain a plus sign, I use the following code.

str_extract(shopping_list_trim, "^[^+]+$")
[1] "applesx4"   "bagofflour" "bagofsugar" "milkx2"  NA  NA NA     

I would like help extracting the strings that do contain a plus sign. The output should look like this:

NA NA NA NA   "appple+20gfree" 
"BELIHGMSWATALA+VATT100gBARWR" "TOOLAITCASSE+LSST+SSSRE40gSACMDC"

Does anybody have an idea how to extract only the strings that contain a plus sign?

2 Answers


This will do the trick:

> str_extract(shopping_list_trim, "^(?=.*\\+)(.+)$")
[1] NA                                
[2] NA                                
[3] NA                                
[4] NA                                
[5] "appple+20gfree"                  
[6] "BELIHGMSWATALA+VATT100gBARWR"    
[7] "TOOLAITCASSE+LSST+SSSRE40gSACMDC"

Regex Breakdown

^(?=.*\\+) # Lookahead: check from the start that the string contains at least one plus sign
(.+)$ # Capture the whole string if the lookahead succeeds
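
For readers without stringr, the same lookahead can be sketched in base R. This is a minimal illustration rather than part of the original answer; the names `has_plus` and `with_plus` are made up for the example. It assumes `perl = TRUE` is passed, because base R's default regex engine does not support lookaheads.

```r
# Sample data from the question, with spaces already removed.
shopping_list_trim <- c("applesx4", "bagofflour", "bagofsugar", "milkx2",
                        "appple+20gfree",
                        "BELIHGMSWATALA+VATT100gBARWR",
                        "TOOLAITCASSE+LSST+SSSRE40gSACMDC")

# TRUE where the lookahead finds a "+" somewhere in the string.
# perl = TRUE switches to the PCRE engine, which understands (?=...).
has_plus <- grepl("^(?=.*\\+).+$", shopping_list_trim, perl = TRUE)

# Keep the matching strings and put NA everywhere else,
# mirroring the shape of the str_extract() result.
with_plus <- ifelse(has_plus, shopping_list_trim, NA_character_)
print(with_plus)
```

The result has one element per input string, so it lines up with the original vector in the same way the `str_extract()` output does.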
rock321987
  • About this part of the regex, `^(?=.*\\+)`: what is the role of the `?=` characters? – ssssss ssssss Apr 28 '16 at 10:42
  • @ssssssssssss It's a lookahead. Lookaheads are zero-width, meaning they don't consume any characters. So we are just checking whether a `+` sign is present in the string, without moving forward in it. `(?=)` is the lookahead syntax. **[read](http://www.regular-expressions.info/lookaround.html)** – rock321987 Apr 28 '16 at 10:44
  • The answer given by @ClasG is much simpler if you find this one complicated to understand. – rock321987 Apr 28 '16 at 10:46

If you can't/don't want to use look-arounds, try

^.*\+.*$

It matches anything followed by a + followed by anything :)

See it work here at regex101.
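
A hedged sketch of how this pattern might be used from R, in base R only. The variable names `pattern`, `matched`, and `literal` are illustrative and not from the original posts. Note that inside an R string literal the backslash has to be doubled, so the regex `\+` is written `"\\+"`.

```r
# Sample data from the question, with spaces already removed.
shopping_list_trim <- c("applesx4", "bagofflour", "bagofsugar", "milkx2",
                        "appple+20gfree",
                        "BELIHGMSWATALA+VATT100gBARWR",
                        "TOOLAITCASSE+LSST+SSSRE40gSACMDC")

# The pattern from this answer, with the backslash escaped for R.
pattern <- "^.*\\+.*$"
matched <- ifelse(grepl(pattern, shopping_list_trim),
                  shopping_list_trim, NA_character_)

# Only the presence of "+" matters here, so a literal (non-regex)
# search with fixed = TRUE selects the same strings.
literal <- ifelse(grepl("+", shopping_list_trim, fixed = TRUE),
                  shopping_list_trim, NA_character_)

print(matched)
print(identical(matched, literal))
```

The `fixed = TRUE` variant avoids regex escaping altogether, which may be easier to read when the goal is just to test for a single literal character.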

Regards

SamWhan