I have a set of strings and need to search them for words that have a period in the middle. Some of the strings are concatenated, so I need to break them apart into words before I can filter for the words with dots.

Below is a sample of what I have and what I get so far:

 punctToRemove <- c("[^[:alnum:][:space:]._]")

 s <- c("get_degree('TITLE',PERS.ID)",
        "CLIENT_NEED.TYPE_CODe=21",
        "2.1.1Report Field Level Definition",
        "The user defined field. The user will validate")

This is what I currently get:

gsub(punctToRemove, " ", s)

[1] "get_degree  TITLE  PERS.ID "                   
[2] "CLIENT_NEED.TYPE_CODe 21"                      
[3] "2.1.1Report Field Level Definition"            
[4] "The user defined field. The user will validate"

A sample of what I want is below:

[1] "get_degree ( ' TITLE ' , PERS.ID ) "          # spaces before and after the "(", "'", ",", and ")"
[2] "CLIENT_NEED.TYPE_CODe = 21"                   # spaces before and after the "=" sign. Dot and underscore remain untouched.        
[3] "2.1.1Report Field Level Definition"           # no changes 
[4] "The user defined field. The user will validate" # no changes
lmo
user3357059

2 Answers

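We can use regex lookarounds to insert a space at every position that comes directly before or after one of the characters `'`, `=`, `(`, `)` or `,`: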

s1 <- gsub("(?<=['=(),])|(?=['(),=])", " ", s, perl = TRUE)
s1
#[1] "get_degree ( ' TITLE ' , PERS.ID ) "           
#[2] "CLIENT_NEED.TYPE_CODe = 21"                    
#[3] "2.1.1Report Field Level Definition"            
#[4] "The user defined field. The user will validate"

nchar(s1)
#[1] 35 26 34 46

These counts equal the number of characters shown in the OP's expected output.
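
The question's end goal is to filter for words with a dot in the middle. A minimal sketch of that follow-up step is below, assuming that "word" means a whitespace-delimited token and that "in the middle" means a period with a non-period character on each side. The names `words` and `dotted` are illustrative only, and `s1` is copied from the output above.

```r
# s1 as printed above (output of the lookaround gsub)
s1 <- c("get_degree ( ' TITLE ' , PERS.ID ) ",
        "CLIENT_NEED.TYPE_CODe = 21",
        "2.1.1Report Field Level Definition",
        "The user defined field. The user will validate")

# Split every string on runs of whitespace and flatten into one vector
words <- unlist(strsplit(s1, "[[:space:]]+"))

# Keep tokens containing a period with a non-period character on both sides
# (assumed reading of "a period in the middle")
dotted <- grep("[^.]\\.[^.]", words, value = TRUE)
dotted
```

Under these assumptions `"field."` is dropped because its period is trailing, while `"2.1.1Report"` is kept. Tighten the pattern if numeric prefixes like that should be excluded.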

akrun
  • I updated the code to also handle vertical bars, such as **||Client**. This is how it looks now: `gsub("(?<=[\\|'=(),])|(?=[\\|'(),=])", " ", s, perl = TRUE)` – user3357059 Sep 14 '16 at 19:29
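
As an illustration of the pattern from the comment, here is a small sketch applied to a made-up string. The input `x` is hypothetical and not taken from the OP's data.

```r
# Hypothetical input with a double vertical bar
x <- "FIELD||Client"

# Pattern from the comment: "|" is added to both character classes
out <- gsub("(?<=[\\|'=(),])|(?=[\\|'(),=])", " ", x, perl = TRUE)
out
```

Each bar should end up separated from its neighbours by a single space.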
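For this example, each character can be padded with spaces in its own call:

    library(stringr)
    # Surround each punctuation character with a space on both sides.
    # Note: adjacent characters such as (' end up separated by two spaces.
    s <- str_replace_all(s, "\\)", " ) ")
    s <- str_replace_all(s, "\\(", " ( ")
    s <- str_replace_all(s, "=", " = ")
    s <- str_replace_all(s, "'", " ' ")
    s <- str_replace_all(s, ",", " , ")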
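
The same padding can probably be done in a single base R call with a capture group. This is a sketch rather than part of the answer above; the names `padded` and `squeezed` are illustrative, and the second `gsub` is an added step that collapses the doubled spaces left between adjacent punctuation.

```r
s <- c("get_degree('TITLE',PERS.ID)",
       "CLIENT_NEED.TYPE_CODe=21",
       "2.1.1Report Field Level Definition",
       "The user defined field. The user will validate")

# Wrap every listed punctuation character in spaces in one pass
padded <- gsub("([()=',])", " \\1 ", s)

# Collapse runs of spaces into one space (assumed to be the desired cleanup)
squeezed <- gsub(" +", " ", padded)
squeezed
```

After the collapse step the result should line up with the expected output in the question, including the trailing space on the first element.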
Oli