Replace String Pattern in sqldf

Question

I have a dataframe like below:

    Col1    Col2   Col3
ten: end       5     10
five: nb       7     11
    12:4      12     10
   13:56      15     16

Using the sqldf package in R, I want to do the following:

Replace values in Col1 with: character: space with -. The dash has spaces in the beginning and end.

Replace values in Col1 with: number:number with -. The dash has no spaces in the beginning and end.

Expected Output:

     Col1    Col2   Col3
ten - end       5     10
five - nb       7     11
     12-4      12     10
    13-56      15     16

Here is an example syntax using sqldf:

df <- sqldf("SELECT *, replace([Col1], [character: space], ' - ') [New Col generated] from df")

df <- sqldf("SELECT *, replace([Col1], [number:number], '-') [New Col generated_num] from df")

I tried referencing this doc and still no luck: https://www.rexegg.com/regex-quickstart.html

I don't think the `replace` function supports regular expressions, and `sqldf` does not seem to support `regexp_replace` — acylam, Apr 02 '19 at 18:51
Ok thank you. Is it possible to show the result using regular SQL then? — nak5120, Apr 02 '19 at 18:56
That's what I'm trying to figure out. You need at least some kind of regex to be able to tell if a string contains characters or digits and replace accordingly. Looks like `regexp` is being added as an extension to `RSQLite`: https://github.com/r-dbi/RSQLite/pull/266 — acylam, Apr 02 '19 at 18:59
h2 which is one of the backends supported by sqldf does support `regexp_replace` — G. Grothendieck, Apr 02 '19 at 21:26

G. Grothendieck · Accepted Answer · 2019-04-03T02:21:54.617

1) Assuming that only the forms shown in the question are allowed replace colons with minus signs and then replace minus followed by space with space, minus, space.

library(sqldf)
sqldf("select *, replace(replace([Col1], ':', '-'), '- ', ' - ') as New from df")

giving:

      Col1 Col2 Col3       New
1 ten: end    5   10 ten - end
2 five: nb    7   11 five - nb
3     12:4   12   10      12-4
4    13:56   15   16     13-56

2) If we can assume that the only forms are number:number or character: character and that the second form contains no digits.

sqldf("select *, 
  case when strFilter(Col1, '0123456789') = '' 
         then replace(Col1, ':', ' -')
       else replace(Col1, ':', '-')
       end as New
  from df")

giving:

      Col1 Col2 Col3       New
1 ten: end    5   10 ten - end
2 five: nb    7   11 five - nb
3     12:4   12   10      12-4
4    13:56   15   16     13-56

3) This first checks for numbers:numbers and then checks for characters: characters where characters can only be numbers or lower case letters.

dig <- "0123456789"
diglet <- "0123456789abcdefghijklmnopqrstuvwxyz"

fn$sqldf("select *,
  case when trim(Col1, '$dig') = ':'
         then replace(Col1, ':', '-')
  when trim(Col1, '$diglet') = ': '
          then replace(Col1, ': ', ' - ')
  else Col1 end as New
  from df")

giving:

      Col1 Col2 Col3       New
1 ten: end    5   10 ten - end
2 five: nb    7   11 five - nb
3     12:4   12   10      12-4
4    13:56   15   16     13-56

4) This one extracts the x:y and checks whether x and y are number and if so does the appropriate replacement and if no match it extracts x:yz where y is a space and if x and z are digits or lower case then it performs the appropriate replacement and otherwise returns Col1. dig and diglet are from above.

fn$sqldf("select *, 
  case when trim(substr(Col1, instr(Col1, ':')-1, 3), '$dig') = ':'
         then replace(Col1, ':', '-')
       when trim(substr(Col1, instr(Col1, ':') -1, 4), '$diglet') = ': '
         then replace(Col1, ': ', ' - ')
       else Col1 end as New
  from df")

Note

The input in reproducible form is:

Lines <- "Col1,Col2,Col3
ten: end,5,10
five: nb,7,11
12:4,12,10
13:56,15,16"
df <- read.csv(text = Lines, as.is = TRUE, strip.white = TRUE)

Added (2), (3) and (4). – G. Grothendieck Apr 03 '19 at 02:22 — G. Grothendieck, Apr 03 '19 at 02:22

Replace String Pattern in sqldf

1 Answers1

Note