2

I have written a RegExp to catch all the parameters and associated values of a SQL (JDBC) query.

I am using this.

(?:\S+\s)?\S*"myOperatorHere\S*(?:\s\S+)?

So that I can catch parameters like: Where c.value = 32

I can get c.value and 32

It works well with all the operators except IN

I'd like to catch where c.value IN (3,4,5,6)

But with this expression I get (3, as a value instead of (3,4,5,6)

For example if I have the query:

SELECT C.NAME, C.FIRSTNAME FROM CUSTOMER C, PROSPECT P WHERE C.ID = 32 AND C.TRUC = 28 AND P.ID < 12  AND P.A IN (2, 3, 4) 

I'd like to get C.ID = 32, C.TRUC = 28, P.ID < 12, P.A IN (2, 3, 4)

Could you please help me manage this? I can use two expressions if needed.

mickmackusa
  • 43,625
  • 12
  • 83
  • 136
Gilles
  • 357
  • 3
  • 20

2 Answers2

1

I think you can just open up the range of valid characters in the second group. I've also written all of the options from your operator list.

(Demo Link)

(?:\S+\s)?\S*(?:IN|[<=>]+)\s(?:(?:\([^)]+\))|\S+)

Pattern breakdown:

(?:                 #non-capture group
    \S+\s           #1 or more non-white characters then a white character
)?                  #end non-capture group, zero or one occurrence of the group
\S*                 #zero or more non-white characters
(?:                 #non-capture group
    IN|[<=>]+       #literally match "IN" or one or more of any operator symbols in range 
)                   #end non-capture group
\s                  #whitespace character
(?:                 #non-capture group
    (?:             #non-capture group
        \([^)]+\)   #open parenthesis, anything not a close parathensis, close parenthesis
    )               #end non-capture group
    |               #or
    \S+             #one or more non-whitespace characters
)                   #close non-capture group

Edit: I was able to trim some steps and characters off of my pattern without damaging the output:

\S+ (?:IN|[<=>]+) (?:\([^)]+?\)|\S+)

This will match everything you have asked for.

New Demo

mickmackusa
  • 43,625
  • 12
  • 83
  • 136
0

I believe, you actually have problem with input like this where c.value IN (3, 4, 5, 6) (notice the spaces after commas).

If so, I'd suggest to make use two patterns, one for scalar values and one for lists, and match either of these using alteration. The latter may be defined as sequence of non-spaces delimited by a comma and optionally some spaces, and the whole list surrounded by parenthesis, i.e. \(s*\S+\s*(?:,\s*\S+\s*)*\). And the entire regex:

(?:\S+\s)?\S*"myOperatorHere\S*(?:\s(?:\(s*\S+\s*(?:,\s*\S+\s*)*\)|\S+))?

Demo: https://regex101.com/r/YUtuty/1

Dmitry Egorov
  • 9,542
  • 3
  • 22
  • 40
  • Thanks. This one works when there are no space in the IN values. myoperator: I am using = or < or <= ... or IN. How can we add the possibility to have spaces in the IN? IN (2, 3, 4) – Gilles May 05 '17 at 07:07
  • @Gilles: oops, my bad, sorry. The alternation should have the list first and the `\S+` after that. Fixed the answer. – Dmitry Egorov May 05 '17 at 07:24
  • Thank you very much – Gilles May 05 '17 at 07:47