
I'm trying to use entities to match some data, and the regex doesn't seem to behave the way it does in other engines, such as Python's, or on sites like regexr.com. Here is an example:

Pattern: ([\w]{8}-[\w]{4}-[\w]{4}-[\w]{4}-[\w]{12}-[\w]{3})

String style to match: 83123e42-d9ad-a26a-b13f-b0ec91c7fedf-ABC

However, when I test this, I get:

@id:83123e42

@id:d9ad

@id:a26a

@id:b13f

@id:b0ec91c7fedf

@id:ABC
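
For comparison, here is a minimal sketch of what I expected, using only Python's standard `re` module. The variable names (`PATTERN`, `sample`, `full_id`) are just for illustration; this shows how Python treats the pattern and says nothing about Watson Assistant's engine.

```python
import re

# Same pattern as above; the outer parentheses form capture group 1.
PATTERN = re.compile(r"([\w]{8}-[\w]{4}-[\w]{4}-[\w]{4}-[\w]{12}-[\w]{3})")

sample = "83123e42-d9ad-a26a-b13f-b0ec91c7fedf-ABC"

# search() returns a single match object covering the whole ID,
# hyphens included, rather than six separate pieces.
match = PATTERN.search(sample)
full_id = match.group(0) if match else None
print(full_id)
```

In Python the hyphens are matched literally and the result is one string, which is the behavior I'd like from the entity.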

I've tried grouping the whole string, using string delimiters, escaping the hyphens, and using .{4}- instead of \w, all without success. I often get exactly the same result, where the input is split into groups rather than returned as one full match.

Is this a regex issue? I tried not grouping the whole string, but seem to keep running into the exact same issue, where it won't even find the last 3 letters anyway.

If Watson Assistant uses a different regex engine, is there documentation somewhere that I just haven't been able to find? The docs seem to assume that any normal regex will work, but skipping the hyphens is strange behavior.

data_henrik
  • Your input is split with `-`. You should check the relevant documentation on how the string and the regex extraction results are used. – Wiktor Stribiżew Oct 11 '18 at 16:01
  • @WiktorStribiżew Should I escape the `-`s then? So regex should be: ([\w]{8}\-[\w]{4}\-[\w]{4}\-[\w]{4}\-[\w]{12}\-[\w]{3}) That doesn't seem to capture the full group either – Dylan Shepard Oct 11 '18 at 16:15
  • You should not escape a `-` in any regex engine when it is outside of a character class. Only in Lua patterns must it be escaped. – Wiktor Stribiżew Oct 11 '18 at 16:17
  • Okay great, so then how would I capture the hyphens properly to be able to have the entity return as @id:83123e42-d9ad-a26a-b13f-b0ec91c7fedf-ABC ? – Dylan Shepard Oct 11 '18 at 16:22
  • Your pattern works OK for me in `WA`: `@id.literal` returns the full matched id and `@id.groups` returns the matched groups, if any are defined. – Michal Bida Oct 22 '18 at 09:34
  • @MichalBida Sorry, I don't think I understand. How would I get the `@id.literal` value? The `WA` json response is only grouped, there is no literal attribute, unless I'm missing something? – Dylan Shepard Oct 23 '18 at 14:10
  • If you want to get the value in the JSON response you can create a variable `"my_id_literal" : "<? @id.literal ?>"` <- just check the `@id` entity is recognized in the input. – Michal Bida Oct 23 '18 at 14:33
  • @MichalBida So, that just comes back with `@id_literal = "a69986e1"` Where the whole "id" sent in is: `a69986e1-3660-a52d-c967-444fd239dd02-ABC` – Dylan Shepard Oct 24 '18 at 17:03
  • That is weird, because it comes correctly for me. What API version are you using? The latest one? – Michal Bida Oct 25 '18 at 11:24

2 Answers

1

I ended up getting a more direct answer from an awesome helper in the Slack channel:

It turns out that something in the Watson Assistant regex handling doesn't recognize hyphens.

He worked with me and showed me a bit of SpEL, which I now use to assign the match to a context variable that I can then use:

"<? input.text.extract('(\\w{8}\\-\\w{4}\\-\\w{4}\\-\\w{4}\\-\\w{12}\\-\\w{3}[^\\w]+)', 0) ?>"
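
To make it easier to see what that expression picks out, here is a rough Python analog. The `extract` helper is a hypothetical stand-in I wrote for illustration, not Watson Assistant's implementation, and it assumes that `extract` returns an empty string when nothing matches. The pattern is the same one after JSON unescaping (`\\w` becomes `\w`).

```python
import re

# Pattern from the SpEL expression above, after JSON unescaping.
ID_PATTERN = r"(\w{8}\-\w{4}\-\w{4}\-\w{4}\-\w{12}\-\w{3}[^\w]+)"


def extract(text: str, pattern: str, group: int) -> str:
    """Hypothetical analog of input.text.extract(pattern, group).

    Returns the requested group of the first match, or "" if there is
    no match (an assumption about the Watson Assistant behavior).
    """
    m = re.search(pattern, text)
    return m.group(group) if m else ""


# ID followed by a non-word character: the match includes that character.
with_trailing = extract(
    "my id is 83123e42-d9ad-a26a-b13f-b0ec91c7fedf-ABC.", ID_PATTERN, 0
)

# ID at the very end of the input: [^\w]+ has nothing left to consume.
without_trailing = extract(
    "83123e42-d9ad-a26a-b13f-b0ec91c7fedf-ABC", ID_PATTERN, 0
)
```

At least in Python, the trailing `[^\w]+` requires one or more non-word characters after the ID, so the match includes them and an ID at the very end of the input is not matched. If Watson Assistant behaves the same way, dropping `[^\\w]+` from the expression may be worth trying.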

0

Citing the Watson Assistant docs for defining entities, here are the relevant parts:

The regular expression engine is loosely based on the Java regular expression engine. The Watson Assistant service will produce an error if you try to upload an unsupported pattern, either via the API or from within the Watson Assistant service Tooling UI.

That section has some information on limitations and what to consider when writing regular expressions. The most significant ones cited are:

Entity patterns may not contain:
- Positive repetitions (for example x*+)
- Backreferences (for example \g1)
- Conditional branches (for example (?(cond)true))

data_henrik
  • So, I didn't use positive repetitions, backrefs, or conditional branches. And from what I've read about the Java regex engine, that pattern should have been fine. I see nothing about anything I'm doing in that pattern that would fail to recognize the full string. – Dylan Shepard Oct 11 '18 at 16:14
  • Lookaheads and lookbehinds are not supported either in `WA` regexps. – Michal Bida Oct 22 '18 at 09:23