
I'm trying to use regexMatcher from the String Manipulation node in KNIME, but it doesn't work. I'm writing regexMatcher($Document$,"/\w") because I want to extract all sentences that contain /s, /p, w/p, or /200. However, even though my table contains such cases, nothing is retrieved. I would appreciate your help.

Regina

1 Answer

I got the following:

|Document      |isOK |other|strict|
|--------------|-----|-----|------|
|Some /p with q|True |False|False |
|/200          |True |True |False |
|/p            |True |True |True  |
|/s            |True |True |True  |
|w/p           |True |False|False |
|no slash      |False|False|False |

For the expressions:

  • isOK: regexMatcher($Document$, ".*?/\\w.*") (I guess this is what you are after.)
  • other: regexMatcher($Document$, "/\\w.*")
  • strict: regexMatcher($Document$, "/\\w")

(The Document values contain nothing after the last visible character.)

You are probably running into two issues: the escaping in the String Manipulation node and the semantics of regexMatcher.

The String literal within there is just a Java String, so you have to escape the \ (and some other characters), so it becomes \\.
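To illustrate the escaping in plain Java, outside KNIME, here is a minimal sketch (the class and method names are made up for illustration). The source literal "/\\w" is what produces the three-character regex /\w that the regex engine actually sees:

```java
public class EscapeDemo {
    // Hypothetical helper: the Java source literal "/\\w" holds three
    // characters: '/', '\' and 'w'. The doubled backslash is only an escape
    // at the string-literal level.
    static String regexSource() {
        return "/\\w";
    }

    public static void main(String[] args) {
        String regex = regexSource();
        // Prints the text that is handed to the regex engine.
        System.out.println(regex);
        // Prints how many characters that text contains.
        System.out.println(regex.length());
    }
}
```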

The semantics of regexMatcher is to match the whole String, so you have to add .*? (non-greedy match anything) before the value you are looking for and .* (greedy match anything) after it. (Obviously, if I misunderstood your question, the semantics are probably already what you want.)
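The whole-string semantics can be reproduced in plain Java. This is only a sketch: it assumes regexMatcher behaves like Java's String.matches, as described above, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class RegexMatchDemo {
    // Whole-string match: the regex has to cover the entire input.
    static boolean fullMatch(String input, String regex) {
        return input.matches(regex);
    }

    // Substring search: the regex may match anywhere inside the input.
    static boolean containsMatch(String input, String regex) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String[] documents = {
            "Some /p with q", "/200", "/p", "/s", "w/p", "no slash"
        };
        for (String doc : documents) {
            System.out.println(doc
                + " | isOK=" + fullMatch(doc, ".*?/\\w.*")
                + " | other=" + fullMatch(doc, "/\\w.*")
                + " | strict=" + fullMatch(doc, "/\\w")
                + " | find=" + containsMatch(doc, "/\\w"));
        }
    }
}
```

The isOK, other and strict values correspond to the three expressions in the table. The extra find column shows that a substring search with the bare /\w gives the same answer as wrapping the pattern in .*? and .* for a whole-string match.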

BTW: in case you want to filter, you should check the Rule-based Row Filter node as it offers an option to directly filter by regex. It uses a different escaping rule (for the isOK option):

  • $Document$ MATCHES ".*?/\w.*" => TRUE (escaping is not allowed within quotes)
  • $Document$ MATCHES /.*?\/\\w.*/ => TRUE (escaping is allowed within slashes; / and \ need to be escaped, but " does not)

Gábor Bakos