I'm trying to use `regexMatcher` from the String Manipulation node in KNIME, but it doesn't work. I'm writing `regexMatcher($Document$,"/\w")` to extract all sentences that contain /s, /p, w/p or /200. However, even though my table contains such cases, nothing is retrieved. I would appreciate your help.
-
Try `regexMatcher($Document$,"/\\w") = "TRUE"` – Wiktor Stribiżew Oct 04 '16 at 16:53
-
I have updated my question. I have tried to use your suggestion but it didn't work – Regina Oct 04 '16 at 17:00
-
Try `"^\\w*/\\w+$"` – Wiktor Stribiżew Oct 04 '16 at 17:10
-
It doesn't work either. It looks like this functionality is not working at all. Could that be the case? – Regina Oct 04 '16 at 18:09
-
If you can find any documentation or source code, we could check. BTW, from what I know, the regex flavor is the same as Java's. – Wiktor Stribiżew Oct 04 '16 at 18:10
-
The point is that it runs, but every result is FALSE and nothing matches the regular expression. Maybe you know another way of doing the same thing. I just have to filter these types of rows. – Regina Oct 04 '16 at 18:12
1 Answer
I got the following:
|Document |isOK |other|strict|
|--------------|-----|-----|------|
|Some /p with q|True |False|False |
|/200 |True |True |False |
|/p |True |True |True |
|/s |True |True |True |
|w/p |True |False|False |
|no slash |False|False|False |
For the expressions:
- isOK: `regexMatcher($Document$, ".*?/\\w.*")` (I guess this is what you are after.)
- other: `regexMatcher($Document$, "/\\w.*")`
- strict: `regexMatcher($Document$, "/\\w")`

(The Document values have no trailing whitespace or other characters after the last visible one.)
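Because the regex flavor is Java's, the table above can be reproduced outside KNIME with `String.matches`, which, like `regexMatcher` as described in this answer, succeeds only when the pattern matches the whole string. This is a minimal sketch that assumes `regexMatcher` behaves like `String.matches`; the class and method names are illustrative, not part of KNIME.

```java
public class RegexTable {
    // Patterns written exactly as the Java string literals used in the answer
    static final String IS_OK = ".*?/\\w.*";
    static final String OTHER = "/\\w.*";
    static final String STRICT = "/\\w";

    // String.matches succeeds only if the pattern matches the entire input
    static boolean[] evaluate(String document) {
        return new boolean[] {
            document.matches(IS_OK),
            document.matches(OTHER),
            document.matches(STRICT)
        };
    }

    public static void main(String[] args) {
        String[] documents = {"Some /p with q", "/200", "/p", "/s", "w/p", "no slash"};
        System.out.printf("%-15s %-5s %-5s %-6s%n", "Document", "isOK", "other", "strict");
        for (String doc : documents) {
            boolean[] r = evaluate(doc);
            System.out.printf("%-15s %-5b %-5b %-6b%n", doc, r[0], r[1], r[2]);
        }
    }
}
```

Each printed row agrees with the table: only `isOK` accepts values where the slash is not at the very start (`Some /p with q`, `w/p`), and only `strict` rejects `/200`, because `\w` consumes a single character.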
The problems you might run into are the escaping rules of the String Manipulation node and the semantics of `regexMatcher`.

The string literal there is just a Java String, so you have to escape the `\` (and some other characters), which makes it `\\`.

`regexMatcher` matches against the whole String, so you have to add `.*?` (non-greedy "match anything") before the value you are looking for and `.*` (greedy "match anything") after it.
(If I misunderstood your question, the whole-string semantics may already be what you want.)
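The two points above, Java-literal escaping and whole-string matching, can be checked directly with `java.util.regex`. This is a minimal sketch in plain Java, not KNIME code. It assumes `regexMatcher` corresponds to `Matcher.matches()` rather than `Matcher.find()`, and the helper names `wholeMatch` and `containsMatch` are hypothetical. Note that in plain Java source, `"/\w"` would not even compile, because `\w` is not a valid string escape.

```java
import java.util.regex.Pattern;

public class MatchSemantics {
    // Whole-string match: the same check String.matches performs
    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Substring search: succeeds if the pattern occurs anywhere in the input
    static boolean containsMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // In Java source, "/\\w" is the three characters / \ w, i.e. the regex /\w
        String regex = "/\\w";
        String doc = "Some /p with q";
        System.out.println(regex.length());                        // 3
        System.out.println(wholeMatch(regex, doc));                 // false
        System.out.println(containsMatch(regex, doc));              // true
        // Padding with .*? and .* turns the whole-string match into a search
        System.out.println(wholeMatch(".*?" + regex + ".*", doc));  // true
    }
}
```

One caveat, assuming Java's default flags apply: `.` does not match line breaks, so for cells that span several lines, the padded pattern may need a `(?s)` prefix, as in `(?s).*?/\w.*`.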
BTW: if you want to filter, you should check the Rule-based Row Filter node, as it can filter by regex directly. It uses different escaping rules (shown here for the isOK option):
- `$Document$ MATCHES ".*?/\w.*" => TRUE` (escaping is not allowed within quotes)
- `$Document$ MATCHES /.*?\/\\w.*/ => TRUE` (escaping is allowed within slashes: `/` and `\` need to be escaped, but `"` does not)
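To see why the two rules above describe the same pattern, the sketch below models the slash-delimited form under the simplifying assumption that a backslash escapes the next character (`\/` becomes `/`, `\\` becomes `\`). The helper `unescapeSlashDelimited` is hypothetical and is not KNIME's actual rule parser. The sketch then keeps only the rows whose whole value matches, which mirrors what a `MATCHES ... => TRUE` rule is described as doing.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class SlashRuleFilter {
    // Hypothetical helper (an assumption, not KNIME's parser): inside /.../,
    // a backslash escapes the following character, so \/ -> / and \\ -> \.
    static String unescapeSlashDelimited(String body) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c == '\\' && i + 1 < body.length()) {
                out.append(body.charAt(++i)); // keep the escaped character literally
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Keep only rows whose entire value matches the regex
    static List<String> filter(List<String> rows, String regex) {
        Pattern p = Pattern.compile(regex);
        return rows.stream()
                   .filter(r -> p.matcher(r).matches())
                   .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Body of /.*?\/\\w.*/ as typed in the rule, written as a Java literal
        String slashBody = ".*?\\/\\\\w.*";
        String regex = unescapeSlashDelimited(slashBody);
        System.out.println(regex); // .*?/\w.*
        List<String> rows = List.of("Some /p with q", "/200", "/p", "/s", "w/p", "no slash");
        System.out.println(filter(rows, regex));
    }
}
```

Under this assumption, both rule variants reduce to the regex `.*?/\w.*`, so they keep the same five rows as the isOK column and drop only `no slash`.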
