How to use the RegexMatcher in SparkNLP

Question

I am running SparkNLP in JupyterLab with a Scala kernel and want to use the RegexMatcher annotator. I saved the patterns in a file named patterns.txt in an S3 bucket and tried the implementation below:

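import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.RegexMatcher
import com.johnsnowlabs.nlp.util.io.ExternalResource
import com.johnsnowlabs.nlp.util.io.ReadAs.LINE_BY_LINE
import org.apache.spark.ml.Pipeline

// Wrap the raw "text" column as a document annotation
val document = new DocumentAssembler().setInputCol("text").setOutputCol("document")
// Rules come from the file on S3, read line by line with a space as the delimiter option
val regexmatcher = new RegexMatcher().
  setInputCols(Array("document")).
  setOutputCol("match").
  setStrategy("MATCH_ALL").
  setRules(ExternalResource("s3://bucket_name/patterns.txt", LINE_BY_LINE, Map("format" -> "text", "delimiter" -> " ")))
val pipeline_regex = new Pipeline().setStages(Array(document, regexmatcher))
// dev_data is a DataFrame declared earlier
val regex_match = pipeline_regex.fit(dev_data)
regex_match.transform(dev_data).select('match).show(false)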

However, this doesn't seem to work at all, and the rules in patterns.txt don't appear to be used. How can I fix it?

Where is your variable `dev_data` declared? I cannot see it in your code. What type is it? — Catalina Chircu, Mar 20 '20 at 09:01
@CatalinaChircu `dev_data` is declared earlier. It is a DataFrame. — Bs He, Mar 20 '20 at 16:34
Is there any specific error you can add? The more details you provide, the better the response you can get. "Not working at all" doesn't really mean anything. I need to see what the DataFrame looks like, what is inside patterns.txt, etc. — Maziyar, Mar 20 '20 at 19:22
@Maziyar Thanks. There is no error; it just returns empty matches. Here is an example of a regex and key pair from patterns.txt: `^(?i)application\\s+for motion-prefix` — Bs He, Mar 20 '20 at 23:17
Did you mean for your regex to have a negative lookahead? `^(?!)application\\s+for motion-prefix` — Prasanna Saraswathi Krishnan, Aug 20 '20 at 08:02
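
Since the pipeline reports no error and only returns empty matches, one thing that may be worth checking outside Spark NLP is the rule line itself. The sketch below uses only `java.util.regex` and a hypothetical helper, `splitRule`, which splits a line at the last occurrence of the delimiter. That helper is an assumption for illustration and is not Spark NLP's own parser. It also assumes the file really contains two backslashes before `s+`, as the rule appears in the comment above, and the sample sentence is made up.

```scala
import java.util.regex.Pattern

object RuleLineCheck {
  // Hypothetical helper (NOT Spark NLP's parser): split one rules-file line
  // into (regex, key) at the last occurrence of the delimiter.
  def splitRule(line: String, delimiter: String): (String, String) = {
    val idx = line.lastIndexOf(delimiter)
    require(idx >= 0, s"delimiter '$delimiter' not found in: $line")
    (line.substring(0, idx), line.substring(idx + delimiter.length))
  }

  // True if the regex matches anywhere in the text.
  def finds(regex: String, text: String): Boolean =
    Pattern.compile(regex).matcher(text).find()

  def main(args: Array[String]): Unit = {
    // Made-up sample sentence, used only for this check.
    val text = "Application for motion to dismiss"

    // The rule as quoted in the comment: triple quotes keep both backslashes,
    // which is what a text file containing "\\s" would hold.
    val asPosted = """^(?i)application\\s+for motion-prefix"""
    val (regexPosted, key) = splitRule(asPosted, " ")

    // Same rule with a single backslash, i.e. the regex token \s (whitespace).
    val regexSingle = regexPosted.replace("\\\\", "\\")

    println(s"key          = $key")
    println(s"as posted    = $regexPosted -> ${finds(regexPosted, text)}")
    println(s"single slash = $regexSingle -> ${finds(regexSingle, text)}")
  }
}
```

In a regex read from a plain text file, `\\s+` means a literal backslash followed by one or more `s` characters, so the rule as posted would not match ordinary whitespace, while `\s+` would. If patterns.txt holds the doubled form, that alone could explain empty matches; whether it does depends on what is actually in the file.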

