Here is the case. I want to run Spark NLP in JupyterLab with a Scala kernel and use the RegexMatcher annotator. I saved the patterns in a file named patterns.txt in an S3 bucket and tried the implementation below:

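import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.RegexMatcher
import com.johnsnowlabs.nlp.util.io.ExternalResource
import com.johnsnowlabs.nlp.util.io.ReadAs.LINE_BY_LINE
import org.apache.spark.ml.Pipeline

// Wrap the raw "text" column into Spark NLP documents
val document = new DocumentAssembler().setInputCol("text").setOutputCol("document")
// Load the regex rules from S3, one rule per line, fields separated by a space
val regexmatcher = new RegexMatcher().
  setInputCols(Array("document")).
  setOutputCol("match").
  setStrategy("MATCH_ALL").
  setRules(ExternalResource("s3://bucket_name/patterns.txt", LINE_BY_LINE, Map("format" -> "text", "delimiter" -> " ")))
// dev_data is a DataFrame declared earlier
val pipeline_regex = new Pipeline().setStages(Array(document, regexmatcher))
val regex_match = pipeline_regex.fit(dev_data)
regex_match.transform(dev_data).select("match").show(false)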

However, it does not seem to work at all: the rules in patterns.txt appear not to be used. How can I fix it?

sophros
Bs He
  • Where is your variable `dev_data` declared? I cannot see it in your code. What type is it ? – Catalina Chircu Mar 20 '20 at 09:01
  • @CatalinaChircu the `dev_data` is declared earlier. It is a dataframe. – Bs He Mar 20 '20 at 16:34
  • Is there any specific error you can add? The more details you provide, the better the response you can get. "Not working at all" doesn't really mean anything. I need to see what the DataFrame looks like, what is inside patterns.txt, etc. – Maziyar Mar 20 '20 at 19:22
  • @Maziyar Thanks. No error, it just returns empty matches. Below is an example of a regex and key pair in patterns.txt: `^(?i)application\\s+for motion-prefix` – Bs He Mar 20 '20 at 23:17
  • Did you mean your regex to have Negative Look ahead? ```^(?!)application\\s+for motion-prefix``` – Prasanna Saraswathi Krishnan Aug 20 '20 at 08:02
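
One thing that may be worth checking with the sample rule quoted above: in a plain-text file, `\\s` is read by the regex engine as a literal backslash followed by `s`, not as whitespace, and the space used as delimiter also occurs inside the regex itself. The sketch below uses only `java.util.regex` and does not call Spark NLP. `RuleCheck`, `splitRule`, and `matches` are hypothetical helper names, and splitting on the last delimiter is an assumption made for illustration, not necessarily how RegexMatcher parses its rules.

```scala
import java.util.regex.Pattern

object RuleCheck {
  // Hypothetical helper: split a rule line into (regex, identifier) at the
  // LAST occurrence of the delimiter. Returns None if the delimiter is absent.
  def splitRule(line: String, delimiter: String): Option[(String, String)] = {
    val idx = line.lastIndexOf(delimiter)
    if (idx < 0) None
    else Some((line.substring(0, idx).trim, line.substring(idx + delimiter.length).trim))
  }

  // True if the regex is found anywhere in the text.
  def matches(regex: String, text: String): Boolean =
    Pattern.compile(regex).matcher(text).find()
}

// Triple-quoted strings keep backslashes exactly as they would appear in a text file.
val asWritten = """^(?i)application\\s+for""" // literal backslash, then one or more 's'
val singleBackslash = """^(?i)application\s+for""" // \s+ means one or more whitespace characters
val sample = "Application for motion to dismiss" // made-up sample text

val asWrittenFound = RuleCheck.matches(asWritten, sample)
val singleBackslashFound = RuleCheck.matches(singleBackslash, sample)
```

Here `asWrittenFound` is `false` and `singleBackslashFound` is `true`. If this is the cause, writing `\s+` in patterns.txt and choosing a delimiter that does not occur in the regex (a comma, for example) might be enough, though that is untested against the S3 setup in the question.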

0 Answers