Questions tagged [johnsnowlabs-spark-nlp]

John Snow Labs’ Spark NLP is a natural language processing library built on top of Apache Spark ML pipelines.

100 questions
0 votes, 1 answer

Analysis Exception in spark NLP

Below is my code block: conll_data.select(F.explode(F.arrays_zip('token.result','label.result')).alias("cols")) \ .select(F.expr("cols['0']").alias("token"), F.expr("cols['1']").alias("ground_truth"))\ …
0 votes, 1 answer

SparkNLP PipelineModel which includes AnnotatorApproach in stages

In a Spark NLP PipelineModel, all stages have to be of type AnnotatorModel. But what if one of those annotator models requires a certain column in the dataset as input, and this input column is the output of an AnnotatorApproach? For instance, I…
martin_wun
0 votes, 1 answer

sparkNLP Tokenization of Contractions

I'm using Spark NLP version 3.2.3 and trying to tokenize some text. I've used spaCy and other tokenizers that handle contractions such as "they're" by splitting them into "they" and "'re". According to this resource (pages 105-107), Spark NLP should…
user3242036
0 votes, 2 answers

SparkNLP's NerCrfApproach with custom labels

I am trying to train a SparkNLP NerCrfApproach model with a dataset in CoNLL format that has custom labels for product entities (like I-Prod, B-Prod etc.). However, when using the trained model to make predictions, I get only "O" as the assigned…
martin_wun
0 votes, 1 answer

Mix Spark MLlib and Spark NLP in a pipeline

In an MLlib pipeline, how can I chain a CountVectorizer (from Spark ML) after a Stemmer (from Spark NLP)? When I try to use both in a pipeline, I get: myColName must be of type equal to one of the following types: [array, array] but…
Benjamin
0 votes, 1 answer

Loading large Spark NLP pipeline into Apache Spark batch job taking too long

I am using Spark NLP from John Snow Labs to extract embeddings from my textual data. The size of the model is 1.8 GB after saving it to HDFS. Below is the pipeline: embeddings = BertSentenceEmbeddings.pretrained("labse", "xx") \ …
0 votes, 1 answer

Glue job failed with `JohnSnowLabs spark-nlp dependency not found` error randomly

I'm using AWS Glue to run some PySpark code. Sometimes it succeeds, but sometimes it fails with a dependency error: Resource Setup Error: Exception in thread "main" java.lang.RuntimeException: [unresolved dependency:…
0 votes, 1 answer

py4j.protocol.Py4JNetworkError: Answer from Java side is empty

This is the code I am using on Google Colab. It keeps getting stuck at the `model.fit` step and throws this exception. I haven't been able to find a solution for it anywhere. The memory usage also gets very high on Colab, so I'm starting to think…
0 votes, 1 answer

How to use `LanguageDetectorDL` spark NLP on pyspark column?

I am working with a PySpark DataFrame. I have a df that looks like this: df.select('words').show(5, truncate = 130) …
0 votes, 1 answer

Spark-NLP functions give pickling error when using map

I have an RDD with the following structure: my_rdd = [Row(text='Hello World. This is bad.'), Row(text='This is good.'), ...] I can perform parallel processing with Python functions: rdd2 = my_rdd.map(lambda f: f.text.split()) for x in rdd2.collect(): …
0 votes, 0 answers

How to use multiple cleanup patterns in Normalizer (Spark NLP)?

I am working with a PySpark DataFrame. I need to perform TF-IDF, and for that I am using Spark NLP for the prior steps of tokenization, normalization, etc. I have a df that looks like this after applying the tokenizer: df.select('tokenizer').show(5, truncate =…
0 votes, 1 answer

Error when loading Spark NLP PretrainedPipeline

An exception occurred when I loaded a Spark NLP PretrainedPipeline as follows: Exception in thread "main" java.lang.IllegalArgumentException: Unsupported class file major version 59 I am new to Scala; can anyone recognize the reason? Thank you in…
0 votes, 0 answers

pyspark.sql.utils.IllegalArgumentException: 'requirement failed: Was not found appropriate resource to download for request

I'm trying to run the example code below: import sparknlp sparknlp.start() from sparknlp.pretrained import PretrainedPipeline explain_document_pipeline = PretrainedPipeline("explain_document_ml") annotations =…
wawawa
0 votes, 1 answer

java.lang.ClassNotFoundException: com.johnsnowlabs.nlp.DocumentAssembler in Spark in PyCharm with conda env

I saved a pretrained model from spark-nlp, and now I'm trying to run a Python script in PyCharm with an Anaconda env: Model_path = "./xxx" model = PipelineModel.load(Model_path) But I got the following error: (I tried with pyspark 2.4.4 &…
wawawa
0 votes, 1 answer

SparkNLP Text classification using BertSentenceEmbeddings

I am struggling to implement a classification use case using BertSentenceEmbeddings in Python. Mostly I get a ClassNotFoundException, and I think I am unable to figure out the right versions of the libraries (spark-nlp, pyspark). I followed most of…
Rahul Sharma