In a Spark NLP PipelineModel, all the stages have to be of type AnnotatorModel. But what if one of those AnnotatorModels requires a certain column in the dataset as input, and this input column is the output of an AnnotatorApproach?
For instance, I…
I'm using Spark NLP version 3.2.3 and trying to tokenize some text. I've used spaCy and other tokenizers that handle contractions such as "they're" by splitting them into "they" and "'re". According to this resource (pages 105-107), Spark NLP should…
I am trying to train a Spark NLP NerCrfApproach model with a dataset in CoNLL format that has custom labels for product entities (such as I-Prod, B-Prod, etc.). However, when using the trained model to make predictions, I get only "O" as the assigned…
In an MLlib pipeline, how can I chain a CountVectorizer (from Spark ML) after a Stemmer (from Spark NLP)?
When I try to use both in a pipeline I get:
myColName must be of type equal to one of the following types: [array, array] but…
I am using Spark NLP from John Snow Labs to extract embeddings from my textual data; the pipeline is below. The size of the model is 1.8 GB after saving to HDFS.
embeddings = BertSentenceEmbeddings.pretrained("labse", "xx") \
…
I'm using AWS Glue to run some PySpark code. Sometimes it succeeds, but sometimes it fails with a dependency error: Resource Setup Error: Exception in thread "main" java.lang.RuntimeException: [unresolved dependency:…
This is the code I am using on Google Colab. It keeps getting stuck at the model.fit step and throws this exception. I haven't been able to find a solution for it anywhere. Memory usage also seems to get very high on Colab; I'm starting to think…
I am working with a PySpark DataFrame.
I have a df that looks like this:
df.select('words').show(5, truncate = 130)
+----------------------------------------------------------------------------------------------------------------------------------+
| …
I have an RDD of the following structure:
my_rdd = [Row(text='Hello World. This is bad.'), Row(text='This is good.'), ...]
I can perform parallel processing with python functions:
rdd2 = my_rdd.map(lambda f: f.text.split())
for x in rdd2.collect():
…
I am working with a PySpark DataFrame. I need to compute TF-IDF, and for that I used the prior steps of tokenization, normalization, etc. with Spark NLP.
I have a df that looks like this after applying the tokenizer:
df.select('tokenizer').show(5, truncate =…
An exception occurred when I loaded a Spark NLP PretrainedPipeline, as follows:
Exception in thread "main" java.lang.IllegalArgumentException: Unsupported class file major version 59
I am new to Scala; can anyone recognize the cause? Thank you in…
I'm trying to run the example code below:
import sparknlp
sparknlp.start()
from sparknlp.pretrained import PretrainedPipeline
explain_document_pipeline = PretrainedPipeline("explain_document_ml")
annotations =…
I saved a pretrained model from Spark NLP, and now I'm trying to run a Python script in PyCharm with an Anaconda environment:
Model_path = "./xxx"
model = PipelineModel.load(Model_path)
But I got the following error:
(I tried with pyspark 2.4.4 &…
I am struggling to implement a classification use case using BertSentenceEmbeddings in Python. Mostly I get a classNotFoundError, and I can't figure out the right versions of the libraries (spark-nlp, pyspark).
I followed most of…