Highest Voted 'johnsnowlabs-spark-nlp' Questions

0

votes

1 answer

Analysis Exception in spark NLP

Below was my code block: conll_data.select(F.explode(F.arrays_zip('token.result','label.result')).alias("cols")) \ .select(F.expr("cols['0']").alias("token"), F.expr("cols['1']").alias("ground_truth"))\ …

apache-spark pyspark apache-spark-sql johnsnowlabs-spark-nlp

asked Jun 09 '22 at 06:48

Tokvl

11
2

0

votes

1 answer

SparkNLP PipelineModel which includes AnnotatorApproach in stages

In a SparkNLP PipelineModel, all the stages have to be of type AnnotatorModel. But what if one of those AnnotatorModels requires a certain column in the dataset as input, and this input column is the output of an AnnotatorApproach? For instance, I…

java apache-spark nlp johnsnowlabs-spark-nlp

asked Nov 18 '21 at 19:31

martin_wun

1,599
1
15
33

0

votes

1 answer

sparkNLP Tokenization of Contractions

I'm using sparkNLP version 3.2.3 and trying to tokenize some text. I've used spacy and other tokenizers that handle contractions such as "they're" by splitting it into "they" and "'re". According to this resource pages 105-107 sparkNLP should…

tokenize johnsnowlabs-spark-nlp

asked Oct 19 '21 at 17:02

user3242036

645
1
7
16

0

votes

2 answers

SparkNLP's NerCrfApproach with custom labels

I am trying to train a SparkNLP NerCrfApproach model with a dataset in CoNLL format that has custom labels for product entities (like I-Prod, B-Prod etc.). However, when using the trained model to make predictions, I get only "O" as the assigned…

named-entity-recognition johnsnowlabs-spark-nlp

asked Oct 13 '21 at 07:33

martin_wun

1,599
1
15
33

0

votes

1 answer

Mix Spark MLLIB and SparkNLP in pipeline

In an MLLIB pipeline, how can I chain a CountVectorizer (from SparkML) after a Stemmer (from Spark NLP)? When I try to use both in a pipeline I get: myColName must be of type equal to one of the following types: [array, array] but…

scala apache-spark apache-spark-mllib johnsnowlabs-spark-nlp

asked Oct 07 '21 at 17:29

Benjamin

3,350
4
24
49

0

votes

1 answer

Loading large sparknlp pipeline into Apache Spark batch job taking too long

I am using SparkNLP from johnsnowlabs for extracting embeddings from my textual data; below is the pipeline. The size of the model is 1.8g after saving to hdfs. embeddings = BertSentenceEmbeddings.pretrained("labse", "xx") \ …

apache-spark hadoop hadoop-yarn johnsnowlabs-spark-nlp

asked Jun 07 '21 at 10:30

Danial Shabbir

612
8
18

0

votes

1 answer

Glue job failed with `JohnSnowLabs spark-nlp dependency not found` error randomly

I'm using AWS Glue to run some pyspark python code. Sometimes it succeeds, but sometimes it fails with a dependency error: Resource Setup Error: Exception in thread "main" java.lang.RuntimeException: [unresolved dependency:…

java amazon-web-services apache-spark aws-glue johnsnowlabs-spark-nlp

asked May 06 '21 at 08:37

wawawa

2,835
6
44
105

0

votes

1 answer

py4j.protocol.Py4JNetworkError: Answer from Java side is empty

This is the code I am using on Google Colab. It keeps getting stuck at the model.fit part and throws this exception. I haven't been able to find any solutions for it anywhere. The memory also seems to get very high on Colab, starting to think…

python apache-spark pyspark countvectorizer johnsnowlabs-spark-nlp

asked Apr 11 '21 at 04:45

Frying Pan

167
2
8

0

votes

1 answer

How to use `LanguageDetectorDL` spark NLP on pyspark column?

I am working with a pyspark dataframe. I have a df that looks like this: df.select('words').show(5, truncate = 130) +----------------------------------------------------------------------------------------------------------------------------------+ | …

python apache-spark pyspark johnsnowlabs-spark-nlp

asked Apr 03 '21 at 16:56

Samiksha

59
6

0

votes

1 answer

Spark-NLP functions give pickling error when using map

I have an RDD of the following structure: my_rdd = [Row(text='Hello World. This is bad.'), Row(text='This is good.'), ...] I can perform parallel processing with python functions: rdd2=my_rdd.map(lambda f: f.text.split()) for x in rdd2.collect(): …

apache-spark pyspark rdd johnsnowlabs-spark-nlp

asked Mar 27 '21 at 19:16

Brian131

1

0

votes

0 answers

How to use multiple clean up patterns in Normalizer (spark nlp)?

I am working with a pyspark dataframe. I need to perform tf-idf, and for that I have used the prior steps of tokenizing, normalization, etc. using spark NLP. I have a df that looks like this after applying the tokenizer: df.select('tokenizer').show(5, truncate =…

python apache-spark pyspark nlp johnsnowlabs-spark-nlp

asked Mar 27 '21 at 11:16

Samiksha

59
6

0

votes

1 answer

Error when load Spark-nlp pretrainedPipeline

An exception occurred when I loaded a spark nlp pretrainedPipeline, as follows: Exception in thread "main" java.lang.IllegalArgumentException: Unsupported class file major version 59 I am new to Scala, can anyone recognize the reason? Thank you in…

scala apache-spark apache-spark-sql johnsnowlabs-spark-nlp

asked Mar 11 '21 at 19:09

Jerry Sun

1
1

0

votes

0 answers

pyspark.sql.utils.IllegalArgumentException: 'requirement failed: Was not found appropriate resource to download for request

I'm trying to run the example code below: import sparknlp sparknlp.start() from sparknlp.pretrained import PretrainedPipeline explain_document_pipeline = PretrainedPipeline("explain_document_ml") annotations =…

java pyspark apache-spark-sql pycharm johnsnowlabs-spark-nlp

asked Feb 05 '21 at 15:25

wawawa

2,835
6
44
105

0

votes

1 answer

java.lang.ClassNotFoundException: com.johnsnowlabs.nlp.DocumentAssembler spark in Pycharm with conda env

I saved a pre-trained model from spark-nlp, then I'm trying to run a Python script in Pycharm with anaconda env: Model_path = "./xxx" model = PipelineModel.load(Model_path) But I got the following error: (I tried with pyspark 2.4.4 &…

java python apache-spark pyspark johnsnowlabs-spark-nlp

asked Feb 05 '21 at 13:45

wawawa

2,835
6
44
105

0

votes

1 answer

SparkNLP Text classification using BertSentenceEmbeddings

I am struggling with implementing a classification use case using the BertSentenceEmbeddings in python. Mostly I get classNotFoundError, and I think I am unable to figure out the right versions of libraries (spark-nlp, pyspark). I followed most of…

pyspark apache-spark-mllib johnsnowlabs-spark-nlp

asked Dec 10 '20 at 14:46

Rahul Sharma

5,614
10
57
91

Questions tagged [johnsnowlabs-spark-nlp]