Questions tagged [johnsnowlabs-spark-nlp]

John Snow Labs’ Spark NLP is a natural language processing library built on top of Apache Spark ML pipelines.

100 questions
2 votes · 2 answers

Local data cannot be read in a Dataproc cluster when using Spark NLP

I am trying to build a Dataproc cluster with Spark NLP installed, and then quickly test it by reading some CoNLL 2003 data. First, I used this codelab as inspiration to build my own smaller cluster (project name has been edited for safety…
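The usual culprit behind "local data cannot be read" is that a file sitting on the driver's local disk is not visible to the executors; staging the CoNLL file on GCS (or HDFS) avoids this. A minimal, untested sketch of a cluster that preinstalls Spark NLP via the standard pip init action; the cluster name, region, bucket, and image version are placeholders, not taken from the question:

```shell
# Untested sketch; names, region, and versions are placeholders.
gcloud dataproc clusters create sparknlp-test \
  --region=us-central1 \
  --image-version=2.0 \
  --metadata='PIP_PACKAGES=spark-nlp' \
  --initialization-actions=gs://goog-dataproc-initialization-actions-us-central1/python/pip-install.sh

# Stage the training file where every executor can see it:
gsutil cp eng.train gs://my-bucket/conll/eng.train
```

Inside the job, `CoNLL().readDataset(spark, "gs://my-bucket/conll/eng.train")` (from `sparknlp.training`) then reads the file cluster-wide; the bucket path is likewise a placeholder.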
2 votes · 0 answers

Generate a summarizing word based on a set of words

I'm very new to NLP, so I have a theoretical question. Let's say I have the following Spark dataframe: +--+------------------------------------------+ |id| …
Hilary
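One common approach to "find a word that summarizes a set of words" is to map the set and each candidate label into the same embedding space, average the set's vectors, and pick the label closest to that centroid. A self-contained toy sketch with hand-made 2-d vectors; in practice the `emb` dictionary would come from real embeddings (Word2Vec, GloVe, BERT), and the function names here are illustrative:

```python
import math

def cosine(u, v):
    # Cosine similarity between two vectors; 0.0 if either is zero-length.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(vectors):
    # Component-wise mean of a non-empty list of equal-length vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def summarizing_word(words, embeddings, candidates):
    """Pick the candidate label whose vector is closest to the words' centroid."""
    center = centroid([embeddings[w] for w in words if w in embeddings])
    return max(candidates, key=lambda c: cosine(embeddings[c], center))

# Toy 2-d embeddings purely for illustration.
emb = {
    "apple": [1.0, 0.1], "banana": [0.9, 0.2], "cherry": [0.95, 0.15],
    "fruit": [0.92, 0.14], "vehicle": [0.1, 1.0],
}
print(summarizing_word(["apple", "banana", "cherry"], emb, ["fruit", "vehicle"]))
# → fruit
```

On a Spark dataframe the same logic would run inside a UDF, with the embedding lookup broadcast to the executors.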
2 votes · 0 answers

I am getting a TypeError: 'JavaPackage' object is not callable when trying to call DocumentAssembler() in Google Colab

While trying to call DocumentAssembler() in Google Colab, I am getting the above error. I used '!wget http://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 2.4.5 -s 2.6.5' for setup. I have looked into the available solutions on…
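The 'JavaPackage' object is not callable error generally means the Spark NLP jar never reached the JVM classpath, most often because the pyspark and spark-nlp versions are not a supported pair. An untested Colab sketch; the `-p`/`-s` version pair below is the one quoted in the question itself, and must match a combination listed in the project's compatibility table:

```python
# -p pins the pyspark version, -s the spark-nlp version; they must be a
# compatible pair (this pairing is the one quoted in the question).
!wget http://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 2.4.5 -s 2.6.5

# Start the session through sparknlp so the jar is registered with the JVM:
import sparknlp
spark = sparknlp.start()
```

If the session is created with a plain `SparkSession.builder` instead, the jar is never attached and every annotator constructor fails with this error.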
2 votes · 1 answer

Spark NLP is not working in PySpark: TypeError: 'JavaPackage' object is not callable

I'm trying to spark-submit a PySpark application, but every time I try, it throws this error when it tries to download a pre-trained model from Spark NLP: TypeError: 'JavaPackage' object is not callable. Any idea what might be causing this? Also, it's…
Doraemon
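Under spark-submit, the same 'JavaPackage' error usually means the job was launched without the Spark NLP jar. One fix is to pass the Maven coordinate with `--packages`; this is an untested sketch, and the artifact version and script name are assumptions to be matched to your Spark and Scala build:

```shell
# Scala 2.12 coordinate for Spark 3.x; adjust the version to your installation.
spark-submit \
  --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.0 \
  my_app.py
```

Alternatively, a jar already on disk can be shipped with `--jars` instead of being resolved from Maven at launch time.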
2 votes · 1 answer

Is it possible to use the Spark NLP library with Spark Structured Streaming?

I want to perform tweets sentiment analysis on a stream of messages I get from a Kafka cluster that, in turn, gets the tweets from the Twitter API v2. When I try to apply the pre-trained sentiment analysis pipeline I get an error message saying:…
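In principle, yes: a pretrained pipeline is just a fitted Spark ML `PipelineModel`, and its `transform` applies to a streaming DataFrame like any other. An untested sketch; the Kafka broker address, topic name, and pipeline name are assumptions, and the input column must match what the pipeline's DocumentAssembler expects (conventionally `text`):

```python
# Untested sketch: needs pyspark, spark-nlp, and a reachable Kafka broker.
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
pipeline = PretrainedPipeline("analyze_sentiment", lang="en")

tweets = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")  # assumption
          .option("subscribe", "tweets")                        # assumption
          .load()
          .selectExpr("CAST(value AS STRING) AS text"))

scored = pipeline.transform(tweets)  # PipelineModel.transform works on streams
query = scored.writeStream.format("console").start()
```

Errors at this point are typically about the missing `text` column or about an annotator that cannot run per-microbatch, not about streaming itself.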
2 votes · 0 answers

java.lang.VerifyError: Bad return type Reason: Type 'java/lang/Object' (current frame, stack[0]) is not assignable to 'org/tensorflow/Tensor'

I want to run Spark NLP in Python. I am using Apache Spark 3.2.1 with spark-nlp==3.4.1 and pyspark==3.1.2. I am following this guide. I am able to get the Spark session using this code: sc = pyspark.SparkContext().getOrCreate() import…
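Note that the combination in the question is already inconsistent: Apache Spark 3.2.1 is installed but pyspark is pinned to 3.1.2, and TensorFlow-level `VerifyError`s are a common symptom of exactly this kind of mismatch. A hedged first step is to align the pip packages with the installed Spark (versions below are from the question, not a recommendation):

```shell
# pyspark must match the installed Spark (3.2.1 here, per the question),
# and spark-nlp must be a release that supports that Spark line.
pip install pyspark==3.2.1 spark-nlp==3.4.1
```

If the error persists, the project's README compatibility table is the authoritative list of supported Spark/spark-nlp pairs.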
2 votes · 1 answer

How to load a Spark NLP model offline in Python

I need to use Spark NLP for lemmatization in Python. I want to use the pretrained pipeline, but I need to do it offline. What is the correct way to do this? I am not able to find any Python example. I am passing token as the inputcol for…
Fiona
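A common offline pattern, sketched here untested: download the pretrained pipeline archive once on a machine with internet access (from the Spark NLP Models Hub), unpack it, ship the directory to the offline machine, and point `PretrainedPipeline` at it with `disk_location`. The pipeline name, path, and output key are assumptions:

```python
# Untested sketch: needs pyspark + spark-nlp and a pre-downloaded pipeline.
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# The archive from the Models Hub, unpacked to a local directory:
pipeline = PretrainedPipeline("explain_document_ml", lang="en",
                              disk_location="/models/explain_document_ml")

result = pipeline.annotate("The striped bats were hanging on their feet")
print(result)  # a dict of annotation lists, including the lemmas
```

Loading the unpacked directory with `pyspark.ml.PipelineModel.load(path)` is an equivalent route when only the fitted model is needed.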
2 votes · 1 answer

Spark NLP pretrained model not loading in Windows

I am trying to install pretrained pipelines for Spark NLP on Windows 10 with Python. The following is the code I have tried so far in a Jupyter notebook on the local system: ! java -version # should be Java 8 (Oracle or OpenJDK) ! conda create -n…
2 votes · 1 answer

BERT embeddings in Spark NLP or BERT for token classification in Hugging Face

Currently I am working on productionizing an NER model on Spark. I have a current implementation using Hugging Face DistilBERT with a token-classification head, but as the performance is a bit slow and costly, I am trying to find ways to…
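The Spark NLP equivalent of a transformer-plus-token-classification setup is an embeddings annotator feeding an NerDLModel. An untested sketch of how such a pipeline is typically assembled; the two pretrained model names are assumptions taken from the Models Hub naming style, not from the question:

```python
# Untested sketch: one way to assemble a BERT-based NER pipeline in Spark NLP.
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, BertEmbeddings, NerDLModel

document = DocumentAssembler().setInputCol("text").setOutputCol("document")
tokenizer = Tokenizer().setInputCols(["document"]).setOutputCol("token")
embeddings = (BertEmbeddings.pretrained("small_bert_L2_768", "en")   # name is an assumption
              .setInputCols(["document", "token"])
              .setOutputCol("embeddings"))
ner = (NerDLModel.pretrained("onto_small_bert_L2_768", "en")         # name is an assumption
       .setInputCols(["document", "token", "embeddings"])
       .setOutputCol("ner"))

pipeline = Pipeline(stages=[document, tokenizer, embeddings, ner])
```

Because the whole pipeline is a Spark ML estimator chain, it scales out with the cluster instead of running the transformer per-row in Python, which is the usual motivation for the switch.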
2 votes · 1 answer

Multilingual BERT in Spark NLP

I was wondering whether pre-trained multilingual BERT is available in Spark NLP. As you know, BERT is pre-trained for 109 languages. I was wondering if all of these languages are in Spark NLP's BERT too? Thanks
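The Models Hub does list a multilingual BERT under the cross-lingual language code `xx`; to the best of my knowledge it is the original multilingual checkpoint, so it covers the same languages. An untested sketch, with the model name taken from the Models Hub naming convention:

```python
# Untested sketch: assumes a running Spark session with spark-nlp attached.
from sparknlp.annotator import BertEmbeddings

embeddings = (BertEmbeddings.pretrained("bert_multi_cased", "xx")
              .setInputCols(["sentence", "token"])
              .setOutputCol("embeddings"))
```

The Models Hub page for the specific model is the authoritative source for its exact language list.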
2 votes · 1 answer

NLP analysis of PySpark dataframe string columns with NumPy vectorization

I would like to do some NLP analysis on a string column of a PySpark dataframe. df: year month u_id rating_score p_id review 2010 09 tvwe 1 p_5 I do not like it because its size is not for me. 2011 11 frsa 1 p_7 I…
user3448011
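Row-at-a-time Python UDFs are the slow path here; a `pandas_udf` lets vectorized NumPy/pandas code run over whole column batches via Arrow. An untested sketch; the column name comes from the question's dataframe, and the word-count logic is a placeholder for real NLP scoring:

```python
# Untested sketch: needs pyspark and pandas installed.
import pandas as pd
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import IntegerType

@pandas_udf(IntegerType())
def review_length(review: pd.Series) -> pd.Series:
    # Runs vectorized over a batch of rows; swap in real NLP scoring here.
    return review.str.split().str.len()

df = df.withColumn("review_len", review_length("review"))
```

The function body never sees one row at a time, which is what makes NumPy-style vectorization pay off inside Spark.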
2 votes · 1 answer

Spark NLP: can't load pretrained recognize-entities model from disk in PySpark

I have a spark cluster set up and would like to integrate spark-nlp to run named entity recognition. I need to access the model from disk rather than download it from the internet at runtime. I have downloaded the recognize_entities_dl model from…
2 votes · 0 answers

How to identify the main entity (category) if a query contains multiple categories

I want to extract the key intent of the user by identifying the key category among the probable categories identified by some process. E.g. "Christmas tree ornament". The above query has 2 categories in it: 1) Christmas tree 2) ornament. The actual intent lies in…
Aman Tandon
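For noun-compound queries like "Christmas tree ornament", one simple heuristic exploits the fact that English compounds are right-headed: the head noun comes last, so a candidate category that the query ends with is a better guess for the main entity than one buried earlier. A toy, hedged sketch (a real system would add a parser or a click-log signal on top):

```python
def main_category(query, categories):
    """Pick the category most likely to be the head of a noun-compound query.

    Heuristic: English compounds are right-headed, so a category the query
    *ends with* ("ornament" in "Christmas tree ornament") beats one that
    merely appears earlier; ties go to the longest (most specific) match.
    """
    q = query.lower().strip()
    ending = [c for c in categories if q.endswith(c.lower())]
    if ending:
        return max(ending, key=len)          # most specific suffix match
    inside = [c for c in categories if c.lower() in q]
    return max(inside, key=len) if inside else None

print(main_category("Christmas tree ornament", ["christmas tree", "ornament"]))
# → ornament
```

The fallback substring match handles reordered phrasings such as "ornament for christmas tree", where the head no longer sits at the end.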
2 votes · 0 answers

Version compatibility issues with Scala, Spark, and Spark NLP

I am new to Spark NLP and I am stuck on version compatibility issues. That may seem silly, but I would still appreciate help with this: Spark NLP is built on top of Apache Spark 2.4.0, and that is the only supported release (mentioned…
amandeep1991
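The rule of thumb (hedged; the project's README compatibility table is authoritative for any specific release): the Scala suffix of the Maven artifact must match the Scala build of your Spark, and the pyspark pip version must equal the installed Spark version. Untested command fragments, with versions as illustrations only:

```shell
# Spark 2.4.x line (Scala 2.11), matching the docs the question cites:
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.5

# Spark 3.x line (Scala 2.12):
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:4.2.0
```

Mixing suffixes (e.g. a `_2.11` jar on a Scala 2.12 Spark) produces exactly the class-loading and `JavaPackage` errors described throughout this tag.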
2 votes · 1 answer

Can't get the John Snow Labs OCR notebook to run on Databricks

So I am trying to follow this notebook and get it to work in a Databricks notebook: https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/jupyter/ocr-spell/OcrSpellChecking.ipynb. However, after installing all the packages, I still get…
Kay