Questions tagged [johnsnowlabs-spark-nlp]

John Snow Labs’ NLP is a natural language processing tool built on top of Apache Spark ML pipelines

100 questions
2
votes
1 answer

Does John Snow Labs’ NLP library built on top of Apache Spark support Java?

John Snow Labs’ NLP library is built on top of Apache Spark and the Spark ML library. All its examples are provided in Scala and Python. Does it support Java? If so, where can I find the related guides? If not, is there any plan to support Java?
Mahesha999
  • 22,693
  • 29
  • 116
  • 189
1
vote
0 answers

SparkNLP messages disabled

When I run the code "spark = sparknlp.start()", it always prints messages like the following to the terminal, which is very annoying: 23/07/13 14:23:33 WARN Utils: Your hostname, ---------- resolves to a loopback address: -------; using 192.168.28.181 instead…
1
vote
0 answers

How to get vocabulary from WordEmbeddingsModel in sparknlp

I need to create an embedding matrix from embeddings generated by WordEmbeddingsModel in sparknlp. So far I have this code: from sparknlp.annotator import * from sparknlp.common import * from sparknlp.base import * # define sparknlp…
1
vote
0 answers

How to build a federated system with a CSV dataset using the SparkNLP library?

I am very interested in federated systems and I was trying one of the pre-trained multilingual models, such as the one in the notebook Multi_Lingual_Training_and_models. I was looking for any tutorials using the TFF or Flower frameworks that handle csv…
1
vote
1 answer

Remove the repeated punctuation from pyspark dataframe

I need to remove repeated punctuation and keep only the last occurrence. For example: !!!! -> ! !!$$ -> !$ I have a dataset that looks like the one below: temp = spark.createDataFrame([ (0, "This is Spark!!!!"), (1, "I wish Java…
merkle
  • 1,585
  • 4
  • 18
  • 33
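The rule stated in this question (collapse a run of the same punctuation mark to a single mark) can be sketched in plain Python with a back-reference. This is only an illustration, not the accepted answer: the function name `collapse_repeated_punctuation` is hypothetical, and "punctuation" is assumed to mean the ASCII set in `string.punctuation`.

```python
import re
import string

# Assumption: "punctuation" means the ASCII characters in string.punctuation.
_PUNCT_CLASS = "[" + re.escape(string.punctuation) + "]"

# Match one punctuation character followed by one or more copies of itself.
_REPEATED_PUNCT = re.compile("(" + _PUNCT_CLASS + r")\1+")


def collapse_repeated_punctuation(text):
    """Replace each run of an identical punctuation mark with a single mark."""
    return _REPEATED_PUNCT.sub(r"\1", text)


if __name__ == "__main__":
    for sample in ["This is Spark!!!!", "!!$$", "no change here."]:
        print(repr(sample), "->", repr(collapse_repeated_punctuation(sample)))
```

Because every character in a run is identical, keeping the first or the last occurrence gives the same result. The same back-reference idea should carry over to `pyspark.sql.functions.regexp_replace`, where the replacement group is written `$1` rather than `\1`, though that variant is not tested here.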
1
vote
1 answer

How to start Spark session on Vertex AI workbench Jupyterlab notebook?

Can you kindly show me how to start a Spark session in a Google Cloud Vertex AI Workbench JupyterLab notebook? This works fine in Google Colaboratory, by the way. What is missing here? # Install Spark NLP from PyPI !pip install -q…
1
vote
0 answers

Py4JJavaError: An error occurred while calling z:com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.getDownloadSize: java.lang.NoClassDefFoundError

Unable to load a pre-trained model on both Windows and Linux. I set the environment variables for everything: Spark 3.2.1, Hadoop 3.2, sparknlp (tried both 3.4.4 and 4.0.0), Python 3.8+, and Java 8. Need help. from sparknlp.base import * from sparknlp.annotator…
1
vote
1 answer

Trying to use a John Snow Labs pretrained pipeline on a Spark dataframe, but unable to read a Delta file in the same session

I am using the code below to read the Spark dataframe from HDFS: from delta import * from pyspark.sql import SparkSession builder = SparkSession.builder.appName("MyApp") \ .config("spark.sql.extensions",…
1
vote
0 answers

KeyError: 'PYSPARK_GATEWAY_SECRET' when creating Spark context inside AWS Lambda code

I have deployed a Lambda function that uses sparknlp as a Docker container. To work with sparknlp I need a Spark context, so in my sparknlp code I start with sc = pyspark.SparkContext().getOrCreate() I tested my Lambda locally and it worked…
1
vote
0 answers

How to load a Spark NLP pretrained pipeline through HDFS

I've already installed sparknlp and its assembly jars, but I still get an error when I try to use one of the models: TypeError: 'JavaPackage' object is not callable. I cannot install the model and load it from disk because it's considered…
LucasA
  • 11
  • 2
1
vote
1 answer

Converting spaCy NER entity format to CoNLL-2003 format

I am working on an NER application where I have data annotated in the following format. [('The F15 aircraft uses a lot of fuel', {'entities': [(4, 7, 'aircraft')]}), ('did you see the F16 landing?', {'entities': [(16, 19, 'aircraft')]}), ('how…
imhans33
  • 133
  • 11
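For the data shape shown in this question, a conversion along these lines is one possibility. It is a rough sketch under stated assumptions, not the accepted answer: tokens are split on whitespace only, the POS and chunk columns are filled with a placeholder `-X-` because the source data carries no such information, and the function name `spacy_to_conll` is hypothetical.

```python
import re


def spacy_to_conll(examples):
    """Convert (text, {'entities': [(start, end, label)]}) pairs to CoNLL-style lines.

    Assumptions: whitespace tokenization, BIO tags, and '-X-' placeholders
    for the POS and chunk columns.
    """
    lines = []
    for text, annotations in examples:
        entities = annotations.get("entities", [])
        for match in re.finditer(r"\S+", text):
            tok_start, tok_end = match.start(), match.end()
            tag = "O"
            for ent_start, ent_end, label in entities:
                # A token belongs to an entity if their character spans overlap.
                if tok_start < ent_end and tok_end > ent_start:
                    prefix = "B" if tok_start <= ent_start else "I"
                    tag = f"{prefix}-{label}"
                    break
            lines.append(f"{match.group()} -X- -X- {tag}")
        lines.append("")  # blank line separates sentences
    return "\n".join(lines)


if __name__ == "__main__":
    data = [
        ("The F15 aircraft uses a lot of fuel", {"entities": [(4, 7, "aircraft")]}),
        ("did you see the F16 landing?", {"entities": [(16, 19, "aircraft")]}),
    ]
    print(spacy_to_conll(data))
```

Whitespace splitting leaves trailing punctuation attached to tokens (for example `landing?`), so a real pipeline would probably want a proper tokenizer whose offsets line up with the entity spans.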
1
vote
1 answer

How to use an NER model fine-tuned with Hugging Face Transformers in Spark NLP on Databricks

I needed to train (fine-tune) an NER token classifier to recognize our custom tokens. The easiest way I found to do that was: Token Classification with W-NUT Emerging Entities But now I encountered a problem - the plan was to follow: HuggingFace in…
1
vote
1 answer

Regex in Spark NLP Normalizer is not working correctly

I'm using the Spark NLP pipeline to preprocess my data. Instead of only removing punctuation, the normalizer also removes umlauts. My code: documentAssembler = DocumentAssembler() \ .setInputCol("column") \ .setOutputCol("column_document")\ …
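The excerpt is cut off before the cleanup pattern is shown, so the cause cannot be confirmed from it. One common way to get this symptom, assumed here purely for illustration, is a pattern built on an ASCII range such as `[^A-Za-z ]`, which treats umlauts as characters to strip. The plain-Python sketch below contrasts that with a Unicode-aware class; it uses Python's `re` module rather than the Java regex engine that Spark runs on, and the sample string is made up.

```python
import re

text = "Größe, Äpfel & Öl!"  # made-up sample containing umlauts and punctuation

# ASCII-only whitelist: anything outside A-Z, a-z and space is removed,
# which silently drops ö, ß, Ä and Ö along with the punctuation.
ascii_only = re.sub(r"[^A-Za-z ]", "", text)

# Unicode-aware: in Python 3, \w on str patterns matches letters such as ö and ß,
# so only punctuation and symbols are removed.
unicode_aware = re.sub(r"[^\w\s]", "", text)

if __name__ == "__main__":
    print(repr(ascii_only))
    print(repr(unicode_aware))
```

If this is indeed the cause, the equivalent fix on the Java side would be a Unicode-aware pattern in place of the ASCII range.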
1
vote
1 answer

Can't export data from a spark dataframe

I parsed 500k tweets as a test using Spark NLP. The dataframe looks fine. I converted the arrays to strings using: from pyspark.sql.functions import udf from pyspark.sql.types import StringType def array_to_string(my_list): return '[' +…
Viktor Avdulov
  • 127
  • 2
  • 14
1
vote
0 answers

SparkNLP NerDLModel load throws NoSuchMethodException

I am currently using John Snow Labs' SparkNLP library to train a custom NER model. I am able to complete the training successfully, and the model is saved to disk. When I try to load the model for the next step, to actually tag some sample data…