
I want to install Spark-NLP on Apache Spark Pools on Azure Synapse Analytics.

I added spark_nlp-4.4.0-py2.py3-none-any.whl and spark-nlp_2.12-4.4.0.jar as workspace packages.

The workspace configuration applies without errors, and I can import Spark NLP from the notebook.

However, loading a pretrained BERT model with the following code throws an error.

import sparknlp
from sparknlp.base import * 
from sparknlp.pretrained import *
from sparknlp.pretrained import PretrainedPipeline
from pyspark.ml import Pipeline
from sparknlp.annotator import BertEmbeddings

bert = BertEmbeddings.pretrained("distilbert_base_uncased")

The error I'm getting is:

distilbert_base_uncased download started this may take some time.
Approximate size to download 236 MB
[ \ ]
An error occurred while calling z:com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadModel.
: java.lang.ClassCastException: com.johnsnowlabs.nlp.embeddings.DistilBertEmbeddings cannot be cast to com.johnsnowlabs.nlp.embeddings.BertEmbeddings
    at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$onRead$1(ParamsAndFeaturesReadable.scala:50)
    at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$onRead$1$adapted(ParamsAndFeaturesReadable.scala:49)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.onRead(ParamsAndFeaturesReadable.scala:49)
    at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$read$1(ParamsAndFeaturesReadable.scala:61)
    at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$read$1$adapted(ParamsAndFeaturesReadable.scala:61)
    at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:38)
    at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:24)
    at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadModel(ResourceDownloader.scala:531)
    at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadModel(ResourceDownloader.scala:523)
    at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader$.downloadModel(ResourceDownloader.scala:751)
    at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadModel(ResourceDownloader.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:750)
[OK!]
---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)

What would be the correct procedure to install Spark NLP on Azure Synapse Apache Spark pools?

Abdennacer Lachiheb
  • This is the part of the error that matters: `com.johnsnowlabs.nlp.embeddings.DistilBertEmbeddings cannot be cast to com.johnsnowlabs.nlp.embeddings.BertEmbeddings`. I'm guessing there is some version incompatibility between Synapse and this library. – Nick.Mc Apr 24 '23 at 09:38
  • Or the model you are trying to load does not match the annotator class you are loading it with, i.e. `DistilBertEmbeddings` vs `BertEmbeddings`. – Nick.Mc Apr 24 '23 at 09:41
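
The second comment's suggestion can be sketched as follows: the name `distilbert_base_uncased` points at a DistilBERT model, so it would presumably need to be loaded through `DistilBertEmbeddings` rather than `BertEmbeddings`. This is only a minimal sketch, not a confirmed fix for the Synapse setup. The helpers `annotator_name_for` and `load_embeddings` are hypothetical, and the prefix-to-class mapping is an assumption based on the model naming convention, not something Spark NLP provides.

```python
def annotator_name_for(model_name: str) -> str:
    """Guess the Spark NLP annotator class from a model name prefix.

    Hypothetical helper: the mapping below is an assumption based on
    naming conventions and only covers the two classes in the error.
    """
    # "distilbert" is checked explicitly so it is never treated as plain BERT.
    prefixes = [
        ("distilbert", "DistilBertEmbeddings"),
        ("bert", "BertEmbeddings"),
    ]
    lowered = model_name.lower()
    for prefix, class_name in prefixes:
        if lowered.startswith(prefix):
            return class_name
    raise ValueError(f"No known annotator class for model {model_name!r}")


def load_embeddings(model_name: str, lang: str = "en"):
    """Load a pretrained embeddings model with the matching annotator class.

    Requires an active Spark session with the Spark NLP jar on the classpath.
    """
    # Imported lazily so annotator_name_for works without Spark NLP installed.
    import sparknlp.annotator as annotator

    annotator_class = getattr(annotator, annotator_name_for(model_name))
    return annotator_class.pretrained(model_name, lang)


# In the notebook (not run here, since it downloads the model):
# distilbert = load_embeddings("distilbert_base_uncased")
```

If the mismatch is the only problem, calling `DistilBertEmbeddings.pretrained("distilbert_base_uncased")` directly should have the same effect; the helper only makes the class choice explicit.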
