
I am trying to run a Spark LightGBMRegressor against Databricks with databricks-connect from PyCharm.
When I try to "fit" my data I get the error NoClassDefFoundError: spray/json/JsonWriter.
The code I am trying to run:

import os
import synapse.ml.lightgbm as splightgbm

if "DATABRICKS_RUNTIME_VERSION" not in os.environ:
    from pyspark.sql import SparkSession
    from pyspark.dbutils import DBUtils
    spark = SparkSession.builder \
            .config("spark.jars.packages", "com.microsoft.azure:synapseml_2.12:0.9.5") \
            .config("spark.jars.repositories", "https://mmlspark.azureedge.net/maven") \
            .getOrCreate()

from pyspark.ml.evaluation import RegressionEvaluator

train_data = featurizer.transform(x_trn)[experiment.config.target_col, 'features']
test_data = featurizer.transform(x_tst)[experiment.config.target_col, 'features']

model = splightgbm.LightGBMRegressor(
    numIterations=500,
    learningRate=0.05,
    featuresCol="features",
    labelCol=experiment.config.target_col,
)
model.fit(train_data)

This is the traceback:

py4j.protocol.Py4JJavaError: An error occurred while calling o1676.fit.
: java.lang.NoClassDefFoundError: spray/json/JsonWriter
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:242)
    at org.apache.spark.sql.util.SparkServiceObjectInputStream.readResolveClassDescriptor(SparkServiceObjectInputStream.scala:60)
    at org.apache.spark.sql.util.SparkServiceObjectInputStream.readClassDescriptor(SparkServiceObjectInputStream.scala:55)
asked by eliavs
    Looks like a missing dependency. If you have all the required versions (python, spark, etc.), I would try to add it manually from maven (https://search.maven.org/artifact/io.spray/spray-json_2.12/1.3.6/jar). – bzu Jul 17 '22 at 13:50
  • @bzu thanks, is there a guide on how to add it manually? – eliavs Jul 17 '22 at 13:54
  • It's a bit of a long shot, but you could add it to classpath: .config("spark.jars.packages", "com.microsoft.azure:synapseml_2.12:0.9.5,io.spray:spray-json_2.12:1.3.6"). There are also some other parameters you could set, described here: https://github.com/microsoft/SynapseML#synapse-analytics – bzu Jul 17 '22 at 14:08
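Following bzu's comment, a minimal sketch of how the combined dependency list could be built and passed to the session builder. The Maven coordinates are the ones from the comment (SynapseML 0.9.5 plus spray-json 1.3.6, both for Scala 2.12); whether these versions match your cluster's runtime is an assumption you would need to verify:

```python
# Sketch: spark.jars.packages takes a comma-separated list of Maven
# coordinates, so the missing spray-json artifact can be appended to
# the existing SynapseML coordinate (no spaces between entries).
packages = ",".join([
    "com.microsoft.azure:synapseml_2.12:0.9.5",  # SynapseML (LightGBM)
    "io.spray:spray-json_2.12:1.3.6",            # dependency reported missing
])
print(packages)

# The session builder from the question would then become:
# spark = (SparkSession.builder
#          .config("spark.jars.packages", packages)
#          .config("spark.jars.repositories", "https://mmlspark.azureedge.net/maven")
#          .getOrCreate())
```

Note that spark.jars.packages only takes effect when it is set before the SparkSession is created; an already-running session will not pick up the new jar.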

0 Answers