I've set up a PySpark session and given it specific configuration settings based on what I've read:

self.spark_session = (
    SparkSession.builder.appName("Example Session")
    .config("spark.jars", "../../.jars/spark-bigquery-with-dependencies_2.13-0.28.0.jar")
    .config("spark.hadoop.fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
    .config("spark.driver.extraClassPath", "../../.jars/gcs-connector-hadoop3-latest.jar")
    .config("spark.executor.extraClassPath", "../../.jars/gcs-connector-hadoop3-latest.jar")
    .getOrCreate()
)

and I'm able to work with the dataset I pull in just fine, transforming the data and the like. It's only when I try to write to GCS (to eventually load into BigQuery) that I get an error:

dataframe.write.format("bigquery").option("temporaryGcsBucket", bucket_path).save(table_name)
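
For context, the temporaryGcsBucket option makes the connector stage the DataFrame in GCS and then load it into BigQuery from there, which is why both the GCS connector and the BigQuery connector have to be on the classpath. With placeholder names (not my real bucket or table), the call expands to:

# Placeholder bucket and table names, purely illustrative.
(
    dataframe.write.format("bigquery")
    .option("temporaryGcsBucket", "my-staging-bucket")
    .mode("append")
    .save("my_dataset.my_table")
)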

The error I receive is:

E                   py4j.protocol.Py4JJavaError: An error occurred while calling o65.json.
E                   : java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: com.google.cloud.spark.bigquery.BigQueryRelationProvider Unable to get public no-arg constructor
E                       at java.base/java.util.ServiceLoader.fail(ServiceLoader.java:582)
E                       at java.base/java.util.ServiceLoader.getConstructor(ServiceLoader.java:673)
E                       at java.base/java.util.ServiceLoader$LazyClassPathLookupIterator.hasNextService(ServiceLoader.java:1233)
E                       at java.base/java.util.ServiceLoader$LazyClassPathLookupIterator.hasNext(ServiceLoader.java:1265)
E                       at java.base/java.util.ServiceLoader$2.hasNext(ServiceLoader.java:1300)
E                       at java.base/java.util.ServiceLoader$3.hasNext(ServiceLoader.java:1385)
E                       at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
E                       at scala.collection.Iterator.foreach(Iterator.scala:943)
E                       at scala.collection.Iterator.foreach$(Iterator.scala:943)
E                       at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
E                       at scala.collection.IterableLike.foreach(IterableLike.scala:74)
E                       at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
E                       at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
E                       at scala.collection.TraversableLike.filterImpl(TraversableLike.scala:303)
E                       at scala.collection.TraversableLike.filterImpl$(TraversableLike.scala:297)
E                       at scala.collection.AbstractTraversable.filterImpl(Traversable.scala:108)
E                       at scala.collection.TraversableLike.filter(TraversableLike.scala:395)
E                       at scala.collection.TraversableLike.filter$(TraversableLike.scala:395)
E                       at scala.collection.AbstractTraversable.filter(Traversable.scala:108)
E                       at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:657)
E                       at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:725)
E                       at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:207)
E                       at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:361)
E                       at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
E                       at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
E                       at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
E                       at java.base/java.lang.reflect.Method.invoke(Method.java:566)
E                       at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
E                       at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
E                       at py4j.Gateway.invoke(Gateway.java:282)
E                       at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
E                       at py4j.commands.CallCommand.execute(CallCommand.java:79)
E                       at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
E                       at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
E                       at java.base/java.lang.Thread.run(Thread.java:829)
E                   Caused by: java.lang.NoClassDefFoundError: scala/$less$colon$less
E                       at java.base/java.lang.Class.getDeclaredConstructors0(Native Method)
E                       at java.base/java.lang.Class.privateGetDeclaredConstructors(Class.java:3137)
E                       at java.base/java.lang.Class.getConstructor0(Class.java:3342)
E                       at java.base/java.lang.Class.getConstructor(Class.java:2151)
E                       at java.base/java.util.ServiceLoader$1.run(ServiceLoader.java:660)
E                       at java.base/java.util.ServiceLoader$1.run(ServiceLoader.java:657)
E                       at java.base/java.security.AccessController.doPrivileged(Native Method)
E                       at java.base/java.util.ServiceLoader.getConstructor(ServiceLoader.java:668)
E                       ... 33 more
E                   Caused by: java.lang.ClassNotFoundException: scala.$less$colon$less
E                       at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
E                       at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589)
E                       at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
E                       ... 41 more


I've looked everywhere but am unsure how to resolve this. My best guess is that I'm missing another .jar file, but I'm not sure which one.

OpenDataAlex
    This is possibly because the Spark runtime version you are running does not support the GCS connector. Is it okay to downgrade to Spark runtime version 1.1? Here is the documentation for the supported libraries per version: https://cloud.google.com/dataproc-serverless/docs/concepts/versions/spark-runtime-versions#spark_runtime_version_20 – Nestor Ceniza Jr Feb 17 '23 at 23:17

1 Answer


The issue was a missing jar file. After adding the correct jar (as mentioned in this SO question), it appears to be working.
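
For anyone hitting the same trace: scala.$less$colon$less is the Scala 2.13 <:< class, so the NoClassDefFoundError usually means a _2.13 connector jar was loaded into a Spark build that ships Scala 2.12. A minimal sketch of the check, assuming that mismatch is the cause:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Ask the JVM which Scala version Spark was built with; the connector
# artifact suffix (_2.12 vs _2.13) must match it. If this lookup fails,
# `spark-submit --version` also prints the Scala build.
scala_version = spark.sparkContext._jvm.scala.util.Properties.versionNumberString()
print(scala_version)  # e.g. "2.12.15" -> use the _2.12 connector artifact

# Matching jar for a Scala 2.12 build (version kept at 0.28.0 to mirror
# the question; adjust as needed):
#   .config("spark.jars", "../../.jars/spark-bigquery-with-dependencies_2.12-0.28.0.jar")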

OpenDataAlex