I'm trying to run a Java project that uses Apache Spark. I read my data from CSV files into a Dataset. When I run the code from Eclipse, everything works fine. I configured the project so that a single jar containing all dependencies (a fat jar) is built.
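For reference, the read happens in DocumentsSparkAccess.getInstance and looks roughly like this (a minimal sketch, not the exact code; the class name, file path, header option, and local master are assumptions):

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public final class DocumentsCsvSketch {
        public static void main(String[] args) {
            // Assumption: the app runs with a local master when started via java -jar.
            SparkSession spark = SparkSession.builder()
                    .appName("TopicModel")
                    .master("local[*]")
                    .getOrCreate();

            // This is the kind of call that fails at DocumentsSparkAccess.java:32:
            // DataFrameReader.csv(...) resolves the "csv" data source by name at runtime.
            Dataset<Row> documents = spark.read()
                    .option("header", "true")     // assumption: files have a header row
                    .csv("data/documents.csv");   // hypothetical path

            documents.show();
            spark.stop();
        }
    }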
But if I run the jar file with java -jar ..., this happens:
Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: csv. Please find packages at http://spark.apache.org/third-party-projects.html
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:635)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:190)
at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:594)
at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:473)
at access.DocumentsSparkAccess.getInstance(DocumentsSparkAccess.java:32)
at process.TopicModelCreator.<init>(TopicModelCreator.java:38)
at main.Main.createTopicModel(Main.java:56)
at main.Main.main(Main.java:37)
Caused by: java.lang.ClassNotFoundException: csv.DefaultSource
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23$$anonfun$apply$15.apply(DataSource.scala:618)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23$$anonfun$apply$15.apply(DataSource.scala:618)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23.apply(DataSource.scala:618)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23.apply(DataSource.scala:618)
at scala.util.Try.orElse(Try.scala:84)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:618)
I use the following versions:
- Java 1.8
- Apache Spark 2.3.0
I use the Maven assembly plugin like this:
<plugin>
  <artifactId>maven-assembly-plugin</artifactId>
  <version>3.1.0</version>
  <configuration>
    <archive>
      <manifest>
        <mainClass>main.Main</mainClass>
      </manifest>
    </archive>
    <descriptorRefs>
      <descriptorRef>jar-with-dependencies</descriptorRef>
    </descriptorRefs>
  </configuration>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>single</goal>
      </goals>
    </execution>
  </executions>
</plugin>
The dependencies are included like this:
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>2.3.0</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-mllib_2.11</artifactId>
  <version>2.3.0</version>
</dependency>
Answer:
This is the same problem that has already been solved for Parquet files: "Failed to find data source: parquet" when making a fat jar with maven. Spark looks up data sources by short name through Java's ServiceLoader, using the file META-INF/services/org.apache.spark.sql.sources.DataSourceRegister. Several Spark modules ship their own copy of that file, and the jar-with-dependencies descriptor keeps only one of them, so the registration of the built-in CSV source (which lives in spark-sql, pulled in transitively by spark-mllib) is lost. Spark then falls back to guessing a class name, which produces the Caused by: java.lang.ClassNotFoundException: csv.DefaultSource above. The fix from the linked answer is to build the fat jar with the maven-shade-plugin instead, using its ServicesResourceTransformer to concatenate the service files rather than overwrite them.
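A configuration along these lines should work as a replacement for the assembly plugin (the plugin version is an assumption, any recent release should do; the signature-file filter is a common extra precaution when shading signed dependencies):

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>3.1.1</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <transformers>
              <!-- Concatenate META-INF/services files instead of keeping only one,
                   so the DataSourceRegister entry for csv from spark-sql survives -->
              <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
              <!-- Replaces the <archive><manifest> setting of the assembly plugin -->
              <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                <mainClass>main.Main</mainClass>
              </transformer>
            </transformers>
            <filters>
              <!-- Strip signature files from signed dependencies; leaving them in
                   can cause a SecurityException when running the shaded jar -->
              <filter>
                <artifact>*:*</artifact>
                <excludes>
                  <exclude>META-INF/*.SF</exclude>
                  <exclude>META-INF/*.DSA</exclude>
                  <exclude>META-INF/*.RSA</exclude>
                </excludes>
              </filter>
            </filters>
          </configuration>
        </execution>
      </executions>
    </plugin>

After mvn package, the shaded jar replaces the normal artifact in target/ by default, and running it with java -jar as before should let Spark find the csv source again.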