
While using the Kafka and delta-core dependencies in a Spark project, I'm receiving the following warning:

```
[WARNING] delta-core_2.12-0.7.0.jar, spark-sql-kafka-0-10_2.12-3.1.1.jar define 1 overlapping resources: 
[WARNING]   - META-INF/services/org.apache.spark.sql.sources.DataSourceRegister
```

This causes the Delta data source not to be found. How can I include both Delta and Kafka? Thanks.

Here is my Maven config:

```xml
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql-kafka-0-10_${scala.version}</artifactId>
    <version>${spark.version}</version>
</dependency>
<dependency>
    <groupId>io.delta</groupId>
    <artifactId>delta-core_${scala.version}</artifactId>
    <version>0.7.0</version>
</dependency>
...
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>3.2.2</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <transformers>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                        <resource>
                            META-INF/services/org.apache.spark.sql.sources.DataSourceRegister
                        </resource>
                    </transformer>
                </transformers>
                <finalName>${project.artifactId}-${project.version}</finalName>
                <artifactSet>
                    <includes>
                        <include>org.scalactic:*</include>
                        <include>io.delta:*</include>
                        <include>org.apache.spark:*</include>
                    </includes>
                </artifactSet>
                <filters>
                    <filter>
                        <artifact>*:*</artifact>
                    </filter>
                </filters>
            </configuration>
        </execution>
    </executions>
</plugin>
```
B. Bal

2 Answers


I solved it. My problem was that I was using both the maven-shade-plugin and the maven-assembly-plugin. Removing the maven-assembly-plugin worked!
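To confirm that the packaged jar really registers both sources after the build, you can read the merged service file straight out of the jar. The sketch below uses only `java.util.jar.JarFile` and the Scala standard library. `CheckShadedJar` is a hypothetical name, and the two expected class names are my assumption of what delta-core 0.7.0 and spark-sql-kafka-0-10 register, so check them against the jars you actually use.

```scala
import java.util.jar.JarFile
import scala.io.Source

object CheckShadedJar {
  val ServiceEntry = "META-INF/services/org.apache.spark.sql.sources.DataSourceRegister"

  // Assumed provider classes for Delta and Kafka; verify against your dependency versions.
  val Expected: Seq[String] = Seq(
    "org.apache.spark.sql.delta.sources.DeltaDataSource",
    "org.apache.spark.sql.kafka010.KafkaSourceProvider"
  )

  // Reads the DataSourceRegister service file from the jar and returns the provider
  // class names it lists, ignoring blank lines and '#' comments.
  def providers(jarPath: String): List[String] = {
    val jar = new JarFile(jarPath)
    try {
      Option(jar.getJarEntry(ServiceEntry)) match {
        case None => Nil // the jar has no service file at all
        case Some(entry) =>
          val src = Source.fromInputStream(jar.getInputStream(entry), "UTF-8")
          try src.getLines().map(_.takeWhile(_ != '#').trim).filter(_.nonEmpty).toList
          finally src.close()
      }
    } finally jar.close()
  }

  def main(args: Array[String]): Unit = {
    val found = providers(args(0))
    found.foreach(p => println(s"registered: $p"))
    val missing = Expected.filterNot(found.contains)
    if (missing.isEmpty) println("OK: all expected providers are registered")
    else println("MISSING: " + missing.mkString(", "))
  }
}
```

Pass the path of the packaged jar as the only argument (for example `target/<artifactId>-<version>.jar`, given the `finalName` above). If one of the providers is reported missing, the service files were overwritten rather than merged somewhere in the build.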

B. Bal

To extend B. Bal's answer: if you are using sbt instead of Maven, the problem can be fixed by changing the assembly merge strategy:

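```scala
assembly / assemblyMergeStrategy := {
  // Concatenate service-loader registrations so every DataSourceRegister entry survives
  case PathList("META-INF", "services", _*) => MergeStrategy.concat
  // Drop the rest of META-INF (manifests, signature files, ...)
  case PathList("META-INF", _*)             => MergeStrategy.discard
  // For any other duplicate path, keep the first copy found
  case _                                    => MergeStrategy.first
}
```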

With this merge strategy, the files under META-INF/services are concatenated instead of overwritten, so the Delta source, along with every other registered data source, is available from your fat jar.
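To make the difference concrete, the sketch below models what happens to two copies of the same service file under a "keep the first copy" strategy versus a "concatenate all copies" strategy. It is plain Scala, not sbt-assembly's implementation: `mergeFirst` and `mergeConcat` are hypothetical stand-ins for `MergeStrategy.first` and `MergeStrategy.concat`, and the provider class names are assumed sample data for the Kafka and Delta jars.

```scala
object ServiceMergeDemo {
  // Assumed contents of META-INF/services/org.apache.spark.sql.sources.DataSourceRegister
  // in each dependency jar (illustrative sample data only).
  val kafkaServices: String = "org.apache.spark.sql.kafka010.KafkaSourceProvider\n"
  val deltaServices: String = "org.apache.spark.sql.delta.sources.DeltaDataSource\n"

  // Rough model of "first": only the first copy on the classpath survives.
  def mergeFirst(copies: Seq[String]): String = copies.headOption.getOrElse("")

  // Rough model of "concat": all copies are kept, joined by a newline so that a
  // file without a trailing newline cannot fuse with the next one.
  def mergeConcat(copies: Seq[String]): String = copies.mkString("\n")

  // Provider class names a ServiceLoader-style reader would see in the merged file:
  // text after '#' is a comment and blank lines are ignored.
  def providers(serviceFile: String): List[String] =
    serviceFile.split('\n').toList.map(_.takeWhile(_ != '#').trim).filter(_.nonEmpty)

  def main(args: Array[String]): Unit = {
    val copies = Seq(kafkaServices, deltaServices) // Kafka happens to come first here
    println("first:  " + providers(mergeFirst(copies)).mkString(", "))
    println("concat: " + providers(mergeConcat(copies)).mkString(", "))
  }
}
```

With the "first" model only the Kafka provider is left, which matches the "Delta source not found" symptom; with the "concat" model both providers are listed.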

More details can be found in this thread.

MCardus