I want to connect my Spark cluster to TiDB through TiSpark, but when I run my Spark application the following error occurs:

java.io.InvalidClassException: com.pingcap.tikv.region.TiRegion; local class incompatible: stream classdesc serialVersionUID = -3091715739322916126, local class serialVersionUID = -3556238418089320368

I set up a TiDB cluster by following the guide at https://pingcap.com/docs/v3.0/how-to/get-started/deploy-tidb-from-binary/

After that, I followed the guide at https://pingcap.com/docs/v3.0/reference/tispark/ to download tispark-core-2.2.0-SNAPSHOT-jar-with-dependencies.jar and copied it to the jars folder of my Spark installation.

I also set the following configuration:

spark.tispark.pd.addresses 127.0.0.1:2379
spark.sql.extensions org.apache.spark.sql.TiExtensions

Here is my pom file:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.vng</groupId>
  <artifactId>testlan102</artifactId>
  <version>1.0-SNAPSHOT</version>
  <inceptionYear>2019</inceptionYear>
  <properties>
    <scala.version>2.11.12</scala.version>
    <spark.version>2.4.3</spark.version>
  </properties>

  <repositories>
    <repository>
      <id>scala-tools.org</id>
      <name>Scala-Tools Maven2 Repository</name>
      <url>http://scala-tools.org/repo-releases</url>
    </repository>
  </repositories>

  <pluginRepositories>
    <pluginRepository>
      <id>scala-tools.org</id>
      <name>Scala-Tools Maven2 Repository</name>
      <url>http://scala-tools.org/repo-releases</url>
    </pluginRepository>
  </pluginRepositories>

  <dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>2.4.3</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.11</artifactId>
      <version>2.4.3</version>
    </dependency>
    <dependency>
      <groupId>org.scala-lang.modules</groupId>
      <artifactId>scala-xml_2.11</artifactId>
      <version>1.2.0</version>
    </dependency>

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_2.11</artifactId>
      <version>2.4.3</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka-streams</artifactId>
      <version>2.3.0</version>
    </dependency>
    <dependency>
      <groupId>com.pingcap.tispark</groupId>
      <artifactId>tispark-core</artifactId>
      <version>2.1.1-spark_2.4</version>
    </dependency>
    <dependency>
      <groupId>mysql</groupId>
      <artifactId>mysql-connector-java</artifactId>
      <version>5.1.37</version>
    </dependency>

    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>2.11.12</version>
    </dependency>
  </dependencies>

  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <plugins>
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <scalaVersion>${scala.version}</scalaVersion>
          <args>
            <arg>-target:jvm-1.5</arg>
          </args>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-eclipse-plugin</artifactId>
        <configuration>
          <downloadSources>true</downloadSources>
          <buildcommands>
            <buildcommand>ch.epfl.lamp.sdt.core.scalabuilder</buildcommand>
          </buildcommands>
          <additionalProjectnatures>
            <projectnature>ch.epfl.lamp.sdt.core.scalanature</projectnature>
          </additionalProjectnatures>
          <classpathContainers>
            <classpathContainer>org.eclipse.jdt.launching.JRE_CONTAINER</classpathContainer>
            <classpathContainer>ch.epfl.lamp.sdt.launching.SCALA_CONTAINER</classpathContainer>
          </classpathContainers>
        </configuration>
      </plugin>
    </plugins>
  </build>
  <reporting>
    <plugins>
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <configuration>
          <scalaVersion>${scala.version}</scalaVersion>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-jar-plugin</artifactId>
        <configuration>
          <archive>
            <manifest>
              <addClasspath>true</addClasspath>
              <mainClass>fully.qualified.MainClass</mainClass>
            </manifest>
          </archive>
        </configuration>
      </plugin>
      <plugin>
        <artifactId>maven-assembly-plugin</artifactId>
        <configuration>
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
        </configuration>
      </plugin>
    </plugins>
  </reporting>
</project>

My SparkSession is:

val _spark = SparkSession.builder()
  .master("spark://127.0.0.1:7077")
  .config("spark.tispark.pd.addresses", "127.0.0.1:2379")
  .config("spark.sql.extensions", "org.apache.spark.sql.TiExtensions")
  .appName("SparkApp")
  .getOrCreate()

When I run a simple query against the database:

_spark.sql("use locdb")
val df = _spark.sql("select * from bang")
df.show()

I get this error: java.io.InvalidClassException: com.pingcap.tikv.region.TiRegion; local class incompatible: stream classdesc serialVersionUID = -3091715739322916126, local class serialVersionUID = -3556238418089320368

My full log is here: https://gist.github.com/lploc94/bb6bf9db14c030ee123630f6362f6160

I think the reason is that I am using TiSpark 2.1.1-spark_2.4 in my Maven pom file, while the TiSpark jar that I downloaded and copied to the jars folder is 2.2.0. However, I can't find any other version of the TiSpark jar, such as tispark-core-2.1.1-SNAPSHOT-jar-with-dependencies.jar.

1 Answer

I'm a TiSpark developer. Yes, your educated guess is correct :). Two different versions of the TiSpark jar were present during your run, which caused the problem. Version 2.2 (the one in your cluster environment) has not been officially released to the Maven repository yet.
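
To confirm which copy of a class each JVM actually loads, a small diagnostic along these lines may help. It is only a sketch using standard JDK calls (`Class.forName`, `ObjectStreamClass.lookup`); the object name `SerialVersionCheck` and the helper `describe` are hypothetical names chosen for illustration and are not part of TiSpark or Spark.

```scala
import java.io.ObjectStreamClass

// Hypothetical diagnostic helper, not part of TiSpark or Spark.
object SerialVersionCheck {

  /** Returns (jar location, serialVersionUID) for a Serializable class,
    * or None if the class is missing from the classpath or not Serializable.
    */
  def describe(className: String): Option[(String, Long)] = {
    try {
      // Load without running static initializers.
      val cls = Class.forName(className, false, getClass.getClassLoader)
      // JDK bootstrap classes have no code source, so guard against null.
      val location = Option(cls.getProtectionDomain.getCodeSource)
        .flatMap(cs => Option(cs.getLocation))
        .map(_.toString)
        .getOrElse("<bootstrap or unknown>")
      // lookup returns null when the class is not Serializable.
      Option(ObjectStreamClass.lookup(cls))
        .map(desc => (location, desc.getSerialVersionUID))
    } catch {
      case _: ClassNotFoundException => None
    }
  }

  def main(args: Array[String]): Unit = {
    // Defaults to the class named in the exception from the question.
    val name = args.headOption.getOrElse("com.pingcap.tikv.region.TiRegion")
    describe(name) match {
      case Some((location, uid)) =>
        println(s"$name loaded from $location, serialVersionUID = $uid")
      case None =>
        println(s"$name is not on the classpath or is not Serializable")
    }
  }
}
```

Running this once with the application's classpath and once with the jars from Spark's jars folder should show different jar locations and different serialVersionUID values if two TiSpark versions are involved, matching the two numbers in the exception.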

Since you already have the TiSpark jar deployed in your cluster, you can simply remove the TiSpark dependency from your pom. In most cases you don't need any special API from TiSpark unless you are using an older version (< 2.0), and you can still query TiDB directly.

Alternatively, you can remove all TiSpark jars from your cluster environment and rely on the TiSpark dependency in your pom (if so, please package it with its dependencies).

user1192878

Thanks for your reply! I removed TiSpark from my pom file, but then I got other errors. It seems your API doesn't work with Spark 2.4.3; when I downgraded Spark to 2.3.3, it connected fine. – Thorn Honor Jul 26 '19 at 06:23