0

I am getting this error when running the Spotify Spark Bigquery connector on Qubole data platform. I do see the BigQueryUtils class in my jar but still it throws this error:

Exception in thread "main" org.spark-project.guava.util.concurrent.ExecutionError: java.lang.NoSuchMethodError: com.google.cloud.hadoop.io.bigquery.BigQueryUtils.waitForJobCompletion

Attaching the pom below...

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.xyz.abc.google.TestProject</groupId>
  <artifactId>edesem-google-TestProject</artifactId>
  <version>0.0.1-SNAPSHOT</version>

  <properties>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>

    <gpg.skip>true</gpg.skip>

    <!-- Keep in sync with google-api-client dependency -->
    <apache.httpcomponents.version>4.0.1</apache.httpcomponents.version>
  </properties>

  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <plugins>
      <plugin>
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>3.3.1</version>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
            </goals>
          </execution>
        </executions>
      </plugin>

      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.5.1</version>
        <configuration>
          <source>1.8</source>
          <target>1.8</target>
        </configuration>
      </plugin>

      <plugin>
        <artifactId>maven-assembly-plugin</artifactId>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>single</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
        </configuration>
      </plugin>

      <!-- Maven Shade Plugin -->
      <plugin>
        <artifactId>maven-shade-plugin</artifactId>
        <version>2.4</version>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <finalName>edesem-google-TestProject</finalName>
              <shadedArtifactAttached>false</shadedArtifactAttached>
              <artifactSet>
                <includes>
                  <include>*:*</include>
                </includes>
              </artifactSet>
              <filters>
                <filter>
                  <artifact>*:*</artifact>
                  <excludes>
                    <exclude>META-INF/*.SF</exclude>
                    <exclude>META-INF/*.DSA</exclude>
                    <exclude>META-INF/*.RSA</exclude>
                  </excludes>
                </filter>
              </filters>
              <transformers>
                <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer" />
                <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                  <resource>reference.conf</resource>
                </transformer>
                <transformer implementation="org.apache.maven.plugins.shade.resource.DontIncludeResourceTransformer">
                  <resource>log4j.properties</resource>
                </transformer>
                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                  <mainClass>com.xyz.abc.bigquery.TestProjectBQClient</mainClass>
                </transformer>
              </transformers>
              <relocations>
                <relocation>
                  <pattern>org.eclipse.jetty</pattern>
                  <shadedPattern>org.spark-project.jetty</shadedPattern>
                  <includes>
                    <include>org.eclipse.jetty.**</include>
                  </includes>
                </relocation>
                <relocation>
                  <pattern>com.google.common</pattern>
                  <shadedPattern>org.spark-project.guava</shadedPattern>
                  <excludes>
                    <exclude>com/google/common/base/Absent*</exclude>
                    <exclude>com/google/common/base/Function</exclude>
                    <exclude>com/google/common/base/Optional*</exclude>
                    <exclude>com/google/common/base/Present*</exclude>
                    <exclude>com/google/common/base/Supplier</exclude>
                  </excludes>
                </relocation>
              </relocations>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>

  <dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>2.10.6</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>2.2.0</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.10</artifactId>
      <version>2.2.0</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>com.databricks</groupId>
      <artifactId>spark-avro_2.10</artifactId>
      <version>4.0.0</version>
    </dependency>
    <dependency>
      <groupId>com.google.cloud.bigdataoss</groupId>
      <artifactId>bigquery-connector</artifactId>
      <version>0.10.2-hadoop2</version>
      <exclusions>
        <exclusion>
          <groupId>com.google.guava</groupId>
          <artifactId>guava-jdk5</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-simple</artifactId>
      <version>1.7.21</version>
    </dependency>
    <dependency>
      <groupId>joda-time</groupId>
      <artifactId>joda-time</artifactId>
      <version>2.9.3</version>
    </dependency>
    <dependency>
      <groupId>org.scalatest</groupId>
      <artifactId>scalatest_2.10</artifactId>
      <version>2.2.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>com.google.cloud.bigdataoss</groupId>
      <artifactId>gcs-connector</artifactId>
      <version>1.8.0-hadoop2</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/com.google.cloud.bigdataoss/util-hadoop -->
    <dependency>
      <groupId>com.google.cloud.bigdataoss</groupId>
      <artifactId>util-hadoop</artifactId>
      <version>1.8.0-hadoop2</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/com.google.cloud.bigdataoss/gcsio -->
    <dependency>
      <groupId>com.google.cloud.bigdataoss</groupId>
      <artifactId>gcsio</artifactId>
      <version>1.8.0</version>
    </dependency>
    <dependency>
      <groupId>com.google.cloud.bigdataoss</groupId>
      <artifactId>util</artifactId>
      <version>1.8.0</version>
      <exclusions>
        <exclusion>
          <groupId>com.google.api-client</groupId>
          <artifactId>google-api-client-java6</artifactId>
        </exclusion>
        <exclusion>
          <groupId>com.google.guava</groupId>
          <artifactId>guava-jdk5</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-mapreduce-client-core -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-core</artifactId>
      <version>2.8.3</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>2.8.3</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
      <version>23.6-jre</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/com.google.cloud/google-cloud-bigquery -->
    <dependency>
      <groupId>com.google.cloud</groupId>
      <artifactId>google-cloud-bigquery</artifactId>
      <version>1.23.0</version>
    </dependency>
  </dependencies>
</project>
Igor Dvorzhak
  • 4,360
  • 3
  • 17
  • 31
edocx
  • 87
  • 1
  • 6

2 Answers2

1

I figured the main issue for me was the big query connector config in the cluster. I added the jar to the classpath and it fixed the issue. Below instructions as per Google's documentation.

https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcs/INSTALL.md#add-the-connector-jar-to-hadoops-classpath

Add the connector jar to Hadoop's classpath Placing the connector jar in the appropriate subdirectory of the Hadoop installation may be effective to have Hadoop load the jar. However, to be certain that the jar is loaded, add HADOOP_CLASSPATH=$HADOOP_CLASSPATH:</path/to/gcs-connector-jar> to hadoop-env.sh in the Hadoop configuration directory.

Igor Dvorzhak
  • 4,360
  • 3
  • 17
  • 31
edocx
  • 87
  • 1
  • 6
0

This is because you are using com.google.cloud.bigdataoss:bigquery-connector:0.10.2-hadoop2 BigQuery connector version which is incompatible with com.google.cloud:google-cloud-bigquery:1.23.0 library version.

You need to upgrade com.google.cloud.bigdataoss:bigquery-connector to at least 0.11.0 version, and make it consistent with versions of other com.google.cloud.bigdataoss dependencies (in your case it will be 0.12.0 version), i.e. they all should be from the same release that are listed here: https://github.com/GoogleCloudPlatform/bigdata-interop/releases

Igor Dvorzhak
  • 4,360
  • 3
  • 17
  • 31