I've just written my first Beam pipeline in Java, and the following error pops up:

Exception in thread "main" java.lang.IllegalArgumentException: No filesystem found for scheme gs

This is the code:

        pipeline.apply("ReadLines", TextIO.read().from(options.getInputFile()))
            .apply(MapElements.via(new SampleFn()))
            .apply("WriteLines", TextIO
                .write()
                .to(options.getOutputDir())
                .withSuffix(".txt"));

I started a scratch project from the examples found at https://github.com/apache/beam/tree/master/examples/java, but it seems that I may be missing some Maven dependencies.

The following pom.xml extract shows the dependencies related to Beam and GCP. Which one am I missing?

    <dependency>
      <groupId>org.apache.beam</groupId>
      <artifactId>beam-sdks-java-core</artifactId>
      <version>2.19.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.beam</groupId>
      <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
      <version>2.19.0</version>
      <exclusions>
        <exclusion>
          <groupId>junit</groupId>
          <artifactId>junit</artifactId>
        </exclusion>
        <exclusion>
          <groupId>com.google.cloud.bigtable</groupId>
          <artifactId>bigtable-client-core</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
      <version>${guava.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.beam</groupId>
      <artifactId>beam-vendor-guava-20_0</artifactId>
      <version>${beam-vendor-guava.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.beam</groupId>
      <artifactId>beam-sdks-java-extensions-google-cloud-platform-core</artifactId>
      <version>2.19.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.beam</groupId>
      <artifactId>beam-sdks-java-extensions-protobuf</artifactId>
      <version>2.19.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.beam</groupId>
      <artifactId>beam-runners-google-cloud-dataflow-java</artifactId>
      <version>2.19.0</version>
    </dependency>

EDIT: Shading is already being performed:

      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>${maven-shade-plugin.version}</version>
        <executions>
          <execution>
            <id>sample-pipeline-build</id>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <finalName>sample-pipeline-bundled</finalName>
              <filters>
                <filter>
                  <artifact>*:*</artifact>
                  <excludes>
                    <exclude>META-INF/LICENSE</exclude>
                    <exclude>META-INF/*.SF</exclude>
                    <exclude>META-INF/*.DSA</exclude>
                    <exclude>META-INF/*.RSA</exclude>
                  </excludes>
                </filter>
              </filters>
              <transformers>
                <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                  <mainClass>my.project.SamplePipeline</mainClass>
                </transformer>
              </transformers>
            </configuration>
          </execution>
        </executions>
      </plugin>

EDIT 2: Contents of META-INF/services/org.apache.beam.sdk.io.FileSystemRegistrar in bundled jar.

org.apache.beam.sdk.io.LocalFileSystemRegistrar
org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystemRegistrar
czr_RR

1 Answer

This usually happens when you build a single jar with all of your dependencies without shading them properly. See the questions "java.lang.IllegalArgumentException: No filesystem found for scheme gs" when running dataflow in google cloud platform and Google Dataflow "No filesystem found for scheme gs" for how to configure your pom file correctly.

danielm
  • Thanks for replying. Weirdly enough, I'm already shading dependencies in the way the first question suggests. I will edit my question to show how I've done it. Shouldn't it just work straight away locally, e.g. when running it from the IDE? I'm not trying to execute the jar at the moment, only debugging locally. – czr_RR Feb 08 '21 at 19:22
  • Can you share the contents of `META-INF/services/org.apache.beam.sdk.io.FileSystemRegistrar` in your bundled jar? – Kenn Knowles Feb 09 '21 at 03:49
  • Edited with the contents @KennKnowles – czr_RR Feb 09 '21 at 08:20
  • Yet another update (@KennKnowles and @danielm): it only happens when trying to write. I'm able to read the file contents. Could it be an issue with the roles in the project itself, and it is just complaining about the gs FS? – czr_RR Feb 09 '21 at 09:11
  • Very strange. Your service loader config looks good. And if you can read files that seems like it is working. I wonder if there is another error deeper in the stack trace? – Kenn Knowles Feb 09 '21 at 18:19
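
The check requested in the comments (the contents of `META-INF/services/org.apache.beam.sdk.io.FileSystemRegistrar` in the bundled jar) can also be done programmatically instead of unpacking the jar by hand. The following is only a minimal sketch using the JDK's `java.util.jar` classes. The class name `ServiceEntries`, the method `read`, and the default path `target/sample-pipeline-bundled.jar` are hypothetical names chosen for illustration (the path is guessed from the `finalName` in the shade configuration and may differ in your build).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: lists the providers a jar declares for a given service.
class ServiceEntries {

    // Returns the provider class names found under META-INF/services/<serviceName>,
    // or an empty list if the jar has no such entry.
    static List<String> read(Path jar, String serviceName) throws IOException {
        List<String> providers = new ArrayList<>();
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            JarEntry entry = jarFile.getJarEntry("META-INF/services/" + serviceName);
            if (entry == null) {
                return providers;
            }
            try (InputStream in = jarFile.getInputStream(entry);
                 BufferedReader reader = new BufferedReader(
                         new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // Service files allow '#' comments; drop them and blank lines.
                    int hash = line.indexOf('#');
                    if (hash >= 0) {
                        line = line.substring(0, hash);
                    }
                    line = line.trim();
                    if (!line.isEmpty()) {
                        providers.add(line);
                    }
                }
            }
        }
        return providers;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location of the shaded jar; pass another path as args[0].
        Path jar = Paths.get(args.length > 0 ? args[0] : "target/sample-pipeline-bundled.jar");
        for (String provider : read(jar, "org.apache.beam.sdk.io.FileSystemRegistrar")) {
            System.out.println(provider);
        }
    }
}
```

If the shaded jar merged the service files correctly, the output should include `org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystemRegistrar`, as in EDIT 2 of the question. Note that this only inspects the jar; it says nothing about the classpath used when running from the IDE.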