I'm trying to run a Google Dataflow application, but it throws this exception:

java.lang.IllegalArgumentException: No filesystem found for scheme gs
    at org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:459)
    at org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:529)
    at org.apache.beam.sdk.io.FileBasedSink.convertToFileResourceIfPossible(FileBasedSink.java:213)
    at org.apache.beam.sdk.io.TextIO$TypedWrite.to(TextIO.java:700)
    at org.apache.beam.sdk.io.TextIO$Write.to(TextIO.java:1028)
    at br.com.sulamerica.mecsas.ExportacaoDadosPipeline.main(ExportacaoDadosPipeline.java:52)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:282)
    at java.lang.Thread.run(Thread.java:748)

This is a slice of my Pipeline code

Pipeline.create()
        .apply(PubsubIO.readStrings().fromSubscription(subscription))
        .apply(new KeyExportacaoDadosToEntityTransform())
        .apply(new ListKeyEmpresaSelecionadasTransform())
        .apply(ParDo.of(new DoFn<List<Entity>, String>() {
            @ProcessElement
            public void processElement(ProcessContext c){
                c.output(
                    c.element().stream()
                        .map(e-> e.getString("dscRazaoSocial"))
                        .collect(Collectors.joining("\r\n"))
                );
            }
        }))
        .apply(TextIO.write().to("gs://<my bucket>"))
        .getPipeline()
    .run();

And this is the command used to execute my pipeline

mvn -Pdataflow-runner compile exec:java \
  -Dexec.mainClass=br.com.xpto.foo.ExportacaoDadosPipeline \
  -Dexec.args="--project=<projectID>\
  --stagingLocation=gs://dataflow-xpto/exportacao/staging \
  --output=gs://dataflow-xpto/exportacao/output \
  --runner=DataflowRunner"  
Maxim
  • Which SDK version are you using? I've just tried writing to GCS by using the [WordCount code you get in the Quickstart](https://cloud.google.com/dataflow/docs/quickstarts/quickstart-java-maven#get-the-wordcount-code) and I could write files to GCS without issues. – F10 Dec 13 '18 at 14:33
  • You may be missing a dependency on a GCS file system. Maybe look for packages in Beam that may support GCS filesystems? – Pablo Jan 02 '19 at 18:40

2 Answers

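I was grappling with the same issue. If you are using Maven to build the executable JAR, your shade plugin configuration should include the ServicesResourceTransformer, which merges the META-INF/services files that Beam relies on to discover file systems such as gs. It should look like this: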

    <transformers>
        <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        <!-- add Main-Class to manifest file -->
        <transformer
                implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
            <mainClass>com.main.Application</mainClass>
        </transformer>
    </transformers>
</configuration>
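
To illustrate why merging service files matters, here is a minimal sketch in plain Java. It is not the shade plugin's actual implementation, and the provider names (demo.GcsProvider, demo.LocalProvider) and the class name ServiceFileMergeSketch are hypothetical. Each "jar" is modelled as a map from entry path to the lines of that entry. When two jars ship the same META-INF/services file, a naive fat JAR keeps only one copy, so one provider silently disappears; merging keeps both.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ServiceFileMergeSketch {

    // Naive fat JAR: for entries with the same path, only one copy survives
    // (in this sketch, the one from the last jar processed).
    static Map<String, List<String>> overwrite(List<Map<String, List<String>>> jars) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map<String, List<String>> jar : jars) {
            out.putAll(jar);
        }
        return out;
    }

    // Merging: lines of entries with the same path are concatenated,
    // so every declared provider is kept.
    static Map<String, List<String>> merge(List<Map<String, List<String>>> jars) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map<String, List<String>> jar : jars) {
            for (Map.Entry<String, List<String>> entry : jar.entrySet()) {
                out.computeIfAbsent(entry.getKey(), k -> new ArrayList<>())
                   .addAll(entry.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical service file path and provider names, for illustration only.
        String path = "META-INF/services/demo.FileSystemService";
        List<Map<String, List<String>>> jars = Arrays.asList(
                Collections.singletonMap(path, Arrays.asList("demo.GcsProvider")),
                Collections.singletonMap(path, Arrays.asList("demo.LocalProvider")));

        System.out.println("overwrite: " + overwrite(jars).get(path));
        System.out.println("merge:     " + merge(jars).get(path));
    }
}
```

In the overwrite case the GCS-like provider is gone from the combined JAR, which is the kind of situation that would leave no file system registered for a scheme.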
Rana

I recently ran into this issue while working on an Apache Beam Java pipeline built with Gradle.

Applying the Gradle Shadow plugin 'com.github.johnrengelman.shadow' resolved it.

Here is my build.gradle file for future reference:

buildscript {
    repositories {
        maven {
           url "https://plugins.gradle.org/m2/"
        }
        jcenter()
    }
    dependencies {
        classpath 'com.github.jengelman.gradle.plugins:shadow:5.1.0'
    }
}


plugins {
    id 'java'
    id 'com.github.johnrengelman.shadow' version '5.1.0'
}


sourceCompatibility = 1.8


apply plugin: 'java'
apply plugin: 'com.github.johnrengelman.shadow'

repositories {
    mavenLocal()
    mavenCentral()
    jcenter()
    ivy {
        url 'http://dl.bintray.com/content/johnrengelman/gradle-plugins'
    }
}

dependencies {
// your dependencies here
}

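jar {
    // Entry point recorded in the JAR manifest
    manifest {
        attributes "Main-Class": "your_main_class_with_package"
    }

    // Unpack every compile dependency into this JAR (fat JAR)
    from {
        configurations.compile.collect { it.isDirectory() ? it : zipTree(it) }
    }
}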

You should now see the shadowJar task under the shadow group in IntelliJ's Gradle panel. Enjoy!
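
If the error persists after building the JAR, it may help to check which file system registrars are actually declared on the classpath. Below is a minimal diagnostic sketch using only the JDK. The class name RegistrarCheck is hypothetical, and the sketch assumes Beam declares its registrars under the service interface org.apache.beam.sdk.io.FileSystemRegistrar. It only lists what the META-INF/services files declare; it does not load Beam or verify that a given scheme is handled.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

final class RegistrarCheck {

    // Extracts provider class names from the lines of one service file,
    // dropping '#' comments and blank lines.
    static List<String> parseServiceLines(List<String> lines) {
        List<String> providers = new ArrayList<>();
        for (String raw : lines) {
            int hash = raw.indexOf('#');
            String name = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            if (!name.isEmpty()) {
                providers.add(name);
            }
        }
        return providers;
    }

    // Collects the providers declared for a service interface across every
    // META-INF/services file visible to the context class loader.
    static List<String> declaredProviders(String serviceInterface) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        List<String> providers = new ArrayList<>();
        try {
            Enumeration<URL> files =
                    loader.getResources("META-INF/services/" + serviceInterface);
            while (files.hasMoreElements()) {
                try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                        files.nextElement().openStream(), StandardCharsets.UTF_8))) {
                    List<String> lines = new ArrayList<>();
                    String line;
                    while ((line = reader.readLine()) != null) {
                        lines.add(line);
                    }
                    providers.addAll(parseServiceLines(lines));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return providers;
    }

    public static void main(String[] args) {
        // Assumed service interface name; adjust if your Beam version differs.
        List<String> found = declaredProviders("org.apache.beam.sdk.io.FileSystemRegistrar");
        if (found.isEmpty()) {
            System.out.println("No FileSystemRegistrar providers declared on the classpath");
        }
        for (String provider : found) {
            System.out.println(provider);
        }
    }
}
```

Run it with the built JAR on the classpath. If no GCS-related registrar shows up in the output, the service files were probably not merged, or the GCP dependency is missing.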

Onkar