
We are trying to load data saved as Hadoop sequence files into BigQuery (BQ) using the Google Dataflow SDK.

At the entry point, we read the data into the pipeline with the following code:

    // Read.Bounded is a transform: pipeline.apply(results) yields a PCollection of (key, value) pairs.
    Read.Bounded<KV<LongWritable, BytesWritable>> results = HadoopFileSource.readFrom("gs://raw-data/topic-name/dt=2017-02-28/1_0_00000000002956516884.gz",
            org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.class, LongWritable.class, BytesWritable.class);

[1] We are using the GCS connector ("gcs-connector") so that Hadoop's file system layer can access gs:// paths.

[2] `HadoopFileSource` comes from `com.google.cloud.dataflow.contrib.hadoop`.
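The input path ends in `.gz`, but `SequenceFileInputFormat` expects a Hadoop SequenceFile. A SequenceFile may use gzip internally for its records, yet its header is uncompressed and always starts with the bytes `SEQ`, whereas a plain gzip stream starts with `0x1f 0x8b`. Before debugging credentials, it may be worth confirming which one the object is and that its key/value classes match the `LongWritable`/`BytesWritable` passed to `readFrom`. Below is a minimal stdlib-only sketch, assuming a local copy of the object (for example, downloaded with `gsutil cp`). It assumes a header version of 4 or later, where class names are stored as Hadoop `Text` strings (vint length plus UTF-8 bytes). The class name `SeqFileSniffer` is hypothetical.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

/** Sketch: tell a Hadoop SequenceFile from a plain gzip stream and read its key/value class names. */
final class SeqFileSniffer {

    /** "sequencefile", "gzip", or "unknown", judged from the leading magic bytes. */
    static String kind(byte[] head) {
        if (head.length >= 3 && head[0] == 'S' && head[1] == 'E' && head[2] == 'Q') {
            return "sequencefile";
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0x1f && (head[1] & 0xFF) == 0x8b) {
            return "gzip";
        }
        return "unknown";
    }

    /** Hadoop's variable-length long encoding (mirrors WritableUtils.readVLong). */
    static long readVLong(DataInputStream in) throws IOException {
        byte first = in.readByte();
        if (first >= -112) {
            return first;                                   // value fits in the first byte
        }
        boolean negative = first < -120;
        int len = negative ? -119 - first : -111 - first;  // total size including the first byte
        long value = 0;
        for (int i = 0; i < len - 1; i++) {
            value = (value << 8) | (in.readByte() & 0xFF);
        }
        return negative ? ~value : value;
    }

    /** A Text-encoded string: vlong length followed by UTF-8 bytes. */
    static String readText(DataInputStream in) throws IOException {
        byte[] buf = new byte[(int) readVLong(in)];
        in.readFully(buf);
        return new String(buf, StandardCharsets.UTF_8);
    }

    /** Returns {keyClassName, valueClassName}; assumes header version 4+ (current writers use 6). */
    static String[] keyValueClasses(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] magic = new byte[4];
        in.readFully(magic);                                // 'S', 'E', 'Q', version byte
        if (!"sequencefile".equals(kind(magic)) || magic[3] < 4) {
            throw new IOException("unsupported header: " + kind(magic) + ", version byte " + magic[3]);
        }
        return new String[] {readText(in), readText(in)};   // evaluated left to right
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get(args[0]);                     // local copy of the gs:// object
        byte[] head = new byte[4];
        int n;
        try (InputStream in = Files.newInputStream(path)) {
            n = in.read(head);
        }
        String format = kind(Arrays.copyOf(head, Math.max(n, 0)));
        System.out.println("format: " + format);
        if (format.equals("sequencefile")) {
            try (InputStream in = Files.newInputStream(path)) {
                String[] kv = keyValueClasses(in);
                System.out.println("key class: " + kv[0] + ", value class: " + kv[1]);
            }
        }
    }
}
```

If this reports `gzip` rather than `sequencefile`, `SequenceFileInputFormat` is the wrong input format for the file, regardless of how authentication is set up.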

Our core-site.xml file looks like this:

    <configuration>
    <property>
        <name>fs.gs.impl</name>
        <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
        <description>The FileSystem for gs: (GCS) uris.</description>
    </property>
    <property>
        <name>fs.AbstractFileSystem.gs.impl</name>
        <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
        <description>
            The AbstractFileSystem for gs: (GCS) uris. Only necessary for use with Hadoop 2.
        </description>
    </property>
    </configuration>

However, we keep getting `java.net.UnknownHostException: metadata`.
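One plausible reading of this exception (an assumption, not confirmed in the thread): when no explicit credentials are configured, the GCS connector falls back to asking the Compute Engine metadata server (host `metadata`, fully qualified `metadata.google.internal`) for a service-account token. That host only resolves on GCE VMs such as Dataflow workers, not on a laptop. A quick stdlib sketch to check whether it resolves from the machine that launches the pipeline (`HostCheck` is a hypothetical name):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Sketch: does a hostname resolve from this machine? */
final class HostCheck {

    static boolean resolves(String host) {
        try {
            InetAddress.getByName(host);   // DNS / hosts-file lookup only, no connection is opened
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "metadata" is the host named in the exception; the FQDN is what GCE VMs resolve.
        for (String host : new String[] {"metadata", "metadata.google.internal"}) {
            System.out.println(host + " resolves: " + resolves(host));
        }
    }
}
```

If both lines print `false`, the connector cannot use the metadata server locally, so it needs explicit credentials (for example, a service-account key configured for the connector).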

I even added `GOOGLE_APPLICATION_CREDENTIALS="/path/to/key.json"` to the environment variables, but we still get the same exception.
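`GOOGLE_APPLICATION_CREDENTIALS` only helps if the JVM that builds the pipeline actually sees it (variables exported in a shell are not visible to a JVM started from an IDE launcher, for instance) and if it points at a readable key file. It is also possible, though unverified here, that the connector version in use reads its credentials from its own Hadoop configuration properties rather than from this variable. Separately, as far as we know, environment variables on the launching machine are not propagated to Dataflow workers. A small stdlib sketch that reports what the current JVM sees (`CredentialsEnvCheck` is a hypothetical name):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Sketch: report whether GOOGLE_APPLICATION_CREDENTIALS is usable from this JVM. */
final class CredentialsEnvCheck {

    /** Describes a candidate key-file path; null or empty means the variable is unset. */
    static String describe(String value) {
        if (value == null || value.isEmpty()) {
            return "GOOGLE_APPLICATION_CREDENTIALS is not set in this JVM's environment";
        }
        Path p = Paths.get(value);
        if (!Files.isRegularFile(p)) {
            return "no such file: " + p;
        }
        if (!Files.isReadable(p)) {
            return "file is not readable: " + p;
        }
        return "ok: " + p;   // exists and is readable; contents are not validated
    }

    public static void main(String[] args) {
        System.out.println(describe(System.getenv("GOOGLE_APPLICATION_CREDENTIALS")));
    }
}
```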

We just need an easy way to read sequence files from GCS into a Google Dataflow pipeline.
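Whatever source ends up reading the files, each `KV<LongWritable, BytesWritable>` still has to become a BigQuery row. One detail trips people up: `BytesWritable.getBytes()` returns the backing buffer, which can be longer than the valid data, so only the first `getLength()` bytes should be used. The plain-Java function below is a sketch of the core such a `DoFn` might call. The column names `record_key` and `payload` are hypothetical. The Base64 step assumes `payload` is a BigQuery `BYTES` column, whose values the JSON-based row API expects as base64 strings.

```java
import java.util.Arrays;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: turn one (key, value) record into a BigQuery-style row map. */
final class RecordToRow {

    /**
     * @param key    the LongWritable key's value (its meaning depends on whoever wrote the file)
     * @param buffer the array returned by BytesWritable.getBytes(), possibly padded
     * @param length the value of BytesWritable.getLength()
     */
    static Map<String, Object> toRow(long key, byte[] buffer, int length) {
        byte[] payload = Arrays.copyOf(buffer, length);                   // drop padding past getLength()
        Map<String, Object> row = new LinkedHashMap<>();                  // keeps column order stable
        row.put("record_key", key);                                       // hypothetical INTEGER column
        row.put("payload", Base64.getEncoder().encodeToString(payload));  // hypothetical BYTES column
        return row;
    }

    public static void main(String[] args) {
        byte[] padded = {'h', 'i', 0, 0, 0};  // 2 valid bytes followed by 3 bytes of padding
        System.out.println(toRow(42L, padded, 2));  // prints {record_key=42, payload=aGk=}
    }
}
```

Keeping this logic free of SDK types makes it easy to unit-test; the `DoFn` itself would only unwrap the `Writable`s and copy the map into a `TableRow`.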

We would appreciate your help here.

Thanks, Avi

Avi P
  • It's not really clear where you're getting the exception, or how this relates to application credentials. It's also not completely clear how you're using the `HadoopFileSource` with your pipeline -- could you include more of the source? – Ben Chambers Mar 02 '17 at 00:18
  • Thanks @BenChambers. – Avi P Mar 04 '17 at 06:07
  • I solved that error in my code and can now read from gs:// after adding the relevant credentials to core-site.xml. But now, when running this code with `--runner=BlockingDataflowPipelineRunner`, I get `java.lang.NoClassDefFoundError: com/google/api/client/util/BackOff` (I'm attaching my source code and a screenshot from Google Logging). – Avi P Mar 04 '17 at 06:07
  • The error looks like it is a dependency issue. Check that you are using the latest version of the SDK and matching versions of any dependencies, as listed in [the documentation](https://cloud.google.com/dataflow/pipelines/troubleshooting-your-pipeline#encoding-errors-ioexceptions-or-unexpected-behavior-in-user-code) – Ben Chambers Mar 08 '17 at 02:43
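To follow up on the dependency hint: as far as we know, `com.google.api.client.util.BackOff` ships in the Google HTTP client library, so a `NoClassDefFoundError` for it usually means that jar is missing, or an older copy without the class wins on the classpath. A stdlib sketch that reports whether a class is present and which jar it is loaded from can help pin this down. Run it with the same classpath used to launch the pipeline. This only approximates the workers, whose classpath is whatever gets staged. `WhichJar` is a hypothetical name.

```java
import java.security.CodeSource;

/** Sketch: report whether a class is on the classpath and which jar it comes from. */
final class WhichJar {

    /** Code-source location, "bootstrap/JDK" for JDK classes, or null if the class cannot be loaded. */
    static String locate(String className) {
        try {
            // initialize=false: load the class without running its static initializers
            Class<?> c = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "bootstrap/JDK";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String[] names = args.length > 0 ? args : new String[] {
            "com.google.api.client.util.BackOff",                    // class from the error above
            "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem"  // class named in core-site.xml
        };
        for (String n : names) {
            String where = locate(n);
            System.out.println(n + " -> " + (where == null ? "NOT FOUND" : where));
        }
    }
}
```

If `BackOff` shows up as missing, or comes from an unexpected jar, aligning the google-http-client version with the one the Dataflow SDK expects is the next step.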
