To read an ORC file from a GCS bucket, I'm using the code snippet below. It creates a Hadoop Configuration and sets the file system properties required to access the GCS bucket:

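      import org.apache.hadoop.conf.Configuration
      import org.apache.hadoop.fs.Path
      import org.apache.orc.OrcFile

      val hadoopConf = new Configuration()
      // Register the GCS connector as the implementation for the gs:// scheme
      hadoopConf.set("fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
      hadoopConf.set("fs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")
      // Scheme-less paths such as filePath below are resolved against this default file system
      hadoopConf.set("fs.defaultFS", "gs://BUCKET_NAME")
      // Authenticate with the service-account JSON key file named by GOOGLE_APPLICATION_CREDENTIALS
      hadoopConf.set("fs.gs.auth.service.account.enable", "true")
      hadoopConf.set("fs.gs.auth.service.account.json.keyfile", System.getenv("GOOGLE_APPLICATION_CREDENTIALS"))
      val filePath = "path/to/file.orc"
      val reader = OrcFile.createReader(new Path(filePath), OrcFile.readerOptions(hadoopConf))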

The file gs://BUCKET_NAME/path/to/file.orc exists.

However, when I run this code it hangs, and the last log line is:

WARN  [.h.u.NativeCodeLoader] - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

What am I missing?

Dependencies (sbt):

    "org.apache.hadoop"       % "hadoop-common"           % "3.2.1",
    "org.apache.hadoop"       % "hadoop-hdfs"             % "3.2.1",
    "org.apache.hadoop"       % "hadoop-hdfs-client"      % "3.2.1",
    "com.google.cloud.bigdataoss" % "gcs-connector" % "hadoop2-2.2.0",
mazaneicha
`WARN [.h.u.NativeCodeLoader] - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable` isn't a sign of an error. Is there a more helpful log somewhere? – Ben Watson Jun 09 '23 at 07:02
Make sure you add the log4j dependencies and your own properties file with the log level set to DEBUG, so you can see the internal Hadoop client logs. – OneCricketeer Jun 10 '23 at 07:06
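The debug-logging suggestion could look roughly like the log4j.properties sketch below. It assumes log4j 1.x is the active SLF4J backend (Hadoop 3.2.x builds on log4j 1.2) and that the file sits on the classpath, for example in src/main/resources. The appender name `console` and the output pattern are illustrative choices, not anything from the question. The GCS connector may log through a different framework, so this sketch only guarantees DEBUG output from Hadoop's own `org.apache.hadoop` loggers. If the project already configures another backend (for example Logback), the same logger level belongs in that backend's configuration instead.

```properties
# Assumption: log4j 1.x is the logging backend picked up via SLF4J
# Send INFO and above from all loggers to the console
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %-5p [%c] - %m%n

# Raise Hadoop's own client classes (FileSystem, Configuration, ...) to DEBUG
log4j.logger.org.apache.hadoop=DEBUG
```

With this in place, the output after the NativeCodeLoader warning should show which Hadoop call is in progress when the program stops making progress.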
