To read an ORC file from a GCS bucket, I'm using the code snippet below, which creates a Hadoop configuration and sets the file system properties required to access the bucket:
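```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.orc.OrcFile // assumed: orc-core's OrcFile

val hadoopConf = new Configuration()
// Register the GCS connector for the gs:// scheme
hadoopConf.set("fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
hadoopConf.set("fs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")
hadoopConf.set("fs.defaultFS", "gs://BUCKET_NAME") // relative paths resolve against this bucket
// Authenticate with the service-account JSON key named by the environment variable
hadoopConf.set("fs.gs.auth.service.account.enable", "true")
hadoopConf.set("fs.gs.auth.service.account.json.keyfile", System.getenv("GOOGLE_APPLICATION_CREDENTIALS"))
val filePath = "path/to/file.orc"
val reader = OrcFile.createReader(new Path(filePath), OrcFile.readerOptions(hadoopConf))
```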
The file `gs://BUCKET_NAME/path/to/file.orc` exists.
But when I run it, it hangs, and the last log line is:

```
WARN [.h.u.NativeCodeLoader] - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
```
What am I missing?
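The keyfile line in the snippet passes `System.getenv("GOOGLE_APPLICATION_CREDENTIALS")` straight into `hadoopConf.set`. `System.getenv` returns `null` when the variable is unset, so a missing or unreadable key file is one thing that can be ruled out separately. The following is a minimal, stdlib-only sketch that assumes the key is supplied only through that variable. `CredentialsCheck` and `keyfileProblem` are hypothetical names and are not part of Hadoop or the GCS connector:

```scala
import java.nio.file.{Files, Paths}

// Hypothetical helper: checks only that the value intended for
// fs.gs.auth.service.account.json.keyfile names a readable regular file.
// It does not validate the JSON contents or the account's bucket permissions.
object CredentialsCheck {
  def keyfileProblem(envValue: Option[String]): Option[String] = envValue match {
    case None                      => Some("GOOGLE_APPLICATION_CREDENTIALS is not set")
    case Some(p) if p.trim.isEmpty => Some("GOOGLE_APPLICATION_CREDENTIALS is empty")
    case Some(p) =>
      val path = Paths.get(p)
      if (Files.isRegularFile(path) && Files.isReadable(path)) None
      else Some(s"keyfile is missing or unreadable: $p")
  }

  def main(args: Array[String]): Unit =
    keyfileProblem(sys.env.get("GOOGLE_APPLICATION_CREDENTIALS")) match {
      case Some(problem) => System.err.println(problem)
      case None          => println("keyfile looks readable")
    }
}
```

Run this in the same environment as the ORC reader. It prints either the detected problem or "keyfile looks readable". A clean result only rules out this one cause.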
Dependencies:

```scala
"org.apache.hadoop" % "hadoop-common" % "3.2.1",
"org.apache.hadoop" % "hadoop-hdfs" % "3.2.1",
"org.apache.hadoop" % "hadoop-hdfs-client" % "3.2.1",
"com.google.cloud.bigdataoss" % "gcs-connector" % "hadoop2-2.2.0",
```