I am getting the error below when accessing a Google Cloud Storage bucket for the first time from a Cloudera CDH 6.3.3 Hadoop cluster. I run the command on the edge node, where the Google Cloud SDK is installed. Google Cloud Storage is currently reachable only through an HTTP proxy.

The Cloudera CDH 6.3.3 cluster is on-prem.

This is the command I run:

hadoop --loglevel trace fs -ls gs://distcppoc-2021-08-09/

The error is:

ls: Error accessing: bucket: distcppoc-2021-08-09

Last few lines of output from the Hadoop command:

21/08/10 21:07:42 DEBUG security.UserGroupInformation: hadoop login commit
21/08/10 21:07:42 DEBUG security.UserGroupInformation: using local user:UnixPrincipal: <username>
21/08/10 21:07:42 DEBUG security.UserGroupInformation: Using user: "UnixPrincipal: <username>" with name <username>
21/08/10 21:07:42 DEBUG security.UserGroupInformation: User entry: "<username>"
21/08/10 21:07:42 DEBUG security.UserGroupInformation: UGI loginUser:<username> (auth:SIMPLE)
21/08/10 21:07:42 DEBUG core.Tracer: sampler.classes = ; loaded no samplers
21/08/10 21:07:42 TRACE core.TracerId: ProcessID(fmt=%{tname}/%{ip}): computed process ID of "FSClient/<ip>"
21/08/10 21:07:42 TRACE core.TracerPool: TracerPool(Global): adding tracer Tracer(FSClient/<IP>)
21/08/10 21:07:42 DEBUG core.Tracer: span.receiver.classes = ; loaded no span receivers
21/08/10 21:07:42 TRACE core.Tracer: Created Tracer(FSClient/<ip>) for FSClient
21/08/10 21:07:42 DEBUG fs.FileSystem: Loading filesystems
21/08/10 21:07:42 DEBUG fs.FileSystem: file:// = class org.apache.hadoop.fs.LocalFileSystem from /opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p4762.13062148/jars/hadoop-common-3.0.0-cdh6.3.3.jar
21/08/10 21:07:42 DEBUG fs.FileSystem: viewfs:// = class org.apache.hadoop.fs.viewfs.ViewFileSystem from /opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p4762.13062148/jars/hadoop-common-3.0.0-cdh6.3.3.jar
21/08/10 21:07:42 DEBUG fs.FileSystem: ftp:// = class org.apache.hadoop.fs.ftp.FTPFileSystem from /opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p4762.13062148/jars/hadoop-common-3.0.0-cdh6.3.3.jar
21/08/10 21:07:42 DEBUG fs.FileSystem: har:// = class org.apache.hadoop.fs.HarFileSystem from /opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p4762.13062148/jars/hadoop-common-3.0.0-cdh6.3.3.jar
21/08/10 21:07:42 DEBUG fs.FileSystem: http:// = class org.apache.hadoop.fs.http.HttpFileSystem from /opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p4762.13062148/jars/hadoop-common-3.0.0-cdh6.3.3.jar
21/08/10 21:07:42 DEBUG fs.FileSystem: https:// = class org.apache.hadoop.fs.http.HttpsFileSystem from /opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p4762.13062148/jars/hadoop-common-3.0.0-cdh6.3.3.jar
21/08/10 21:07:42 DEBUG fs.FileSystem: s3n:// = class org.apache.hadoop.fs.s3native.NativeS3FileSystem from /opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p4762.13062148/jars/hadoop-aws-3.0.0-cdh6.3.3.jar
21/08/10 21:07:42 DEBUG fs.FileSystem: gs:// = class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem from /opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p4762.13062148/jars/gcs-connector-hadoop3-1.9.10-cdh6
21/08/10 21:07:42 DEBUG fs.FileSystem: hdfs:// = class org.apache.hadoop.hdfs.DistributedFileSystem from /opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p4762.13062148/jars/hadoop-hdfs-client-3.0.0-cdh6.3.3.jar
21/08/10 21:07:42 DEBUG fs.FileSystem: webhdfs:// = class org.apache.hadoop.hdfs.web.WebHdfsFileSystem from /opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p4762.13062148/jars/hadoop-hdfs-client-3.0.0-cdh6.3.3.jar
21/08/10 21:07:42 DEBUG fs.FileSystem: swebhdfs:// = class org.apache.hadoop.hdfs.web.SWebHdfsFileSystem from /opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p4762.13062148/jars/hadoop-hdfs-client-3.0.0-cdh6.3.3.j
21/08/10 21:07:42 DEBUG fs.FileSystem: Looking for FS supporting gs
21/08/10 21:07:42 DEBUG fs.FileSystem: looking for configuration option fs.gs.impl
21/08/10 21:07:42 DEBUG fs.FileSystem: Filesystem gs defined in configuration option
21/08/10 21:07:42 DEBUG fs.FileSystem: FS for gs is class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
ls: Error accessing: bucket: distcppoc-2021-08-09
21/08/10 21:07:43 TRACE core.TracerPool: TracerPool(Global): removing tracer Tracer(FsShell/<ip>)
21/08/10 21:07:43 DEBUG util.ShutdownHookManager: Completed shutdown in 0.004 seconds; Timeouts: 0
21/08/10 21:07:43 DEBUG util.ShutdownHookManager: ShutdownHookManger completed shutdown.

I added the following properties to the Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml in Cloudera Manager --> HDFS --> Configuration:

<property>
    <name>fs.gs.working.dir</name>
    <value>/</value>
</property>
<property>
    <name>fs.gs.path.encoding</name>
    <value>uri-path</value>
</property>
<property>
    <name>fs.gs.auth.service.account.email</name>
    <value>serviceaccount@dummyemail.iam.gserviceaccount.com</value>
</property>
<property>
    <name>fs.gs.auth.service.account.private.key.id</name>
    <value>52d6ad0c6ecb7f6da9</value>
</property>
<property>
    <name>fs.gs.auth.service.account.private.key</name>
    <value>MIIEvgIBADANBgkq<FULL PRIVATE KEY>MMASBjSOTA1j+jL</value>
</property>

I then restarted the HDFS services.
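
None of the properties above tells the connector about the HTTP proxy. A minimal sketch of what that might look like, assuming the connector's `fs.gs.proxy.address` property (a `host:port` value) is honoured by version 1.9.10; the host `proxy.example.com` and port `3128` are placeholders, not values from my environment:

```xml
<!-- Assumption: fs.gs.proxy.address is supported by this connector version. -->
<!-- proxy.example.com:3128 is a hypothetical placeholder for the real HTTP proxy. -->
<property>
    <name>fs.gs.proxy.address</name>
    <value>proxy.example.com:3128</value>
</property>
```

The same setting could be tried for a single command, without a restart, through Hadoop's generic `-D` option: `hadoop fs -D fs.gs.proxy.address=proxy.example.com:3128 -ls gs://distcppoc-2021-08-09/`.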

The gsutil command works fine when run from the on-prem cluster.

Command: gsutil ls gs://distcppoc-2021-08-09
Output: gs://distcppoc-2021-08-09/sftp.png

The GCS connector is installed on all Hadoop nodes of the Cloudera cluster at the following location:

Location: /opt/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p4762.13062148/jars
Jar file: gcs-connector-hadoop3-1.9.10-cdh6.3.3-shaded.jar

Can I get some help here?

P.S. This is the first time I am posting a question, so please correct me if I have asked it in the wrong way.

bobby