
I have a Cloudera VM and was able to set up the AWS CLI and configure keys. However, I am not able to read or access S3 files using `hadoop fs -ls s3://gft-ri` or any other Hadoop command, although I can see the directories/files using the AWS CLI.

Snapshot of the commands:

(base) [cloudera@quickstart conf]$ **aws s3 ls s3://gft-risk-aml-market-dev/**
                           PRE test/
2019-11-27 04:11:26        458 required

(base) [cloudera@quickstart conf]$ **hdfs dfs -ls s3://gft-risk-aml-market-dev/**
19/11/27 05:30:45 WARN fs.FileSystem: S3FileSystem is deprecated and will be removed in future releases. Use NativeS3FileSystem or S3AFileSystem instead.
ls: `s3://gft-risk-aml-market-dev/': No such file or directory

I have put the following details in core-site.xml:

  <property>
    <name>fs.s3.impl</name>
    <value>org.apache.hadoop.fs.s3.S3FileSystem</value>
  </property>

  <property>
    <name>fs.s3.awsAccessKeyId</name>
    <value>ANHS</value>
  </property>

  <property>
    <name>fs.s3.awsSecretAccessKey</name>
    <value>EOo</value>
  </property>

   <property>
     <name>fs.s3.path.style.access</name>
     <value>true</value>
    </property>

   <property>
    <name>fs.s3.endpoint</name>
    <value>s3.us-east-1.amazonaws.com</value>
  </property>

     <property>
        <name>fs.s3.connection.ssl.enabled</name>
        <value>false</value>
    </property>
– user3858193
  • You should be using S3AFileSystem with `fs.s3a.impl` – OneCricketeer Nov 27 '19 at 21:43
  • https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html – OneCricketeer Nov 27 '19 at 21:48
  • Are you saying to use `fs.s3a.impl` = org.apache.hadoop.fs.s3a.S3AFileSystem instead of `fs.s3.impl` = org.apache.hadoop.fs.s3.S3FileSystem? But s3 is not deprecated so far, so that shouldn't be the cause. – user3858193 Nov 28 '19 at 00:15
  • You should just remove all S3FileSystem usages or change the impl value to use S3AFileSystem. The other is deprecated, according to the message in your output – OneCricketeer Nov 28 '19 at 02:51
  • I did that. I am able to see the folders, but not able to access them. Error message: (base) [cloudera@quickstart conf]$ hdfs dfs -mkdir -p s3a://gft-risk-aml-market-dev/new 19/11/27 19:18:07 INFO http.AmazonHttpClient: Unable to execute HTTP request: The target server failed to respond com.cloudera.org.apache.http.NoHttpResponseException: The target server failed to respond at com.cloudera.org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:95) – user3858193 Nov 28 '19 at 03:29
  • Based on this, you need to upgrade some JAR files related to hadoop-aws and apache httpclient https://github.com/aws/aws-sdk-java/issues/1212 – OneCricketeer Nov 28 '19 at 05:42
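
As a rough illustration of the JAR check suggested in that last comment (the paths and version patterns below are assumptions for the Quickstart VM, not taken from the thread):

    # Show which AWS-related JARs are actually on the Hadoop classpath
    hadoop classpath --glob | tr ':' '\n' | grep -E 'hadoop-aws|aws-java-sdk|httpclient'

    # If the bundled SDK/httpclient turn out to be too old (see the linked GitHub issue),
    # newer JARs could be dropped into the Hadoop lib directory, for example:
    # sudo cp aws-java-sdk-*.jar httpclient-4.5.*.jar /usr/lib/hadoop/lib/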

2 Answers


Finally got it working. With Cloudera Quickstart v13, the core-site.xml below worked:

  <property>
    <name>fs.s3a.impl</name>
    <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
  </property>

  <property>
    <name>fs.s3a.awsAccessKeyId</name>
    <value>AKIAxxxx</value>
  </property>

  <property>
    <name>fs.s3a.awsSecretAccessKey</name>
    <value>Xxxxxx</value>
  </property>

   <property>
     <name>fs.s3a.path.style.access</name>
     <value>true</value>
    </property>

<property>
  <name>fs.AbstractFileSystem.s3a.impl</name>
  <value>org.apache.hadoop.fs.s3a.S3A</value>
  <description>The implementation class of the S3A AbstractFileSystem.</description>
</property>

   <property>
    <name>fs.s3a.endpoint</name>
    <value>s3.us-east-1.amazonaws.com</value>
  </property>

     <property>
        <name>fs.s3a.connection.ssl.enabled</name>
        <value>false</value>
    </property>

<property>
  <name>fs.s3a.readahead.range</name>
  <value>64K</value>
  <description>Bytes to read ahead during a seek() before closing and
  re-opening the S3 HTTP connection. This option will be overridden if
  any call to setReadahead() is made to an open stream.</description>
</property>

<property>
  <name>fs.s3a.list.version</name>
  <value>2</value>
  <description>Select which version of the S3 SDK's List Objects API to use.
  Currently support 2 (default) and 1 (older API).</description>
</property>
– user3858193

I would use the Linux console to mount the S3 bucket and then move files from there to HDFS. You will probably need to install s3fs-fuse on the Cloudera Quickstart VM first by sudo'ing into root, e.g. sudo yum install s3fs-fuse.
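
A rough sketch of that approach, assuming s3fs-fuse is available in the configured yum repositories (EPEL may be needed); the credentials, mount point, and file names are placeholders:

    # Install s3fs-fuse and store the bucket credentials (ACCESS_KEY:SECRET_KEY format)
    sudo yum install -y s3fs-fuse
    echo 'AKIAxxxx:Xxxxxx' > ~/.passwd-s3fs
    chmod 600 ~/.passwd-s3fs

    # Mount the bucket at a user-owned directory
    mkdir -p ~/s3mount
    s3fs gft-risk-aml-market-dev ~/s3mount -o passwd_file=${HOME}/.passwd-s3fs

    # Copy a file from the mount into HDFS
    hdfs dfs -put ~/s3mount/somefile.csv /user/cloudera/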

– ajdams
  • Hadoop can read and write to S3 and expose it as a distributed file system from any client, not just a local mount on a single machine – OneCricketeer Nov 27 '19 at 21:43
  • So a mount? I don't agree with this downvote because he is essentially just reading from the S3 bucket on the machine - it isn't actually replacing HDFS – ajdams Nov 28 '19 at 20:07