
I have the Hive service running on a Hadoop cluster. I'm trying to create a Hive table over Eucalyptus (Riak CS) S3 data. I have configured the AccessKeyID and SecretAccessKey in core-site.xml and hive-site.xml. When I execute the CREATE TABLE command and specify the S3 location using the s3n schema, I get the below error:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:org.apache.http.conn.ConnectTimeoutException: Connect to my-bucket.s3.amazonaws.com:443 timed out)

If I try using the s3a schema, I get the below error:

FAILED: AmazonClientException Unable to load AWS credentials from any provider in the chain

I could change the endpoint URL for the distcp command using jets3t, but the same didn't work for Hive. Any suggestions on how to point Hive to the Eucalyptus S3 endpoint are welcome.
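For reference, the credentials were configured along these lines in core-site.xml (a sketch using the standard s3n property names; the key values here are placeholders):

```xml
<!-- core-site.xml: S3 credentials for the s3n connector (placeholder values) -->
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>
```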

Veronica
  • Is it possible to connect Riak CS by simpler command line tools, e.g. s3cmd or s3curl ? – shino Jan 26 '16 at 02:05
  • Yes, I'm able to connect using s3cmd. – Veronica Jan 27 '16 at 07:19
  • Some more questions. - Did you use https in s3cmd too? - Can you try connecting via plain http instead of https when connecting to Riak CS? - Do you use a proxy to connect to Riak CS? - Can you confirm that the client actually tries to connect to *your* Riak CS server? - Are there any lines in the Riak CS log which indicate errors? – shino Jan 27 '16 at 08:58
  • I have configured the S3 access for my account using the s3cfg file, which has the endpoint URL. I have not configured the http or https protocol for the connectivity. The Hive client is not trying to connect to Riak CS. By default the client points to "s3.amazonaws.com" and I'm unable to modify it to the required endpoint. – Veronica Jan 27 '16 at 13:13
  • Do you want to connect to AWS S3 or (your own?) Riak CS? If your Hive client does not try to connect to Riak CS, this is not a Riak CS related issue. – shino Jan 28 '16 at 01:07
  • I'm trying to connect to my own Riak CS. Yes, I understand it's a Hive client issue. I need help in configuring the client to point to my own S3 endpoint. – Veronica Jan 28 '16 at 05:09

2 Answers


I'm not familiar with Hive, but as far as I know it uses MapReduce as its backend processing system. MapReduce uses jets3t as its S3 connector - changing its configuration worked for me in both MapReduce and Spark. Hope this helps: http://qiita.com/kuenishi/items/71b3cda9bbd1a0bc4f9e

Configurations like

s3service.https-only=false
s3service.s3-endpoint=yourdomain.com
s3service.s3-endpoint-http-port=8080
s3service.s3-endpoint-https-port=8080

would work for you?

kuenishi
  • I had tried setting jets3t.properties with the configurations mentioned, but the Hive client still points to s3.amazonaws.com – Veronica Jan 29 '16 at 13:13

I have upgraded to HDP 2.3 (Hadoop 2.7) and now I'm able to configure the s3a schema for Hive-to-S3 access.
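For anyone hitting the same issue: Hadoop 2.7's s3a connector lets you override the endpoint in core-site.xml. A sketch (the hostname, port, and keys below are placeholders for your own Riak CS setup):

```xml
<!-- core-site.xml: point s3a at a custom (Riak CS) endpoint; values are placeholders -->
<property>
  <name>fs.s3a.endpoint</name>
  <value>s3.yourdomain.com:8080</value>
</property>
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>
<!-- if the store does not serve SSL on that port, disable it -->
<property>
  <name>fs.s3a.connection.ssl.enabled</name>
  <value>false</value>
</property>
```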

Veronica