
I've been trying to create a Hive table backed by Avro files in S3. I expected this to be relatively simple, but I ran into the following error.

Here's the create table command:

set fs.s3.awsAccessKeyId=ACCESS_KEY_ID;
set fs.s3.awsSecretAccessKey=SECRET_ACCESS_KEY;
use some_database;
CREATE EXTERNAL TABLE experiment_with_s3_backed_data
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES (
        'avro.schema.literal'='{
        "namespace": "",
        "type": "record",
        "name": "SomeAvroSchema",
        "fields": [
            {"name": "someVariable","type":"string"}
        ]
}')
STORED AS INPUTFORMAT
    'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
    'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 's3://MY_BUCKET/some/data/'
;

And here's the error I get:

AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively).

I tried using both s3 and s3n URLs and parameters, with the same result. I noticed related questions that recommend adding the keys to core-site.xml (sketched below for reference), but I can't do this for two reasons:

  1. I can't change the Hadoop configuration, due to access restrictions.
  2. I might have different tables with different access privileges to S3, so I'm generally interested in providing users with the ability to load their S3 data into Hive tables.
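For reference, the core-site.xml entries those questions describe would look roughly like the snippet below, using the same property names that appear in the error message (the s3n equivalents, fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey, follow the same pattern). I can't use this approach myself, but it may help readers who are able to edit the Hadoop configuration:

<property>
    <name>fs.s3.awsAccessKeyId</name>
    <value>ACCESS_KEY_ID</value>
</property>
<property>
    <name>fs.s3.awsSecretAccessKey</name>
    <value>SECRET_ACCESS_KEY</value>
</property>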

See Problem with copying local data onto HDFS on a Hadoop cluster using Amazon EC2/S3


1 Answer


I figured out a workaround for the S3 key settings by adding the keys directly to the S3 URL like so:

s3n://ACCESS_KEY:SECRET_KEY@MY_BUCKET/some/data/

The resulting create table statement then looks like this:

use some_database;
CREATE EXTERNAL TABLE experiment_with_s3_backed_data
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES (
    'avro.schema.literal'='{
        "namespace": "",
        "type": "record",
        "name": "SomeAvroSchema",
        "fields": [
            {"name": "someVariable","type":"string"}
        ]
}')
STORED AS INPUTFORMAT
    'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
    'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 's3n://ACCESS_KEY:SECRET_KEY@MY_BUCKET/some/data/'
;
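A quick way to check that the table is actually reading from S3 is to query the single field defined in the schema; assuming Avro files already exist under that location, something like this should return rows:

use some_database;
SELECT someVariable FROM experiment_with_s3_backed_data LIMIT 10;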