Experimenting with AWS Athena. I'm attempting to create a table from an S3 bucket that has a file structure like this:
my-bucket/
my-bucket/group1/
my-bucket/group1/entry1/
my-bucket/group1/entry1/data.bin
my-bucket/group1/entry1/metadata
my-bucket/group1/entry2/
my-bucket/group1/entry2/data.bin
my-bucket/group1/entry2/metadata
...
my-bucket/group2/
...
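To make concrete which objects a table at `s3://my-bucket/` would try to read, here is a small local Python sketch. It assumes Athena scans every object under the table LOCATION, including subfolders, and skips only objects whose file name starts with `_` or `.`. The keys are copied from the listing above (the elided `...` entries and group2 are left out), and `files_scanned` is a hypothetical helper, not an AWS API.

```python
# Hypothetical illustration of which objects sit under the table LOCATION.
# Assumed rules: every object below the prefix is read (recursively), except
# "folder" markers and files whose name starts with "_" or ".".

def files_scanned(keys, location):
    """Return the keys a table at `location` would read under the assumed rules."""
    scanned = []
    for key in keys:
        if not key.startswith(location) or key.endswith("/"):
            continue  # outside the location, or a zero-byte "folder" marker
        name = key.rsplit("/", 1)[-1]
        if name.startswith(("_", ".")):
            continue  # underscore/dot files are assumed to be skipped
        scanned.append(key)
    return scanned

keys = [
    "s3://my-bucket/group1/",
    "s3://my-bucket/group1/entry1/",
    "s3://my-bucket/group1/entry1/data.bin",
    "s3://my-bucket/group1/entry1/metadata",
    "s3://my-bucket/group1/entry2/",
    "s3://my-bucket/group1/entry2/data.bin",
    "s3://my-bucket/group1/entry2/metadata",
]

for key in files_scanned(keys, "s3://my-bucket/"):
    print(key)
```

Under those assumptions the `data.bin` objects are picked up alongside the `metadata` files, so the JSON SerDe would be handed binary data as well.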
Only the metadata files are JSON files. Each one looks like this:
{
"key1": "value1",
"key2": "value2",
"key3": n
}
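Note that each metadata file is pretty-printed across several lines. As far as I understand it, the Hive JSON SerDe reads text input one line at a time and expects each line to hold one complete JSON object. The Python sketch below only simulates that assumption locally (it is not Athena's code path): parsing line by line fails on the pretty-printed form, while the same object written on a single line parses. `parse_line_by_line` and `to_single_line` are hypothetical helpers, and the placeholder `n` for `key3` is replaced with `3` purely so the snippet runs.

```python
import json

# The metadata file as shown above, pretty-printed across several lines
# (key3's placeholder "n" replaced by 3 only so this parses).
pretty = '{\n"key1": "value1",\n"key2": "value2",\n"key3": 3\n}\n'

def parse_line_by_line(text):
    """Parse each non-empty line as a separate JSON record (assumed SerDe behaviour)."""
    records = []
    for line in text.splitlines():
        if line.strip():
            records.append(json.loads(line))
    return records

def to_single_line(text):
    """Hypothetical helper: re-serialise one JSON object onto a single line."""
    return json.dumps(json.loads(text), separators=(",", ":")) + "\n"

try:
    parse_line_by_line(pretty)
except json.JSONDecodeError as err:
    # The first line is just "{", which is not a complete JSON object.
    print("pretty-printed file fails:", err.msg)

print(parse_line_by_line(to_single_line(pretty)))
```

If that assumption holds, the error's "expected close marker for OBJECT" would be consistent with the parser seeing a lone `{` on the first line.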
So I tried to create a table:
CREATE EXTERNAL TABLE example (
key1 string,
key2 string,
key3 int
)
ROW FORMAT serde 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION 's3://my-bucket/'
The CREATE query succeeded, but when I attempt to query the table:
SELECT * FROM example LIMIT 10;
I get an error:
Query 93aa62d6-8a52-4a5d-a2fb-08a6e00181d3 failed with error code HIVE_CURSOR_ERROR: org.codehaus.jackson.JsonParseException: Unexpected end-of-input: expected close marker for OBJECT (from [Source: java.io.ByteArrayInputStream@2da7f4ef; line: 1, column: 0]) at [Source: java.io.ByteArrayInputStream@2da7f4ef; line: 1, column: 3]
Does AWS Athena require all files under the table's location to be JSON in this case? I'm not sure whether the .bin files are causing the cursor error or whether something else is going on. Has anyone else encountered this, or can anyone clue me in as to what is going on?