
Summary of the Question

I also checked the link below to fix my issue, but it didn't help me.

How to query on AWS Athena partitioned table

There are four partition columns in my log table:

  • aws_account_id string
  • log_type string
  • aws_region string
  • date string

I tried a simple SELECT query on the partitioned table, but got the error message below.

Error Message

GENERIC_INTERNAL_ERROR: No value present 
This query ran against the "default" database, unless qualified by the query.

SELECT Query that I Tried

SELECT * FROM "logs" LIMIT 10;

and

SELECT * FROM "default"."logs" LIMIT 10;

Since the error message mentions "No value present", I checked the partition results.

SHOW PARTITIONS logs;

Result format

aws_account_id=${aws_account_id}/log_type=${log_type}/aws_region=${aws_region}/date=${date}
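
For reference: aws_account_id and aws_region are configured with the injected projection type (see the CREATE TABLE below), and as far as I understand Athena expects queries on injected partition keys to provide explicit values in the WHERE clause. A minimal sketch of such a query (the account ID, region, and date are placeholder values):

SELECT *
FROM "default"."logs"
WHERE aws_account_id = '123456789012'
  AND log_type = 'logs'
  AND aws_region = 'eu-west-1'
  AND "date" = '2022/03/09'
LIMIT 10;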

I would greatly appreciate your help.

Further Info

CREATE TABLE command that I used

CREATE EXTERNAL TABLE `logs`(
  `account_id` string, 
  `interface_id` string, 
  `srcaddr` string, 
  `dstaddr` string, 
  `srcport` int, 
  `dstport` int, 
  `traffic_protocol` bigint, 
  `packets` bigint, 
  `bytes` bigint, 
  `start_time` bigint, 
  `end_time` bigint, 
  `traffic_action` string, 
  `log_status` string, 
  `vpc_id` string, 
  `subnet_id` string, 
  `instance_id` string, 
  `tcp_flags` int, 
  `flow_direction` string)
PARTITIONED BY ( 
  `aws_account_id` string, 
  `log_type` string, 
  `aws_region` string, 
  `date` string)
ROW FORMAT DELIMITED 
  FIELDS TERMINATED BY ' ' 
WITH SERDEPROPERTIES ( 
  'EXTERNAL'='true', 
  'skip.header.line.count'='1') 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://default/path/'
TBLPROPERTIES (
  'last_modified_by'='hadoop', 
  'last_modified_time'='1646821386', 
  'projection.aws_account_id.type'='injected', 
  'projection.aws_region.type'='injected', 
  'projection.date.format'='yyyy/MM/dd', 
  'projection.date.interval'='1', 
  'projection.date.interval.unit'='DAYS', 
  'projection.date.range'='2019/11/27, NOW-1DAYS', 
  'projection.date.type'='date', 
  'projection.enabled'='true', 
  'projection.log_type.type'='string', 
  'projection.log_type.values'='logs', 
  'storage.location.template'='s3://default/logs/${aws_account_id}/${log_type}/${aws_region}/${date}', 
  'transient_lastDdlTime'='1646821386')
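
For illustration, with projection.date.format set to yyyy/MM/dd, I would expect the storage.location.template above to resolve, for a single partition, to a location like the following (the account ID, region, and date are placeholder values):

s3://default/logs/123456789012/logs/eu-west-1/2022/03/09/
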
Tazz57
  • How did you insert data into the table? If the partition directories were already present, then run [MSCK REPAIR TABLE - Amazon Athena](https://docs.aws.amazon.com/athena/latest/ug/msck-repair-table.html) to have Amazon Athena scan the existing partitions. (This is not necessary if Athena inserted the data into those partitions.) Can you provide an example Key of one of the files stored in that path (with the Full Path included)? I'm thinking that the 'result format' you have shown is not compatible with the 'storage.location.template' that you have shown (not that I know how it is used). – John Rotenstein Mar 10 '22 at 09:02
  • **Side-comment:** Your choice of using 4 levels of partitioning seems quite strange. It would likely introduce a lot of overhead and would only be beneficial if there are huge numbers of rows, since each partition would only contain a small portion of data. How many rows are in this table? Have you tried using it _without_ partitioning and found it to be performing poorly? – John Rotenstein Mar 10 '22 at 09:06
  • Sorry, I checked the partitions now and saw that some or all partitions are not stored in the metastore. I use AWS Glue to load the data into the Athena table. But what could be the reason that Athena doesn't load the partitions? – Tazz57 Mar 10 '22 at 09:21
  • Did you try running `MSCK REPAIR TABLE`? – John Rotenstein Mar 10 '22 at 09:33
  • Yeah, I know what you mean, but I need that many partitions. I want to store VPC flow logs from over 120 AWS accounts to analyze network issues, and the path under which Amazon S3 stores the flow logs looks like /database/table/accounts/log_type/region/year/month/day. I need 18 columns in the log format. I have a huge amount of data which should be stored in S3 for 14 days; after that, a lifecycle configuration will delete the log files in S3. – Tazz57 Mar 10 '22 at 09:38
  • Yes, I tried running MSCK REPAIR TABLE and nothing changed. – Tazz57 Mar 10 '22 at 09:39
  • Did you mean the storage location of a file? s3://default/logs/${aws_account_id}/${log_type}/${aws_region}/${date}/ – Tazz57 Mar 10 '22 at 09:46
  • Normally, partitions are stored as `aws_account_id=1234/log_type=standard/aws_region=us-east-1/date=2022-03-10`. However, I know that it is possible to specify a `location` when using `ALTER TABLE ADD PARTITION`. I wonder whether AWS Glue does this for you, so it knows the location of each partition? `MSCK REPAIR TABLE` would only recognize the first format (with equal signs), not the second one (with locations). I'm not familiar with how Glue can be used to _load_ a table -- I normally use it to crawl an existing data location. (Both commands are sketched below this thread.) – John Rotenstein Mar 10 '22 at 09:51
  • I set the table properties in the Athena editor by running a query. You can define a storage descriptor for glue tables to identify the location for the table input. – Tazz57 Mar 10 '22 at 10:06
  • @JohnRotenstein I tried the first format now and nothing has changed; the issue still exists. – Tazz57 Mar 10 '22 at 10:33
  • I'm not sure what you mean by "the first format", but if you mean the format with the equals signs, then try `MSCK REPAIR TABLE` -- it should recognize the existing partitions. – John Rotenstein Mar 10 '22 at 10:49
  • @JohnRotenstein Yeah, I tried it and it didn't work. – Tazz57 Mar 10 '22 at 11:33
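
For reference, a sketch of the two commands discussed in the comments above (the partition values and the S3 path are placeholders):

MSCK REPAIR TABLE logs;

ALTER TABLE logs ADD IF NOT EXISTS
  PARTITION (aws_account_id='123456789012', log_type='logs', aws_region='eu-west-1', `date`='2022/03/09')
  LOCATION 's3://default/logs/123456789012/logs/eu-west-1/2022/03/09/';

As far as I know, MSCK REPAIR TABLE only detects Hive-style key=value directories, and Athena ignores catalog-registered partitions entirely while projection.enabled is 'true', so these commands would only take effect with projection disabled.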

0 Answers