In Athena how do I query a member of a struct in an array in a struct?

Question

I am trying to figure out how to query where I am checking the value of usage given the following table creation:

CREATE EXTERNAL TABLE IF NOT EXISTS foo.test (
     `id` string,
     `foo` struct< usages:array< struct< usage:string,
     method_id:int,
     start_at:string,
     end_at:string,
     location:array<string> >>> 
) PARTITIONED BY (
         timestamp date 
) 
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
         'serialization.format' = '1' ) LOCATION 's3://foo.bar/' TBLPROPERTIES ('has_encrypted_data'='false');

I would like to have a query like:

SELECT * FROM "foo"."test" WHERE foo.usages.usage is null;

When I do that I get:

SYNTAX_ERROR: line 1:53: Expression "foo"."usages" is not of type ROW

If I make my query where I directly index the array as seen in the following it works.

SELECT * FROM "foo"."test" WHERE foo.usages[1].usage is null;

My overall goal though is to query across all items in the usages array and find any row where at least one item in the usages array has a member usage that is null.

Piotr Findeisen · Answer 1 · 2019-11-06T20:18:59.567

8

Athena is based on Presto. In Presto 318 you can use any_match:

SELECT * FROM "foo"."test"
WHERE any_match(foo.usages, element -> element.usage IS NULL);

I think the function is not available in Athena yet, but you can emulate it using reduce.

SELECT * FROM "foo"."test"
WHERE reduce(
  foo.usages, -- array to reducing
  false, -- initial state
  (state, element) -> state OR element.usage IS NULL, -- combining function
  state -> state); -- output function (identity in this case)

edited Nov 06 '19 at 20:18

answered Sep 12 '19 at 20:35

Piotr Findeisen

19,480
2
52
82

4

it is true that `any_match` is not available in Athena – Anentropic Nov 05 '19 at 12:56
1

this reduce needed a small change in order to work for me... I had to replace `||` with `OR` keyword (else it treated that as array concatenation operator) – Anentropic Nov 05 '19 at 13:57
1

@Anentropic right!.. Too much Java everyday. Updated, thanks. – Piotr Findeisen Nov 06 '19 at 20:19
3

`any_match` is still not available yet. For a list of all available Presto functions in Athena: https://docs.aws.amazon.com/athena/latest/ug/presto-functions.html – blue_chip Apr 02 '20 at 20:15

score 2 · Accepted Answer · answered Sep 12 '19 at 18:58

2

You can achieve this by unnesting the array into rows and then check those for null values. This will result in one row per null-value entry.

select * from test
CROSS JOIN UNNEST(foo.usages) AS t(i)
where i.usage is null

So if you only nee the unique set, you must run this through a select distinct.

select distinct id from test
CROSS JOIN UNNEST(foo.usages) AS t(i)
where i.usage is null

answered Sep 12 '19 at 18:58

jens walter

13,269
2
56
54

1

My intuition suggests UNNEST+DISTINCT is not going to be the most performant option. Even once Athena gets the UNNEST optimization (https://github.com/prestosql/presto/pull/901), DISTINCT may still be costly for a bigger data set. – Piotr Findeisen Sep 13 '19 at 22:10

score 2 · Answer 3 · answered Mar 14 '22 at 09:53

Another way to emulate any_match(<array>, <function>) is with cardinality(filter(<array>, <function>)) > 0.

SELECT * FROM "foo"."test"
WHERE any_match(foo.usages, element -> element.usage IS NULL);

Becomes:

SELECT * FROM "foo"."test"
WHERE cardinality(filter(foo.usages, element -> element.usage IS NULL)) > 0

In Athena how do I query a member of a struct in an array in a struct?

3 Answers3