I've successfully done this, but without a data classifier. My JSON file looks like:
[
{
"col1": "data_from_col1",
"col2": "data_from_col2",
"col3": [
{
"col4": "data_from_col4",
...
{
]
},
{
"col1": "data_from_col1",
"col2": "data_from_col2",
"col3": [
{
"col4": "data_from_col4",
...
{
]
},
...
]
I started with a crawler to get a basic table definition. IMPORTANT: the crawler's configuration options under Output CAN'T be set to Update the table definition..., or else re-running the crawler later will overwrite the manual changes described below. I used Add new columns only.
I had to add the 'strip.outer.array' property AND manually add the topmost columns within my anonymous array. The original schema from the initial crawler run was:
anon_array array<struct<col1:string,col2:string,col3:array<struct<col4...>>>
partition_0 string
I manually updated my schema to:
col1:string
col2:string
col3:array<struct<col4...>>
partition_0 string
(And also add the serde param strip.outer.array.)
Then I had to rerun my crawler, and finally I could query in Spectrum like:
select o.partition_0, o.col1, o.col2, t.col4
from db.tablename o
LEFT JOIN o.col3 t on true;