I have run out of ideas on how to solve the following issue. A table in the Glue Data Catalog has this schema:
root
|-- _id: string
|-- _field: struct
| |-- ref: choice
| | |-- array
| | | |-- element: struct
| | | | |-- value: null
| | | | |-- key: string
| | | | |-- name: string
| | |-- struct
| | | |-- value: null
| | | |-- key: choice
| | | | |-- int
| | | | |-- string
| | | |-- name: string
If I try to resolve the ref choice using
resolved = (
    df.resolveChoice(
        specs=[('_field.ref', 'cast:array')]
    )
)
I lose records.
Any ideas on how I could:
- filter the DataFrame on whether _field.ref is an array or a struct
- convert struct records into an array or vice-versa
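
To make the two bullets concrete, here is a minimal sketch of the per-record logic in plain Python. It assumes each record reaches the function as a dict-like object in which an array-valued ref shows up as a list and a struct-valued ref as a dict; that is an assumption, not something verified against this table. The names ref_kind and ref_to_array and the sample records are hypothetical.

```python
from typing import Any, Dict


def ref_kind(record: Dict[str, Any]) -> str:
    """Classify _field.ref as 'array', 'struct', or 'other'."""
    ref = (record.get("_field") or {}).get("ref")
    if isinstance(ref, list):
        return "array"
    if isinstance(ref, dict):
        return "struct"
    return "other"


def ref_to_array(record: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap a struct-valued _field.ref in a one-element list; leave arrays as-is."""
    if ref_kind(record) == "struct":
        record["_field"]["ref"] = [record["_field"]["ref"]]
    return record


# Hypothetical sample records mirroring the two branches of the choice type.
records = [
    {"_id": "a", "_field": {"ref": [{"value": None, "key": "k1", "name": "n1"}]}},
    {"_id": "b", "_field": {"ref": {"value": None, "key": 7, "name": "n2"}}},
]

# Classify first, because ref_to_array mutates the record in place.
kinds = [ref_kind(r) for r in records]
normalized = [ref_to_array(r) for r in records]
```

In a Glue job, functions like these could presumably be passed to Filter.apply(frame=..., f=...) and Map.apply(frame=..., f=...) from awsglue.transforms, but whether the choice column is exposed to them as a list or dict is exactly the part that would need checking. The sketch also leaves the nested key field as a mix of int and string.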