I have a case in which the client sends us completely dynamic, unexpected data. We want to do some transformations to the data, but index the dynamic data as it is.
Each entry in that JSON can be of a different type, so MAP<VARCHAR, VARCHAR>
cannot be used.
The entries' content (field names) also is not known in advanced, so also STRUCT
or SELECT EXTRACTJSON
is not a viable solution.
For example, an entry may look like this:
{
"id": "2345",
"name": "sunrise",
"fields": {
"price": 123,
"currency": "usd",
"transaction_time": "2021-01-03T11:20:00.000Z",
"abc": "some string"
}
}
Later on, we might get an update to that entry, and it will look like this:
{
"id": "2345",
"name": "sunrise",
"fields": {
"price": 433,
"currency": "eur",
"transaction_time": "2021-01-04T12:34:00.000Z",
"abc": 244,
"extra_field": "pending"
}
}
Notice that abc
's type can be changed, and that we can't really SELECT
the fields as we don't know what they will be.
This is data that we get from a MongoDB source connector - it is NoSQL with no schema.
The expected output is with a transformation and fields
are written exactly as sent:
{
"id": "2345",
"nameTransformed": "Sunrise-Miami",
"fields": {
"price": 433,
"currency": "eur",
"transaction_time": "2021-01-04T12:34:00.000Z",
"abc": 244,
"extra_field": "pending"
}
}