I'm having some issues reading items from Cosmos DB in Databricks: it seems to read the JSON as a string value, and I'm struggling to get the data out of it and into columns.
I have a column called ProductRanges with the following value in one of the rows:
[
  {
    "name": "Red",
    "min": 0,
    "max": 99,
    "value": "Order More"
  },
  {
    "name": "Amber",
    "min": 100,
    "max": 499,
    "value": "Stock OK"
  },
  {
    "name": "Green",
    "min": 500,
    "max": 1000000,
    "value": "Overstocked"
  }
]
In Cosmos DB the JSON document is valid, but when importing the data the datatype in the dataframe is a string, not a JSON object/struct as I would expect.
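Printing the schema confirms it; the output looks roughly like this (other columns omitted):

df.printSchema()
# root
#  |-- ProductRanges: string (nullable = true)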
I would like to be able to count the number of times "name" comes up and iterate through the entries to get the min, max and value items, since the number of ranges we can have can be more than 3. I've been through a few posts on Stack Overflow and elsewhere, but I'm stuck on the formatting. I've tried to use explode and to read the schema based on the column values, but it says 'invalid document'. I think this may be because PySpark expects {} at the start and end, but even concatenating that in the SQL query from Cosmos DB still ends up with a datatype of string.
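For reference, this is roughly the kind of thing I've been attempting (df and the schema here are just illustrative, and I'm assuming a Spark version recent enough that from_json accepts an ArrayType schema):

from pyspark.sql import functions as F
from pyspark.sql.types import ArrayType, StructType, StructField, StringType, LongType

# Schema for the JSON array held in the ProductRanges string column
range_schema = ArrayType(StructType([
    StructField("name", StringType()),
    StructField("min", LongType()),
    StructField("max", LongType()),
    StructField("value", StringType()),
]))

# df is the dataframe read from Cosmos DB
parsed = df.withColumn("ranges", F.from_json(F.col("ProductRanges"), range_schema))

# One row per range object, then pull each field into its own column
ranges = parsed.select(F.explode("ranges").alias("r")) \
               .select("r.name", "r.min", "r.max", "r.value")

ranges.groupBy("name").count().show()

I'm not sure whether defining the schema explicitly like this is the right approach, or whether there is a way to have the connector return the struct directly.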
Any pointers would be appreciated