In a Databricks notebook, I am reading JSON files with readStream. Each JSON record has a structure like this:
id | entityType | eventId |
---|---|---|
1 | person | 123 |
2 | employee | 234 |
3 | client | 687 |
4 | client | 687 |
My code:
# Auto Loader (cloudFiles) options
cloudfile = {
    "cloudFiles.format": "json",
    "cloudFiles.schemaLocation": SCHEMA_LOCATION,
    "cloudFiles.useNotifications": True,
}

df = (spark.readStream
    .format("cloudFiles")
    .options(**cloudfile)
    .load(SOURCE_PATH)
)
How can I write it out with writeStream to different folders, depending on column values?
Output example:
mainPath/{entityType}/{eventId}/data.json
- entity with id = 1 to file: mainPath/person/123/data.json
- entity with id = 2 to file: mainPath/employee/234/data.json
- entity with id = 3 to file: mainPath/client/687/data.json
- ...
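For reference, a minimal sketch of the kind of writeStream call I am after, assuming partitionBy on entityType and eventId is acceptable; it would produce folder names like entityType=person/eventId=123 and Spark part-* file names rather than the exact person/123/data.json layout above (CHECKPOINT_PATH and MAIN_PATH are placeholder variables, not from my actual code):

# Sketch only: partitionBy splits the output by column values,
# but yields key=value folder names and part-* file names.
(df.writeStream
    .format("json")
    .option("checkpointLocation", CHECKPOINT_PATH)  # placeholder path
    .partitionBy("entityType", "eventId")
    .outputMode("append")
    .start(MAIN_PATH)  # placeholder for mainPath
)

Is something like this the right approach, or would I need foreachBatch to control the exact folder and file names?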