
In a Databricks notebook I am reading JSON files with readStream. The JSON has the following structure, for example:

id  entityType  eventId
1   person      123
2   employee    234
3   client      687
4   client      687

My code:

cloudfile = {
  "cloudFiles.format": "json",
  "cloudFiles.schemaLocation": SCHEMA_LOCATION,
  "cloudFiles.useNotifications": True,
}


df = (spark.readStream
  .format('cloudFiles')
  .options(**cloudfile)
  .load(SOURCE_PATH)
)

How can I use writeStream to write it out to different folders, depending on column values?

Output example:

mainPath/{entityType}/{eventId}/data.json

  • entity with id = 1 to file: mainPath/person/123/data.json
  • entity with id = 2 to file: mainPath/employee/234/data.json
  • entity with id = 3 to file: mainPath/client/687/data.json
  • ...
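
For reference, the usual way to fan a file-sink stream out by column values is partitionBy on writeStream. A minimal sketch, assuming MAIN_PATH and CHECKPOINT_PATH are placeholder variables you define:

query = (df.writeStream
  .format('json')
  .option('checkpointLocation', CHECKPOINT_PATH)  # checkpoint is required for streaming writes
  .partitionBy('entityType', 'eventId')           # one subfolder per distinct value combination
  .outputMode('append')
  .start(MAIN_PATH)
)

Note that partitionBy produces Hive-style paths such as mainPath/entityType=person/eventId=123/part-....json rather than mainPath/person/123/data.json, and the file names are chosen by Spark.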
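
If you need the plain mainPath/person/123/ layout from the example, one option is foreachBatch with per-group writes. A sketch under the same placeholder assumptions (write_by_entity is a hypothetical helper; it loops over the distinct groups in each micro-batch, so it is only practical when a batch contains few combinations):

from pyspark.sql.functions import col

def write_by_entity(batch_df, batch_id):
    # Write each (entityType, eventId) group of the micro-batch to its own folder.
    groups = batch_df.select('entityType', 'eventId').distinct().collect()
    for g in groups:
        (batch_df
          .filter((col('entityType') == g['entityType']) & (col('eventId') == g['eventId']))
          .write
          .mode('append')
          .json(f"{MAIN_PATH}/{g['entityType']}/{g['eventId']}"))

query = (df.writeStream
  .foreachBatch(write_by_entity)
  .option('checkpointLocation', CHECKPOINT_PATH)
  .start()
)

Even here Spark names the output files itself; getting a literal data.json would take an extra rename step after each batch.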
