I have a DataFrame like:
    input_df = self.spark.createDataFrame(
        data=[
            ("01", "file_name_1"),
            ("02", "file_name_2"),
            ("05", "file_name_5"),
        ],
        schema="RECORD_ID: string, FILE_NAME: string",
    )
I have a folder /mnt/data/project/integration_test/
with the following files:
file_name_1.json
file_name_2.json
file_name_3.json
file_name_4.json
I want to update the JSON files whose names appear in input_df.
I thought the process would be:
- Delete the JSON files whose names appear in input_df (see the sketch after this list)
- Save each row of input_df as an individual JSON file (I have already solved this part)
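A minimal sketch of the delete step, assuming input_df is small enough to collect on the driver and the mount point is directly accessible through the local filesystem (on Databricks you may need a /dbfs/mnt/... prefix instead; the folder path below is the one from the question):

    import os

    # Collect the target file names on the driver; input_df is small,
    # so collect() is safe here.
    file_names = [row["FILE_NAME"] for row in input_df.select("FILE_NAME").collect()]

    folder = "/mnt/data/project/integration_test/"
    for name in file_names:
        path = os.path.join(folder, name + ".json")
        # file_name_5.json does not exist yet, so skip missing files
        if os.path.exists(path):
            os.remove(path)

After this loop only file_name_1.json and file_name_2.json have been removed; the save step then rewrites them and additionally creates file_name_5.json.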
The final files in /mnt/data/project/integration_test/ would be:
file_name_1.json (updated)
file_name_2.json (updated)
file_name_3.json
file_name_4.json
file_name_5.json (created new)
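For completeness, a quick check of the final state (same filesystem assumption as above):

    import os

    print(sorted(os.listdir("/mnt/data/project/integration_test/")))
    # ['file_name_1.json', 'file_name_2.json', 'file_name_3.json',
    #  'file_name_4.json', 'file_name_5.json']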