I have two Foundry datasets that contain raw files (let's say XML or CSV files). I would like to merge them within a transform to create a new dataset containing the files from both.

(This came up because an API schema was updated, and the existing data had to be merged with the new version.)

Example:
A (source): csv1, csv2, csv3, csv4, csv5
B (target): csv1, csv2, csv3

Patrick OC


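Because Foundry datasets store raw files, a simple Python transform that copies each file with shutil.copyfileobj should do the trick. Run the same loop for each input dataset to collect the files from both into the output. This is documented further in the Palantir docs under transforms/python-raw-file-access#writing-files.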

```python
import shutil

# in_source / out: the transform's raw-file input and output datasets.
for file_status in in_source.filesystem().ls(glob='*.csv'):
    with in_source.filesystem().open(file_status.path, 'rb') as in_f:
        with out.filesystem().open(file_status.path, 'wb') as out_f:
            shutil.copyfileobj(in_f, out_f)
```
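The question asks to merge two datasets, while the loop above reads from a single input. One way to extend it is sketched here, under stated assumptions: `merge_raw_files` and `LocalFileSystem` are hypothetical names, and `LocalFileSystem` is a local-directory stand-in that only mimics the two calls the answer uses (`ls(glob=...)` yielding objects with a `.path`, and `open(path, mode)`), so the logic can be tried outside Foundry. Inputs are copied in order and a path that was already written is skipped, so when both datasets contain the same file (e.g. `csv1`), the input listed first wins.

```python
import shutil
from pathlib import Path
from types import SimpleNamespace


class LocalFileSystem:
    """Hypothetical local stand-in for a dataset filesystem.

    Mimics only the calls used in the answer: ls(glob=...) yielding objects
    with a .path attribute, and open(path, mode)."""

    def __init__(self, root):
        self.root = Path(root)

    def ls(self, glob="*"):
        for p in sorted(self.root.glob(glob)):
            if p.is_file():
                yield SimpleNamespace(path=p.relative_to(self.root).as_posix())

    def open(self, path, mode="r"):
        full = self.root / path
        if "w" in mode:
            full.parent.mkdir(parents=True, exist_ok=True)
        return open(full, mode)


def merge_raw_files(inputs, out_fs, glob="*.csv"):
    """Copy files matching `glob` from each input filesystem into out_fs.

    Inputs are processed in order and each output path is written at most
    once, so on a name clash the earlier input wins (an assumption about the
    desired precedence). Returns {path: index of the input it came from}."""
    written = {}
    for idx, fs in enumerate(inputs):
        for file_status in fs.ls(glob=glob):
            if file_status.path in written:
                continue  # an earlier input already provided this file
            with fs.open(file_status.path, "rb") as in_f:
                with out_fs.open(file_status.path, "wb") as out_f:
                    shutil.copyfileobj(in_f, out_f)
            written[file_status.path] = idx
    return written
```

Inside a transform, the same function could presumably be called as `merge_raw_files([new_data.filesystem(), old_data.filesystem()], out.filesystem())`, where `new_data`, `old_data` and `out` are hypothetical parameter names for the two inputs and the output; listing the updated dataset first makes its files take precedence.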
fmsf