Context: Our business users receive Excel files (.xlsx) by email that they want to import into Foundry. We agreed on a fixed structure and naming convention for the files and tabs so that users can simply drag and drop them into a specific folder, which appends them to an existing dataset. Any change to this dataset then triggers a pipeline (raw -> clean -> ontology).

Issue: We use the "Additional Columns" (_filePath, _byteOffset, _importedAt) to clean up the data and apply some logic based on them, but every time a new Excel file is appended, the schema seems to be reset and the "Additional Columns" are unticked.

[Screenshot: "Additional Columns" unticked in "Edit schema"]

Is there a way to keep the "Additional Columns" after importing and appending an Excel file to an existing dataset?

Patrick

1 Answer

Unfortunately, imports through the drag-and-drop interface always replace the existing schema, which is why you are losing the additional columns. If you can create the files as CSVs instead of Excel files, you can append them and keep the existing schema, including the additional columns. Another approach, albeit indirect, would be to add a step between raw and clean that calls the metadata API to re-add the optional columns.

You'd want to set these textParserParams arguments:

textParserParams["addFilePath"] = True
textParserParams["addByteOffset"] = True
textParserParams["addImportedAt"] = True
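
As a rough illustration of the intermediate step, here is a minimal sketch that re-enables the three flags on a schema held as a plain Python dict. It assumes (this is not confirmed above) that the parser options sit under schema["customMetadata"]["textParserParams"]; the function name enable_additional_columns and the sample schema are hypothetical. Fetching the schema from the metadata API and writing it back are deliberately left out, since those calls are not described here.

```python
import copy

# The three flags named in the answer above.
ADDITIONAL_COLUMN_FLAGS = ("addFilePath", "addByteOffset", "addImportedAt")


def enable_additional_columns(schema: dict) -> dict:
    """Return a copy of `schema` with the additional-column flags set to True.

    Assumption: parser options live under
    schema["customMetadata"]["textParserParams"]. Adjust the path if the
    schema you get back from the metadata API is laid out differently.
    """
    updated = copy.deepcopy(schema)  # leave the caller's dict untouched
    params = updated.setdefault("customMetadata", {}).setdefault(
        "textParserParams", {}
    )
    for flag in ADDITIONAL_COLUMN_FLAGS:
        params[flag] = True
    return updated


# Hypothetical schema as it might look after a drag-and-drop import reset it.
reset_schema = {"customMetadata": {"textParserParams": {"addFilePath": False}}}
restored = enable_additional_columns(reset_schema)
print(restored["customMetadata"]["textParserParams"])
```

Working on a copy keeps the step safe to re-run: if the flags are already set, the result is unchanged, so it can run on every append without checking first.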
Kellen Donohue