I have a Dataprep flow configured. The dataset is a GCS folder (all files in it), and the target is a BigQuery table.

Since the data comes from multiple files, I want to have the filename as one of the columns in the resulting data.

Is that possible?

Maxim

1 Answer

UPDATE: There's now a source metadata reference called $filepath, which, as you would expect, stores the local path to the file in Cloud Storage (starting at the top-level bucket). You can use it directly in formulas, or add it to a new formula column and then do anything you want with it in additional recipe steps. (If your data source sample was created before this feature existed, you'll need to generate a new sample in order to see it in the interface.)

Full notes for these metadata fields are available here: https://cloud.google.com/dataprep/docs/html/Source-Metadata-References_136155148
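
Since $filepath holds the whole path rather than just the filename, you would still need a recipe step to trim it down. The Dataprep formula for that isn't shown here; the Python below is only a sketch of the logic, and both the example path and the helper name `filename_from_path` are made up for illustration.

```python
import posixpath

def filename_from_path(filepath):
    """Return the last path segment, i.e. the bare filename."""
    # Cloud Storage object paths use forward slashes, so posixpath is
    # used rather than os.path (which follows the local OS convention).
    return posixpath.basename(filepath)

# Hypothetical value of the kind $filepath might contain.
example = "my-bucket/exports/orders_01.csv"
filename = filename_from_path(example)
print(filename)
```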


Original Answer

This is not currently possible out of the box. If you're manually merging datasets with UNION, you could first process each one to add a column containing its source, so that the column is then present in the combined output.
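
To make the idea concrete, here is a rough sketch of the same "tag each source, then union" approach done outside Dataprep in plain Python. It isn't Dataprep syntax; the file names, the sample contents, and the `source_file` column name are all assumptions for illustration.

```python
import csv
import io

def union_with_source(files):
    """Combine CSV texts into one list of rows, tagging each row
    with the name of the file it came from.

    files: mapping of filename -> CSV text that includes a header row.
    """
    combined = []
    for name, text in files.items():
        reader = csv.DictReader(io.StringIO(text))
        for row in reader:
            tagged = dict(row)
            # Add the source column before the rows are merged.
            tagged["source_file"] = name
            combined.append(tagged)
    return combined

# Hypothetical in-memory stand-ins for two files in the folder.
sample_files = {
    "orders_01.csv": "id,amount\n1,10\n2,20\n",
    "orders_02.csv": "id,amount\n3,30\n",
}
rows = union_with_source(sample_files)
for row in rows:
    print(row)
```

Each row keeps its original columns and gains one extra column naming its source, which is what you'd want to end up with in the BigQuery table.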

If you're bulk-ingesting files, that doesn't help, but there is an open feature request that you can comment on and/or follow for updates: https://issuetracker.google.com/issues/74386476

justbeez