If you're going straight into Code Repositories or Code Workbooks, you can use the `input_file_name()` function (see proggeo's answer below). That is likely easier and simpler than the schema method described here, but it won't work if you're going to do something else with the data.
Schema Method
If you open your dataset and go to Details -> Schema, you can edit the schema to add a file path column; for each row, this column will hold the path of the file that the row came from.
The key parts are the `_filePath` member of `fieldSchemaList` and `"addFilePath": true` under `customMetadata`. The first is a special column that `TextDataFrameReader` populates with the file path; the second tells the reader to populate that column. The other column in the example below (`content`) just contains the full contents of each file.
For more details, see the Metadata section under Foundry core backend in the platform documentation. The same approach also works for CSVs and other, more structured data via different reader classes.
Full schema example
{
  "fieldSchemaList": [
    {
      "type": "STRING",
      "name": "content",
      "nullable": null,
      "userDefinedTypeClass": null,
      "customMetadata": {},
      "arraySubtype": null,
      "precision": null,
      "scale": null,
      "mapKeyType": null,
      "mapValueType": null,
      "subSchemas": null
    },
    {
      "type": "STRING",
      "name": "_filePath",
      "nullable": null,
      "userDefinedTypeClass": null,
      "customMetadata": {},
      "arraySubtype": null,
      "precision": null,
      "scale": null,
      "mapKeyType": null,
      "mapValueType": null,
      "subSchemas": null
    }
  ],
  "dataFrameReaderClass": "com.palantir.foundry.spark.input.TextDataFrameReader",
  "customMetadata": {
    "textParserParams": {
      "parser": "SINGLE_COLUMN_PARSER",
      "nullValues": null,
      "nullValuesPerColumn": null,
      "charsetName": "UTF-8",
      "addFilePath": true,
      "addByteOffset": false,
      "addImportedAt": false
    }
  }
}
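If you edit the schema JSON by hand, a quick sanity check with plain Python can confirm the two parts that matter are in place. The `schema_json` string below is a shortened copy of the example above (the `null` bookkeeping fields are omitted for brevity):

```python
import json

# Shortened copy of the schema example above, just for the check.
schema_json = """
{
  "fieldSchemaList": [
    {"type": "STRING", "name": "content"},
    {"type": "STRING", "name": "_filePath"}
  ],
  "dataFrameReaderClass": "com.palantir.foundry.spark.input.TextDataFrameReader",
  "customMetadata": {
    "textParserParams": {"parser": "SINGLE_COLUMN_PARSER", "addFilePath": true}
  }
}
"""

schema = json.loads(schema_json)

# The two things that matter: a _filePath column, and addFilePath switched on.
names = [field["name"] for field in schema["fieldSchemaList"]]
add_file_path = schema["customMetadata"]["textParserParams"]["addFilePath"]
```

If `"_filePath"` is missing from `names` or `add_file_path` is not `true`, the reader won't fill in the path column.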