TL;DR: How can I collect metadata (errors during parsing) from distributed reads into a dask dataframe collection.
I currently have a proprietary file format i'm using to feed into dask.DataFrame. I have a function that accepts a file path and returns a pandas.DataFrame, used internally by dask.DataFrame successfully to load multiple files to the same dask.DataFrame.
Up until recently, I was using my own code to merge several pandas.DataFrames into one, and now i'm working on using dask instead. When parsing the file format i may encounter errors and certain conditions i want to log and associate with the dask.DataFrame object as metadata (logs, origin of data, etc).
Its important to note that when reasonable, I'm using MultiImdices quite heavily (13 index levels, 3 column levels). For metadata that describes the entire dataframe and not specific rows, I'm using attributes.
Using a custom function, I could pass the metadata in a tuple with the actual DataFrame. Using pandas, I could add it to the _metadata field and as attributes to the DataFrame obejcts. How can I collect metadata from separate pandas.DataFrame objects when using the dask framework?
Thanks!