It looks like in your case the problem has to do with how the Spark APIs access the filesystem versus how the Python APIs access it; see here for details. This is probably not the recommended way (I'm fairly new to Databricks myself), but if you're on a single node you can write your parquet to the local filesystem and MLflow can log it from there with something like:
import tempfile
import mlflow

with tempfile.TemporaryDirectory() as tmpdirname:
    df.write.parquet(f'file:{tmpdirname}/my_parquet_table')
    mlflow.log_artifacts(tmpdirname, artifact_path='my_parquet_table_name')
Keep in mind that a parquet "file" is actually a directory containing a whole bunch of files, so you need to use log_artifacts, not log_artifact, and if you don't specify artifact_path you'll get all the little files that make up the parquet file (directory) dumped directly into the root of your MLflow artifacts. Also, MLflow doesn't have any preview capability for parquet files, so depending on your use case, logging parquet artifacts may not be as convenient as it first seems.
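If you later need the table back on the driver, you can download the logged artifact and read it with pandas. This is just a rough sketch, assuming an MLflow 2.x client (mlflow.artifacts.download_artifacts), pyarrow installed, and the same names as above; '<run_id>' is a placeholder for the run that logged the table:

import mlflow
import pandas as pd

# Download the logged directory to a local temp location
local_dir = mlflow.artifacts.download_artifacts(
    run_id='<run_id>',                      # placeholder: the run that logged the table
    artifact_path='my_parquet_table_name',  # same artifact_path used when logging
)
# pandas (with pyarrow) can read a directory of parquet part files
pdf = pd.read_parquet(f'{local_dir}/my_parquet_table')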
HTH