
My trained deep learning model consists of a couple of files in a folder, so this has nothing to do with zipping DataFrames.

I want to zip this folder (in Azure Blob storage). But when I try this with shutil, it does not seem to work:

import shutil
modelPath = "/dbfs/mnt/databricks/Models/predictBaseTerm/noNormalizationCode/2020-01-10-13-43/9_0.8147903598547376"
zipPath = "/mnt/databricks/Deploy/"  # no /dbfs here or it will error
shutil.make_archive(base_dir=modelPath, format='zip', base_name=zipPath)

Does anybody have an idea how to do this and get the file onto Azure Blob storage (where I read it from)?

Axxeption

2 Answers


In the end I figured it out myself.

It is not possible to write directly to DBFS (Azure Blob storage) with shutil.

You first need to put the file on the local driver node of Databricks, like this (I read somewhere in the documentation that you cannot write directly to Blob storage):

import shutil
modelPath = "/dbfs/mnt/databricks/Models/predictBaseTerm/noNormalizationCode/2020-01-10-13-43/9_0.8147903598547376"
zipPath = "/tmp/model"  # local path on the driver node; make_archive appends ".zip"
shutil.make_archive(base_dir=modelPath, format='zip', base_name=zipPath)

and then you can copy the file from your local driver node to Blob storage. Please note the "file:" prefix, which grabs the file from local storage!

blobStoragePath = "dbfs:/mnt/databricks/Models"
dbutils.fs.cp("file:" +zipPath + ".zip", blobStoragePath)
General Grievance
Axxeption
  • Thanks, this solution helped me avoid the "OSError: [Errno 95] Operation not supported" error. – Sandy Apr 05 '23 at 05:29

Actually, without using shutil, you can compress files in Databricks DBFS into a zip file written directly as a blob in an Azure Blob Storage container that has been mounted to DBFS.

Here is my sample code using the Python standard libraries os and zipfile.

# Mount a container of Azure Blob Storage to dbfs
storage_account_name='<your storage account name>'
storage_account_access_key='<your storage account key>'
container_name = '<your container name>'

dbutils.fs.mount(
  source = "wasbs://"+container_name+"@"+storage_account_name+".blob.core.windows.net",
  mount_point = "/mnt/<a mount directory name under /mnt, such as `test`>",
  extra_configs = {"fs.azure.account.key."+storage_account_name+".blob.core.windows.net":storage_account_access_key})
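
A small check that can help here (my addition, not part of the original answer): confirm the container is actually mounted before going further, for example:

# list the current DBFS mounts; the new mount point should appear in the output
display(dbutils.fs.mounts())

# or list the mount point directly
display(dbutils.fs.ls("/mnt/<a mount directory name under /mnt, such as `test`>"))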

# List all files which need to be compressed
import os
modelPath = '/dbfs/mnt/databricks/Models/predictBaseTerm/noNormalizationCode/2020-01-10-13-43/9_0.8147903598547376'
filenames = [os.path.join(root, name) for root, dirs, files in os.walk(top=modelPath , topdown=False) for name in files]
# print(filenames)

# Directly zip files to Azure Blob Storage as a blob
# zipPath is the absolute path of the compressed file on the mount point, such as `/dbfs/mnt/test/demo.zip`
zipPath = '/dbfs/mnt/<a mount directory name under /mnt, such as `test`>/demo.zip'
import zipfile
with zipfile.ZipFile(zipPath, 'w') as myzip:
  for filename in filenames:
#    print(filename)
    myzip.write(filename)
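
One detail to be aware of (my observation, not something the answer states): myzip.write(filename) stores each file under its full /dbfs/... path inside the archive. If you would rather have the entries relative to the model folder, ZipFile.write also takes an arcname argument, roughly like this:

import os
import zipfile

with zipfile.ZipFile(zipPath, 'w') as myzip:
  for filename in filenames:
    # store the file relative to modelPath instead of its absolute /dbfs/... path
    myzip.write(filename, arcname=os.path.relpath(filename, start=modelPath))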

I mounted my test container to DBFS and ran my sample code, and I got a demo.zip file that contains all the files in my test container.


Peter Pan
  • This looks neater than what I did! Thanks for the in-depth explanation. – Axxeption Jan 15 '20 at 09:09
  • I tried the same code as above but am getting OSError: [Errno 95] Operation not supported. Could you please advise? I have posted a question: https://stackoverflow.com/questions/75853317/getting-error-errno-95-operation-not-supported-while-writing-zip-file-in-data – Sharma Mar 27 '23 at 11:47