
I have a word cloud created using the WordCloud class, which I am plotting with matplotlib. Now I want to save this figure to Azure Blob Storage, but I can't find a Python SDK that lets me do this.

To use plt.savefig(), a path to blob storage is needed. Could anyone tell me how this path can be specified, or suggest some other way to store the figure in Blob Storage?

The code I am using is:

import matplotlib.pyplot as plt
from wordcloud import WordCloud

fig, ax = plt.subplots()
words = text.split()
word_cloud = WordCloud(width=8000, height=800,
                       background_color='black',
                       min_font_size=10).generate(str(text))
plt.imshow(word_cloud)
display(fig)

2 Answers


As per my research, you cannot save Matplotlib output to Azure Blob Storage directly.

You can follow the steps below to save Matplotlib output to Azure Blob Storage:

Step 1: Save the figure to the Databricks File System (DBFS); you will then copy it to Azure Blob Storage in Step 2.

Saving Matplotlib output to DBFS: use plt.savefig() with a path under /dbfs/, which exposes DBFS as a local filesystem, e.g. plt.savefig('/dbfs/myfolder/Graph1.png'):

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'fruits': ['apple', 'banana'], 'count': [1, 2]})
plt.close()
df.set_index('fruits', inplace=True)
df.plot.bar()
# /dbfs/ exposes DBFS as a local path, so savefig can write to it
plt.savefig('/dbfs/myfolder/Graph1.png')

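Applied to the word cloud from the question, the same idea is sketched below (the wordcloud import, the sample text, and the /dbfs/myfolder/ target are assumptions carried over from the snippets above):

import matplotlib.pyplot as plt
from wordcloud import WordCloud

text = "azure blob storage databricks matplotlib"  # stand-in for your own text
word_cloud = WordCloud(width=800, height=800,
                       background_color='black',
                       min_font_size=10).generate(text)
plt.imshow(word_cloud)
plt.axis('off')
# /dbfs/ exposes DBFS as a local path, so savefig can write to it
plt.savefig('/dbfs/myfolder/word_cloud.png')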

Step 2: Copy the file from Databricks File System to Azure Blob Storage.

There are two methods to copy a file from DBFS to Azure Blob Storage.

Method 1: Access Azure Blob storage directly

Set the storage account key with spark.conf.set, then copy the file from DBFS to Blob Storage:

spark.conf.set("fs.azure.account.key.< Blob Storage Name>.blob.core.windows.net", "<Azure Blob Storage Key>")

Use dbutils.fs.cp to copy the file from DBFS to Azure Blob Storage:

dbutils.fs.cp('dbfs:/myfolder/Graph1.png', 'wasbs://<Container>@<Storage Name>.blob.core.windows.net/Azure')
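To confirm the copy succeeded, you can list the target container path (a quick check that reuses the session credentials set above; same placeholders as before):

display(dbutils.fs.ls('wasbs://<Container>@<Storage Name>.blob.core.windows.net/Azure'))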


Method 2: Mount Azure Blob storage containers to DBFS

You can mount a Blob storage container or a folder inside a container to Databricks File System (DBFS). The mount is a pointer to a Blob storage container, so the data is never synced locally.

dbutils.fs.mount(
  source = "wasbs://sampledata@chepra.blob.core.windows.net/Azure",
  mount_point = "/mnt/chepra",
  # the SAS token for the container is read from a Databricks secret scope
  extra_configs = {"fs.azure.sas.sampledata.chepra.blob.core.windows.net": dbutils.secrets.get(scope = "azurestorage", key = "azurestoragekey")})
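Once the mount exists, you can sanity-check it before copying anything (a small verification sketch; dbutils.fs.mounts() lists the active mount points):

# confirm /mnt/chepra appears among the active mounts
display(dbutils.fs.mounts())
# list the contents of the mounted container folder
display(dbutils.fs.ls('/mnt/chepra'))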

Use dbutils.fs.cp to copy the file to the Azure Blob Storage container:

# note: dbutils.fs paths are DBFS-relative, so use /mnt/... (not /dbfs/mnt/...)
dbutils.fs.cp('dbfs:/myfolder/Graph1.png', '/mnt/chepra/Graph1.png')


By following Method 1 or Method 2, you can successfully save the output to Azure Blob Storage.


Hope this helps. Do let us know if you have any further queries.

– CHEEKATLAPRADEEP

I'll assume you have mounted the blob storage (if not, please refer to the Databricks guide).

After that, you can do the following:

import numpy as np
import matplotlib.pyplot as plt

# df is assumed to be your pandas DataFrame
fig = plt.figure(figsize=(20, 35))
plt.pcolor(df, cmap="gray")
plt.yticks(np.arange(0.5, len(df.index), 1), df.index)
plt.xticks(np.arange(0.5, len(df.columns), 1), df.columns)

# create the folder where the plot needs to sit - matplotlib cannot create folders,
# and even creating an empty file with dbutils.fs.put will not work
dbutils.fs.mkdirs('/mnt/...base_path.../folder/')
# save the file using /dbfs/ in front of the regular path
fig.savefig('/dbfs/mnt/...base_path.../folder/file_name.png')
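
If you want to confirm the file landed in the mounted container, a quick listing works (keeping the same placeholder path):

display(dbutils.fs.ls('/mnt/...base_path.../folder/'))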

Et voilà!

Have a good one.