
I'm having trouble saving a Matplotlib graph to Azure Data Lake Storage Gen2. The graph comes from the K-means elbow method, and I run the code from local PyCharm connected to an Azure Databricks cluster.

The sample ML code below raises an error.

Elbow Curve:

import matplotlib.pyplot as plt

plt.savefig(graph_path, bbox_inches='tight')

Class shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem not found.
OSError: [Errno 22] Invalid argument: 'abfss://cluster-container@project.dfs.core.windows.net/project/output/Elbow-Curve-20210325-222650.png'

Note: The code runs without any issues against local Spark and a local folder structure, so the issue is with either Databricks or Azure Data Lake Storage Gen2.

Any help is much appreciated!

Alex Ott

1 Answer


Matplotlib doesn't know anything about ADLS; it's designed to work with the local file system, which is why it rejects the abfss:// path with OSError. To store the image on ADLS, you need to do the following:

  1. Store the image on the local file system of the driver, for example as /tmp/my-image.png
  2. Copy the image into ADLS using the dbutils.fs.cp command, like this (see documentation for details):

     dbutils.fs.cp("file:/tmp/my-image.png", graph_path)
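
The two steps above can be wrapped in a small helper. This is only a sketch under a few assumptions: the function name, its parameters, and the default file name are hypothetical, and the copy function is passed in as an argument so the helper doesn't depend on dbutils being defined (on Databricks you would pass dbutils.fs.cp). It also assumes the code runs where the temporary file lands on the driver's local disk.

```python
import os
import tempfile


def save_figure_via_local_tmp(fig, dest_uri, copy_fn, filename="elbow-curve.png"):
    """Save `fig` to local disk, then copy it to `dest_uri`.

    fig      -- a Matplotlib figure (any object with a compatible savefig method)
    dest_uri -- target path, e.g. the abfss:// URI held in graph_path
    copy_fn  -- callable(src, dst); on Databricks this would be dbutils.fs.cp
    filename -- hypothetical default name for the temporary local file
    """
    # Step 1: write the image to a fresh local temporary directory.
    local_dir = tempfile.mkdtemp()
    local_path = os.path.join(local_dir, filename)
    fig.savefig(local_path, bbox_inches="tight")

    # Step 2: copy it to the remote store. The "file:" prefix marks the
    # source as a path on the local file system rather than DBFS.
    copy_fn("file:" + local_path, dest_uri)
    return local_path
```

On a Databricks cluster, the call would look something like save_figure_via_local_tmp(plt.gcf(), graph_path, dbutils.fs.cp), with plt.gcf() returning the current figure.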
Alex Ott