
I'm trying to save a great_expectations expectation_suite to Azure ADLS Gen 2 or Blob store with the following line of code.

ge_df.save_expectation_suite('abfss://polybase@mipolybasestagingsbox.dfs.core.windows.net/test/newdata/loggingtableupdate.json')

However, I'm getting the following error:

FileNotFoundError: [Errno 2] No such file or directory: 'abfss://polybase@mipolybasestagingsbox.dfs.core.windows.net/test/newdata/loggingtableupdate.json'

The following is successful, but I don't know where the expectation suite is saved to:

ge_df.save_expectation_suite('gregs_expectations.json')

If someone can let me know how to save to ADLS Gen 2, or where the expectation suite is saved to, that would be great.

Patterson

1 Answer


Great Expectations can't save to ADLS directly - it just uses the standard Python file API, which works only with local files. Your last command stores the file in the current working directory of the driver, but you can set the path explicitly, for example as /tmp/gregs_expectations.json.
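To see exactly where that relative path ends up, you can resolve it against the driver's working directory - a quick sanity check, nothing specific to great_expectations:

```python
import os

# A relative filename like 'gregs_expectations.json' is resolved against
# the current working directory of the Python process (the Spark driver).
target = os.path.join(os.getcwd(), 'gregs_expectations.json')
print(target)
```

On Databricks the driver's working directory is typically somewhere under /databricks, which is why passing an explicit path like /tmp/gregs_expectations.json is easier to reason about.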

After saving, the second step is to upload it to ADLS. On Databricks you can use dbutils.fs.cp to put the file onto DBFS or ADLS. If you're not running on Databricks, you can use the azure-storage-file-datalake Python package to upload the file to ADLS (see its docs for details), something like this:

from azure.storage.filedatalake import DataLakeFileClient

# Read the locally saved expectation suite (binary mode, so the bytes
# are uploaded as-is)
with open('/tmp/gregs_expectations.json', 'rb') as f:
    data = f.read()

# file_system_name is the name of the container inside the storage account
file_client = DataLakeFileClient.from_connection_string("my_connection_string",
                                                        file_system_name="myfilesystem",
                                                        file_path="gregs_expectations.json")
file_client.create_file()
file_client.append_data(data, offset=0, length=len(data))
file_client.flush_data(len(data))
Alex Ott
  • wow @alex this is amazing. I can't wait to check it out. I will let you know how I get on. Thanks in advance – Patterson Jul 10 '21 at 10:15
  • Hi @alex this worked like a dream. Thanks mate – Patterson Jul 11 '21 at 13:18
  • Hi @Alex, I thought I had this working. Can you give me an example of "myfilesystem"? I entered /tm/ but I got the error "The specified filesystem does not exist." – Patterson Jul 14 '21 at 16:58
  • filesystem here is the name of container inside storage account: https://learn.microsoft.com/en-us/python/api/overview/azure/storage-file-datalake-readme?view=azure-python#key-concepts – Alex Ott Jul 14 '21 at 17:06
  • ok, that definitely worked ... thanks sooooooooooooooo much Alex – Patterson Jul 14 '21 at 17:21