
This may be an uncommon question, as I believe it has never been asked before, but is it possible to export a pandas DataFrame straight to Azure Data Lake Storage as a CSV file?

To add some context: I have a pandas DataFrame that gets exported as a CSV file to a local directory. Using the DataLakeServiceClient, I then read the CSV file from that file path and write it into the data lake storage.


import csv
from azure.storage.filedatalake import DataLakeServiceClient

# Write the header row first, then append the data rows with non-numeric values quoted
docs[:0].to_csv("test.csv", index=False)
docs.to_csv("test.csv", index=False, header=False, mode='a', quoting=csv.QUOTE_NONNUMERIC)

try:
    global service_client

    service_client = DataLakeServiceClient(
        account_url="{}://{}.dfs.core.windows.net".format("https", "XXXX"),
        credential='XXX')

    file_system_client = service_client.get_file_system_client(file_system="root")
    directory_client = file_system_client.get_directory_client("test_db")
    file_client = directory_client.create_file("test.csv")

    # Read the local CSV back in as bytes (the with-block closes the file)
    with open(r"C:XXXX\test.csv", 'rb') as local_file:
        file_contents = local_file.read()

    # Upload those bytes to the data lake, replacing any existing file
    file_client.upload_data(file_contents, overwrite=True)

except Exception as e:
    print(e)


However, I don't want the DataFrame to be exported to my local directory; instead, I want to find a way to export it straight to the data lake storage. Is this actually possible?

Any help is appreciated.

jcoke
  • Have you explored the Microsoft docs? They are pretty comprehensive. https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-directory-file-acl-python – Umar.H Feb 05 '21 at 09:30
  • Yes, that's what I used to find out how to upload a file to a directory, but what I'm trying to find out is whether it can go straight from a data frame to a data lake without having to save it locally. – jcoke Feb 05 '21 at 09:35

1 Answer


pandas' DataFrame.to_csv (see its documentation) can write the DataFrame into an in-memory buffer instead of a file on disk.
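As a quick local check (no Azure involved), the buffer ends up holding exactly the CSV text that to_csv would otherwise write to disk. The small DataFrame below is made up for illustration and stands in for docs. It also shows that calling to_csv with no path at all returns the same CSV text as a plain string.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the question's `docs` DataFrame
docs = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})

# Write into an in-memory text buffer instead of a file on disk
buffer = StringIO()
docs.to_csv(buffer, index=False)
csv_from_buffer = buffer.getvalue()

# With no path argument, to_csv returns the CSV text directly
csv_from_return = docs.to_csv(index=False)

print(csv_from_buffer == csv_from_return)
print(csv_from_buffer)
```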

Try the following code:

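from io import StringIO

# Write the CSV into an in-memory text buffer instead of a local file
text_stream = StringIO()
docs.to_csv(text_stream, index=False)

# the rest of your code (service_client, file_system_client, directory_client, file_client)

# getvalue() returns the buffer's full contents as a str, independent of the stream position;
# encode it to bytes, like the bytes the question read from the local file
file_client.upload_data(text_stream.getvalue().encode("utf-8"), overwrite=True)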
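The question's original two-step write (header row first, then data rows with csv.QUOTE_NONNUMERIC) can be done against the same kind of buffer. The sketch below is one way to wrap that up, assuming pandas is installed; dataframe_to_csv_text and upload_dataframe are hypothetical helper names, and upload_fn is a hypothetical parameter that stands in for file_client.upload_data so the serialization can be tried without an Azure account.

```python
import csv
from io import StringIO

import pandas as pd


def dataframe_to_csv_text(df: pd.DataFrame) -> str:
    """Serialize df like the question's two to_csv calls, but in memory."""
    buffer = StringIO()
    # Header row only: an empty row slice still carries the column names
    df[:0].to_csv(buffer, index=False)
    # Data rows go into the same buffer; writes continue at the end of the
    # stream, so mode='a' is not needed. Non-numeric values are quoted.
    df.to_csv(buffer, index=False, header=False, quoting=csv.QUOTE_NONNUMERIC)
    return buffer.getvalue()


def upload_dataframe(df: pd.DataFrame, upload_fn) -> None:
    """Upload df as UTF-8 encoded CSV bytes through upload_fn.

    upload_fn is hypothetical; with the question's setup you would pass
    file_client.upload_data.
    """
    data = dataframe_to_csv_text(df).encode("utf-8")
    upload_fn(data, overwrite=True)


# Usage with a fake uploader that just records what it received
received = {}


def fake_upload(data, overwrite=False):
    received["data"] = data
    received["overwrite"] = overwrite


upload_dataframe(pd.DataFrame({"name": ["a", "b"], "value": [1, 2]}), fake_upload)
print(received["data"].decode("utf-8"))
```

Against the real service the call would be upload_dataframe(docs, file_client.upload_data). Passing the uploader in as a function keeps the CSV formatting separate from the Azure client, so it can be checked locally first.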
arhr