
I have an ADLS structure as shown below, and I would like to read only the files directly under the main directory, not those in the subdirectory. How can I skip the subdirectory in Python?

Storage Account 
    |
    |__ sample_container
            |
            |__main_folder
                  |
                  |__ sub_folder
                  |       |
                  |       |__ file1.txt
                  |       |__ file2.csv
                  |       |__ file3.parquet
                  |
                  |__ config.txt
                  |__ data.csv
                  |__ data1.csv
                  |__ export.csv


self.container_client = blobservice.get_container_client(container_name)
for files in self.container_client.list_blobs():
    print(files)

If I use list_blobs(), it lists the directories, sub-directories, and files under the container. If I use list_blobs(main_folder), it lists the directories, sub-directories, and files under main_folder. The output is shown below.

main_folder
main_folder/CONFIG_MASTER.csv
main_folder/LAST_RUN.csv
**main_folder/Sub_folder
main_folder/Sub_folder/sample.csv
main_folder/Sub_folder/example.csv**
main_folder/EXECUTABLE_LOG.csv
main_folder/data_file.csv
main_folder/ProcessControl.csv

Now I need to read only the files directly under main_folder and skip the contents of sub_folder. How can I achieve this with the Azure Python SDK?

Also, is there a way to find out whether a blob is a file or a directory? I am using ADLS Gen2 storage.

shankar

1 Answer


Yes, you can read the files from the main folder and skip sub_folder. The code below shows how.

I reproduced the same setup in my environment. This is my hierarchy inside ADLS Gen2 storage:

Storage Account 
    |
    |__ pool
            |
            |__main
                  |
                  |__ sub_folder
                  |       |
                  |       |__ EmojisSample.txt
                  |       |__ part-doubts.txt
                  |       |__ part-pysamp.txt
                  |
                  |__ Textsample.txt
                  |__ TextToSpeech.txt
                  |__ sampleTextFile.txt
                  

Sample Code:

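# BlockBlobService is part of the legacy azure-storage-blob SDK (2.x and earlier).
from azure.storage.blob import BlockBlobService

account_name = "xxxx"
container_name = "xxxx"
sas_token = "xxxx"

# Authenticate with a SAS token instead of an account key.
blob_service = BlockBlobService(account_name=account_name, account_key=None, sas_token=sas_token)

# List every blob whose name starts with "main".
blobs = blob_service.list_blobs(container_name, prefix="main")

for blob in blobs:
    blob_name = blob.name
    # Skip any blob whose name contains "Sub" (case-sensitive substring match).
    if "Sub" not in blob_name:
        print(blob_name)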

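If the sub-directory name is not known in advance, one option is to filter on the shape of the blob names rather than on a fixed string. The sketch below is only an illustration: `direct_child_files` is a hypothetical helper, not part of the Azure SDK, and it assumes you have already collected the names returned by the listing call. It keeps names exactly one level below the folder and drops any name that is the parent of another listed name.

```python
def direct_child_files(names, folder):
    """Return the names that sit directly under `folder` and are not directories.

    `names` is any iterable of blob names such as "main_folder/data.csv".
    A name counts as a directory when another listed name lives beneath it.
    """
    names = list(names)
    prefix = folder.rstrip("/") + "/"

    # Collect every parent path that appears in the listing.
    parents = set()
    for name in names:
        parts = name.split("/")
        for depth in range(1, len(parts)):
            parents.add("/".join(parts[:depth]))

    result = []
    for name in names:
        if not name.startswith(prefix):
            continue  # outside the folder, or the folder entry itself
        remainder = name[len(prefix):]
        if not remainder or "/" in remainder:
            continue  # nested deeper than one level
        if name in parents:
            continue  # this entry is a directory containing other blobs
        result.append(name)
    return result
```

Usage would look like `direct_child_files((b.name for b in blobs), "main")`. Note the limitation: an empty sub-directory has no children in the listing, so a name-based check like this cannot tell it apart from a file without an extension.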

B. B. Naga Sai Vamsi
  • Unfortunately, "Sub" cannot be hardcoded in my code. It can be any subdirectory whose name is not known in advance. We don't even know whether an entry is a file or a directory. In /folder/filename.csv, the .csv extension suggests a file. But what if the name has no extension, as in /folder/data, where data is a file? In that case I first want to know whether it is a folder or a file, and only if it is a file should I process it. – shankar Aug 08 '22 at 13:59
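For the file-versus-directory question raised in this comment, a possible approach on an ADLS Gen2 account is the separate azure-storage-file-datalake package, whose `get_paths` listing returns entries with an `is_directory` flag. The following is a minimal sketch under that assumption, not something taken from the answer above: the function names and the `account_url`, `file_system`, and `credential` parameters are placeholders for illustration.

```python
def files_directly_under(paths, folder):
    """Keep only file entries located directly under `folder`.

    `paths` is an iterable of objects exposing `.name` (full path from the
    file system root) and `.is_directory`, as returned by get_paths().
    """
    prefix = folder.strip("/") + "/"
    return [
        p.name
        for p in paths
        if not p.is_directory
        and p.name.startswith(prefix)
        and "/" not in p.name[len(prefix):]
    ]


def list_main_folder_files(account_url, file_system, folder, credential):
    """List the files directly under `folder` (all arguments are placeholders)."""
    # Imported here so the helper above stays usable without the SDK installed.
    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(account_url=account_url, credential=credential)
    fs_client = service.get_file_system_client(file_system)
    # recursive=False asks the service not to descend into sub-directories.
    entries = fs_client.get_paths(path=folder, recursive=False)
    return files_directly_under(entries, folder)
```

Because the decision relies on `is_directory` rather than on the name, a file without an extension such as main_folder/data is kept, while a sub-directory with any name is skipped.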