7

How can I get a list of directories in my container?

I can use Get-AzureStorageBlob to get all the blobs and filter by distinct prefix /name/, but it might be slow with millions of blobs.

Is there a proper way of achieving this in PowerShell?

David Makogon
Stefano d'Antonio
  • Not sure what you mean by 'proper way'. Also note: there are no *directories* inside a container, only blobs (but with names that might mimic directories though). – David Makogon Jun 09 '16 at 12:18
  • What are you trying to do? What is the actual problem you are trying to solve? Cloud containers, whether Azure, Amazon, Google, Openstack have no directories. Directories imply recursion and that *doesn't* scale. All of them use a character like `/` as a separator and emulate directory operations over the flat list of files. – Panagiotis Kanavos Jun 09 '16 at 12:40
  • Consider that it's actually *very* fast to get a list of all files under a container, or at least using a high level prefix, filter it using string operations on the client side then requesting the individual files you need. – Panagiotis Kanavos Jun 09 '16 at 12:42
  • @DavidMakogon this isn't fully true, I thought the same until I saw this: https://msdn.microsoft.com/en-gb/library/microsoft.windowsazure.storage.blob.cloudblobdirectory.aspx that makes me think there is a concept of directory for the blobs. – Stefano d'Antonio Jun 10 '16 at 11:10
  • @PanagiotisKanavos it takes 30 minutes every time, so it's not really an acceptable speed. – Stefano d'Antonio Jun 10 '16 at 11:11
  • @Uno It's absolutely true. There's storage account -> container -> blob. That's it. You can simulate directories with the delimiter character. And the class you pointed to? "Represents a *virtual* directory of blobs, designated by a delimiter character." Not real directories. Just a convenience class. – David Makogon Jun 10 '16 at 11:12
  • @DavidMakogon regardless of the underlying implementation, I'm looking for a fast way of finding the directories (or prefixes if you prefer) in the container. Looking at each single blob isn't really an option as it's too slow. As there is some kind of concept of a directory, my question was: is there a way to get them quickly? – Stefano d'Antonio Jun 10 '16 at 11:23
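
To make the "virtual directory" idea from the comments concrete, here is a minimal sketch of how directory-style listing can be emulated with string operations over a flat list of blob names. `Get-VirtualChildItem` is a hypothetical helper written for this illustration, not part of the Azure PowerShell module, and the sample names are made up. It assumes the blob names have already been fetched, so it does not avoid the cost of enumerating the container.

```powershell
# Hypothetical helper (not an Azure cmdlet): returns the immediate "children"
# of a virtual directory, given a flat list of blob names.
function Get-VirtualChildItem {
    param(
        [string[]] $BlobNames,
        [string]   $Prefix = '',
        [string]   $Delimiter = '/'
    )

    # A HashSet keeps each virtual directory once, however many blobs share it.
    $seen = New-Object 'System.Collections.Generic.HashSet[string]'

    foreach ($name in $BlobNames) {
        # Skip blobs outside the requested prefix.
        if (-not $name.StartsWith($Prefix, [System.StringComparison]::Ordinal)) { continue }

        $rest = $name.Substring($Prefix.Length)
        $cut  = $rest.IndexOf($Delimiter, [System.StringComparison]::Ordinal)

        if ($cut -ge 0) {
            # A delimiter remains, so the blob lives under a virtual directory.
            $child = $Prefix + $rest.Substring(0, $cut + $Delimiter.Length)
        }
        else {
            # No delimiter remains, so the blob sits directly under the prefix.
            $child = $name
        }

        # Emit each child only the first time it is seen.
        if ($seen.Add($child)) { $child }
    }
}

# Made-up sample names, for illustration only.
$sampleNames = @(
    'logs/2016/06/09.txt',
    'logs/2016/06/10.txt',
    'images/cat.png',
    'readme.txt'
)

Get-VirtualChildItem -BlobNames $sampleNames                       # logs/, images/, readme.txt
Get-VirtualChildItem -BlobNames $sampleNames -Prefix 'logs/2016/'  # logs/2016/06/
```

Comparisons are ordinal because blob names are case-sensitive. The names themselves still have to come from a full listing such as `Get-AzureStorageBlob`, which is the slow step the question is about.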

2 Answers

4

The other answer is correct that there is nothing out of the box: there is no real concept of a folder, only blob names that contain a folder-like path.

Using a regex in PowerShell, you can find top-level folders. As mentioned, this may be slow if there are millions of items in your account, but for a small number it may work for you.

$context = New-AzureStorageContext -ConnectionString '[XXXXX]'
$containerName = '[XXXXX]'

$blobs = Get-AzureStorageBlob -Container $containerName -Context $context 
$folders = New-Object System.Collections.Generic.List[System.Object]

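foreach ($blob in $blobs)
{
    # Matches names with exactly one '/', that is, blobs directly inside a top-level folder.
    # Folders that contain only deeper-nested blobs are not picked up by this pattern.
    if ($blob.Name -match '^[^\/]*\/[^\/]*$')
    {
        # Keep everything before the first '/' as the folder name.
        $folder = $blob.Name.Substring(0, $blob.Name.IndexOf("/"))
        if (!$folders.Contains($folder))
        {
            $folders.Add($folder)
        }
    }
}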

foreach ($folder in $folders)
{
    Write-Host $folder
}
John
3

There's no concept of directories, only containers and blobs. A blob name may contain delimiters that make it look like a directory path, and listings may be filtered on them.

If you choose to store millions of blobs in a container, then you'll be searching through millions of blob names, even with delimiter filtering, whether using PowerShell, SDK, or direct REST calls.

As for a "proper" way, there isn't one. Only you can decide how you organize your containers and blobs, and where (or if) you choose to store metadata for more efficient searching (such as a database).

David Makogon
  • So there is no solution? I can only start moving blobs to different containers to optimise the search? – Stefano d'Antonio Jun 10 '16 at 12:25
  • I don't know what you mean by "there is no solution." Blob storage is not a database, and has none of the search/index/query facilities of a database engine. If you have search needs, and you're dealing with millions of blobs, then you should consider storing searchable metadata in a database, as I mentioned in my answer. – David Makogon Jun 10 '16 at 12:51
  • I don't have a database at the moment; "there is no solution" means there is no solution. I can't find all the blobs by /prefix/ quickly. – Stefano d'Antonio Jun 10 '16 at 13:01
  • Well... I don't know what to tell you. There are plenty of solutions, including "search through millions of blobs in a container" which you already considered, "store blobs across multiple containers," and "use a database." Nothing more to say, really. – David Makogon Jun 10 '16 at 13:04
  • I don't know how to explain this any more: I can search through millions of blobs, but not efficiently. I was looking for an efficient solution, and "no efficient solution" appears to be the answer (unfortunately). I'm assuming that when you talked about a metadata database, you meant something that should have been in place before, not something I can build on top of my current data? – Stefano d'Antonio Jun 10 '16 at 13:08
  • Another efficient solution might be indexing your blobs in a storage table and using that for the search. Effectively a primitive version of the "use a database" solution that doesn't require an add'l dependency. It would be trivial to build an index for what you have now. Just perform your "million blob search" one time and store the results in whatever lookup scheme makes sense for you. – Greg D Jun 16 '16 at 05:55
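
The "scan once, then look up" idea in the last comment can be sketched as follows. This is only an in-memory illustration under stated assumptions: `New-PrefixIndex` is a hypothetical helper, the sample names are made up, and a real setup would persist the result (for example in a storage table, as the comment suggests) rather than keep it in a variable.

```powershell
# Hypothetical helper (not an Azure cmdlet): groups blob names by their
# top-level prefix so later lookups do not need to re-scan every name.
function New-PrefixIndex {
    param(
        [string[]] $BlobNames,
        [string]   $Delimiter = '/'
    )

    # A plain .NET Hashtable is case-sensitive, unlike a PowerShell @{} literal;
    # blob names are case-sensitive, so keys must be too.
    $index = New-Object System.Collections.Hashtable

    foreach ($name in $BlobNames) {
        $cut = $name.IndexOf($Delimiter, [System.StringComparison]::Ordinal)

        # Blobs without a delimiter are grouped under the empty prefix.
        $prefix = if ($cut -ge 0) { $name.Substring(0, $cut) } else { '' }

        if (-not $index.ContainsKey($prefix)) {
            $index[$prefix] = New-Object 'System.Collections.Generic.List[string]'
        }
        $index[$prefix].Add($name)
    }

    $index
}

# Made-up sample names, standing in for the result of the one-time full listing.
$sampleNames = @(
    'logs/2016/06/09.txt',
    'logs/2016/06/10.txt',
    'images/cat.png',
    'readme.txt'
)

$lookup = New-PrefixIndex -BlobNames $sampleNames
$lookup.Keys      # the distinct top-level prefixes (plus '' for root-level blobs)
$lookup['logs']   # the blob names under the "logs/" prefix
```

The index goes stale as blobs are added or deleted, so whatever writes to the container would also need to update it, or the scan would have to be repeated periodically.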