
Is there any easy way to use PowerShell to only get a list of "folders" from an S3 bucket, without listing every single object and just scripting a compiled list of distinct paths? There are hundreds of thousands of individual objects in the bucket I'm working in, and that would take a very long time.

It's possible this is a really stupid question and I'm sorry if that's the case, but I couldn't find anything on Google or SO to answer this. I've tried adding wildcards to -KeyPrefix and -Key params of Get-S3Object to no avail. That's the only cmdlet that seems like it might be capable of doing what I'm after.

Pointless backstory: I just want to make sure I'm transferring files to the correct, existing folders. I'm a contracted third party, so I don't have console login access and I'm not the person who maintains the AWS account.

I know this is possible using Java and C# and others, but I'm doing everything else involved with this fairly simple project in PS and was hoping to be able to stick with it.

Thanks in advance.


4 Answers


You can use the AWS Tools For PowerShell to list objects (via Get-S3Object) in the bucket and pull common prefixes from the response object.

Below is a small library to recursively retrieve subdirectories:

function Get-Subdirectories
{
  param
  (
    [string] $BucketName,
    [string] $KeyPrefix,
    [bool] $Recurse
  )

  # List with a '/' delimiter; the objects themselves are discarded -
  # we only need the CommonPrefixes recorded in $AWSHistory
  @(Get-S3Object -BucketName $BucketName -KeyPrefix $KeyPrefix -Delimiter '/') | Out-Null

  if ($AWSHistory.LastCommand.Responses.Last.CommonPrefixes.Count -eq 0)
  {
    return
  }

  # Emit this level's "subdirectories" (common prefixes)
  $AWSHistory.LastCommand.Responses.Last.CommonPrefixes

  if ($Recurse)
  {
    $AWSHistory.LastCommand.Responses.Last.CommonPrefixes | % { Get-Subdirectories -BucketName $BucketName -KeyPrefix $_ -Recurse $Recurse }
  }
}

function Get-S3Directories
{
  param
  (
    [string] $BucketName,
    [bool] $Recurse = $false
  )

  Get-Subdirectories -BucketName $BucketName -KeyPrefix '/' -Recurse $Recurse
}

This recursive function works by updating KeyPrefix on each call, checking for subdirectories under each prefix passed to it. With the delimiter set to '/', keys that match the KeyPrefix string up to the first occurrence of the delimiter are rolled into the CommonPrefixes collection in the last response recorded in $AWSHistory.
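
For context, the same CommonPrefixes collection can also be read straight off the underlying SDK response instead of via $AWSHistory. A minimal sketch, assuming Windows PowerShell with the AWSPowerShell module loaded (so the bundled AWS SDK for .NET exposes the synchronous ListObjectsV2 call) and credentials/region coming from your default profile:

# Sketch only: ask S3 for one page of results, delimited at '/'
$client  = New-Object Amazon.S3.AmazonS3Client            # assumes default credentials/region are configured
$request = New-Object Amazon.S3.Model.ListObjectsV2Request
$request.BucketName = 'myBucket'
$request.Prefix     = 'myprefix/'   # '' for the top level
$request.Delimiter  = '/'           # roll keys up at the first '/' after the prefix

$response = $client.ListObjectsV2($request)
$response.CommonPrefixes            # one string per "folder" under the prefix
# (a single response holds at most 1,000 keys/prefixes; see the answer below about paging)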

To retrieve only the top-level directories in an S3 Bucket:

PS C:\> Get-S3Directories -BucketName 'myBucket'

To retrieve all directories in an S3 Bucket:

PS C:\> Get-S3Directories -BucketName 'myBucket' -Recurse $true

This will return a collection of strings, where each string is a common prefix.

Example Output:

myprefix/
myprefix/txt/
myprefix/img/
myotherprefix/
...
# Alternative: enumerate every object, then derive the distinct parent "folder" of each key.
# Note: this lists the whole bucket, so it can be slow when there are many objects.
$objects = Get-S3Object -BucketName $bucketname -ProfileName $profilename -Region $region
$paths = @()
foreach ($object in $objects)
{
    $path = Split-Path $object.Key -Parent   # parent "folder" of the key (separators may come back as '\')
    $paths += $path
}
$paths = $paths | Select-Object -Unique
Write-Host "`nNumber of folders: $($paths.Count)"
Write-Host ([string]::Join("`n", $paths))

The accepted answer is correct but with a flaw. If you have a large bucket with many "folders" (over 1000) you will only get the last 1000 prefixes by using:

$AWSHistory.LastCommand.Responses.Last.CommonPrefixes

AWS batches responses in increments of 1,000 keys. If you look at

$AWSHistory.LastCommand.Responses.History

you will see multiple entries. Unfortunately, only five are kept by default. You can change that behavior with the Set-AWSHistoryConfiguration cmdlet.

To increase the number of History responses use the -MaxServiceCallHistory parameter.

Set-AWSHistoryConfiguration -MaxServiceCallHistory 20

This will store the last 20 service calls for the next (and all subsequent) command.

With the above configuration you can retrieve up to 20,000 subfolders from a single prefix.

To retrieve all the folders do the following:

$subFolders = ($AwsHistory.LastCommand.Responses.History).CommonPrefixes

Caution: Increasing the configuration parameters will utilize more memory.
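
Putting this together with the helper from the accepted answer, a minimal sketch of the full workflow (Get-S3Directories and the bucket name are taken from that answer):

# Keep up to 20 service responses (20 x 1,000 keys) in $AWSHistory
Set-AWSHistoryConfiguration -MaxServiceCallHistory 20

# Run the listing; its own output can be ignored here
Get-S3Directories -BucketName 'myBucket' | Out-Null

# Collect CommonPrefixes from every recorded response, not just the last one
$subFolders = ($AWSHistory.LastCommand.Responses.History).CommonPrefixes | Sort-Object -Unique
$subFolders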


This PowerShell version pages through the bucket 1,000 keys at a time (the S3 API returns at most 1,000 keys per Get-S3Object call, hence the do/while loop driven by a marker). After the output is generated to CSV, remember to remove duplicates in Excel (P.S. I'd appreciate help with de-duplication, as I don't think my script handles duplicates well; one possible fix is sketched after the script).

# Main code
$keysPerPage = 1000              # AWS returns at most 1,000 keys per call
$bucketN     = 'testBucket'      # bucket name
$nextMarker  = $null
$output      = @()
$Start       = "S3 Bucket Name : $bucketN"
$End         = "- End of Folder List -"

Do
{
  # Fetch the next page of up to 1,000 keys; the marker continues where the last call stopped
  $batch = Get-S3Object -BucketName $bucketN -MaxKey $keysPerPage -Marker $nextMarker

  # Take the top-level prefix of each key and de-duplicate within this batch
  $batch2  = $batch.Key | % { $_.Split('/')[0] } | Sort-Object -Unique
  $output += $batch2
  $batch2

  # NextMarker from the last service response tells us where the next page starts
  $nextMarker = $AWSHistory.LastServiceResponse.NextMarker
} while ($nextMarker)

# Write the results out to a file
$Start  | Out-File C:\Output-Result.csv -Append
$output | Out-File C:\Output-Result.csv -Append
$End    | Out-File C:\Output-Result.csv -Append
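
On the duplicates the script struggles with: each pass of the loop only de-duplicates within its own 1,000-key batch, so the same top-level prefix can reappear in later batches. One possible fix is to collapse the combined list right after the loop, before the Out-File calls:

# Insert after the do/while loop, before the Out-File calls:
$output = $output | Sort-Object -Unique   # collapse duplicates across all batches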