14

I know the idea of 'folders' is sort of nonexistent or different in Google Cloud Storage, but I need a way to delete all objects in a 'folder', or with a given prefix, from Java.

The GcsService has a delete function, but as far as I can tell it only takes one GcsFilename object and does not honor wildcards (i.e., "folderName/**" did not work).

Any tips?

shieldstroy

3 Answers

13

Extremely late to the party, but here's an answer for anyone arriving from a current Google search. We can delete multiple blobs efficiently by leveraging com.google.cloud.storage.StorageBatch.

Like so:

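import com.google.api.gax.paging.Page;
import com.google.cloud.storage.Blob;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageBatch;

public static void rmdir(Storage storage, String bucket, String dir) {
    StorageBatch batch = storage.batch();
    // Prefix only: adding currentDirectory() would skip objects in nested "folders".
    Page<Blob> blobs = storage.list(bucket, Storage.BlobListOption.prefix(dir));
    boolean queued = false;
    for (Blob blob : blobs.iterateAll()) {
        batch.delete(blob.getBlobId()); // queued locally; nothing is sent until submit()
        queued = true;
    }
    if (queued) { // skip submit() when there is nothing to delete
        batch.submit();
    }
}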

This should run MUCH faster than deleting one by one when your bucket/folder contains a nontrivial number of items.
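
A rough way to see where the speed-up comes from is to count HTTP round trips for the deletes. The sketch below is only arithmetic, not client code. `BATCH_LIMIT = 100` is an assumption based on the per-batch call limit documented for the Cloud Storage JSON API, and the class and method names are made up for illustration.

```java
public class RoundTrips {
    // Assumed cap on calls per batch request; check the docs for your API version.
    static final int BATCH_LIMIT = 100;

    // One HTTP request per object when deleting one by one.
    static int oneByOne(int objectCount) {
        return objectCount;
    }

    // Ceiling division: each batch request carries up to BATCH_LIMIT deletes.
    static int batched(int objectCount) {
        return (objectCount + BATCH_LIMIT - 1) / BATCH_LIMIT;
    }

    public static void main(String[] args) {
        int n = 5000;
        System.out.println(oneByOne(n) + " requests one by one, " + batched(n) + " batched");
    }
}
```

The listing calls are not counted here; they cost the same in both approaches.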

Edit: since this is getting a little attention, I'll demo error handling:

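import com.google.api.gax.paging.Page;
import com.google.cloud.storage.*;
import java.util.ArrayList;
import java.util.List;

public static boolean rmdir(Storage storage, String bucket, String dir) {
    List<StorageBatchResult<Boolean>> results = new ArrayList<>();
    StorageBatch batch = storage.batch();
    Page<Blob> blobs = storage.list(bucket, Storage.BlobListOption.prefix(dir));
    for (Blob blob : blobs.iterateAll()) {
        results.add(batch.delete(blob.getBlobId()));
    }
    if (results.isEmpty()) {
        return true; // nothing to delete, so don't submit an empty batch
    }
    batch.submit();
    for (StorageBatchResult<Boolean> result : results) {
        try {
            if (!result.get()) { return false; } // false: the blob was not found
        } catch (StorageException e) {
            return false; // this delete request failed
        }
    }
    return true;
}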

This method deletes every blob in the given folder of the given bucket and returns true only if every delete succeeded; otherwise it returns false. Look at the return type of batch.delete() (StorageBatchResult) for a better understanding and for finer-grained, per-object error handling.

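To retry until all items are deleted, you could call it like this:

boolean success = false;
// Cap the retries so a permanent failure (e.g. missing permissions) cannot loop forever.
for (int attempt = 0; attempt < 5 && !success; attempt++) {
    success = rmdir(storage, bucket, dir);
}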
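
A retry loop like the one above is gentler on the service if it pauses between attempts and eventually gives up. Here is a minimal sketch of that idea using only the standard library. The `Retry` class, its method name, and the attempt and delay values are hypothetical, not part of the Cloud Storage client.

```java
import java.util.function.BooleanSupplier;

public class Retry {
    /**
     * Runs op until it returns true or maxAttempts is used up,
     * doubling the pause between attempts. Returns the final outcome.
     */
    static boolean withBackoff(BooleanSupplier op, int maxAttempts, long initialDelayMillis) {
        long delay = initialDelayMillis;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (op.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and stop retrying
                    return false;
                }
                delay *= 2;
            }
        }
        return false;
    }
}
```

It could be called as `Retry.withBackoff(() -> rmdir(storage, bucket, dir), 5, 500)`, which tries at most five times and waits 500 ms, 1 s, 2 s and 4 s between attempts.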
MeetTitan
  • nice solution but this actually won't work if you try and run it in a servlet and you have a non trivial amount of items, a taskqueue doesn't work either with this. – Jonathan Laliberte Mar 20 '19 at 22:44
  • @Jonathan, it works on my taskqueues. Does it just time out for you? Maybe it's the number of items, or even variables such as bandwidth and latency (even if using GAE servlets, is your cloud storage in the same region?) – MeetTitan Mar 21 '19 at 01:46
  • it was giving me: `javax.servlet.ServletException: java.lang.IllegalStateException` not sure what the reason was exactly. There were over 5000 objects though – Jonathan Laliberte Mar 21 '19 at 02:23
  • @Jonathan I can delete over 9000 ;) objects with this method in a taskqueue, so I'd be very interested in the full stack trace if you ever run it again. – MeetTitan Mar 21 '19 at 05:40
  • This is not working for me. just returning false always. There is no exception when I'm running this. – Pankaj Singhal Jan 21 '20 at 11:26
  • Also, iterating over it lists only the folder. The iterator contains only 1 item - the directory entry. And when calling the delete function, it is not working because the generation is null for the fetched blob/directory. Apparently, delete only works if generation matches – Pankaj Singhal Jan 21 '20 at 11:28
  • This solution isn't working for me, I get `java.lang.IllegalStateException` and I don't know the reason. – Mahmoud Yusuf Jan 26 '21 at 17:27
12

The API only supports deleting a single object at a time. You can only request many deletions using many HTTP requests or by batching many delete requests. There is no API call to delete multiple objects using wildcards or the like. In order to delete all of the objects with a certain prefix, you'd need to list the objects, then make a delete call for each object that matches the pattern.

The command-line utility, gsutil, does exactly that when you ask it to delete the path "gs://bucket/dir/**". It fetches a list of objects matching that pattern, then it makes a delete call for each of them.

If you need a quick solution, you could always have your Java program exec gsutil.
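
Shelling out to gsutil could look roughly like the sketch below. It assumes gsutil is installed, on the PATH, and already authenticated on the machine running the JVM. The `GsutilRm` class and its method names are made up for illustration, and `-m` (parallel operations) is optional.

```java
import java.io.IOException;
import java.util.List;

public class GsutilRm {
    // Builds the argument list for: gsutil -m rm gs://<bucket>/<dir>/**
    static List<String> command(String bucket, String dir) {
        return List.of("gsutil", "-m", "rm", "gs://" + bucket + "/" + dir + "/**");
    }

    // Runs gsutil and returns its exit code (0 means success).
    static int run(String bucket, String dir) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(bucket, dir))
                .inheritIO() // show gsutil's output in this process's console
                .start();
        return process.waitFor();
    }
}
```

ProcessBuilder passes each argument straight to the program without going through a shell, so the `**` reaches gsutil unexpanded and needs no quoting.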

Here is the code that corresponds to the above answer in case anyone else wants to use it:

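import com.google.appengine.tools.cloudstorage.GcsFilename;
import com.google.appengine.tools.cloudstorage.ListItem;
import com.google.appengine.tools.cloudstorage.ListOptions;
import com.google.appengine.tools.cloudstorage.ListResult;
import java.io.IOException;

// gcsService is a GcsService field; CouldNotDeleteFile is the poster's own exception type.
public void deleteFolder(String bucket, String folderName) throws CouldNotDeleteFile {
  try
  {
    // List every object whose name starts with folderName, including nested "folders".
    ListResult list = gcsService.list(bucket, new ListOptions.Builder().setPrefix(folderName).setRecursive(true).build());

    while(list.hasNext())
    {
      ListItem item = list.next();
      // Delete one object per call; the bucket comes from the method argument.
      gcsService.delete(new GcsFilename(bucket, item.getName()));
    }
  }
  catch (IOException e)
  {
    //Error handling
  }
}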
Ankit Sultana
Brandon Yarbrough
10

I realise this is an old question, but I just stumbled upon the same issue and found a different way to resolve it.

The Storage class in the Google Cloud Java Client for Storage has a method to list the blobs in a bucket; it accepts a prefix option that restricts the results to blobs whose names begin with that prefix.

For example, deleting all the files with a given prefix from a bucket can be achieved like this:

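import com.google.cloud.storage.Blob;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

Storage storage = StorageOptions.getDefaultInstance().getService();
Iterable<Blob> blobs = storage.list("bucket_name", Storage.BlobListOption.prefix("prefix")).iterateAll();
for (Blob blob : blobs) {
    // generationMatch() makes the delete conditional on the generation that was listed.
    blob.delete(Blob.BlobSourceOption.generationMatch());
}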
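
One thing to watch with the prefix option: it matches on the raw object name, so a prefix without a trailing slash can catch more than one 'folder'. The sketch below imitates that with an in-memory sorted set standing in for a bucket. `PrefixDemo` and the object names are invented for illustration; nothing here calls the real API.

```java
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class PrefixDemo {
    // Stand-in for a bucket listing: keep only names that start with the prefix.
    static List<String> withPrefix(TreeSet<String> names, String prefix) {
        return names.stream()
                .filter(name -> name.startsWith(prefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        TreeSet<String> bucket = new TreeSet<>(List.of(
                "photos/a.jpg", "photos/2020/b.jpg", "photos-backup/c.jpg", "notes.txt"));
        System.out.println(withPrefix(bucket, "photos"));  // no trailing slash
        System.out.println(withPrefix(bucket, "photos/")); // trailing slash
    }
}
```

With the prefix "photos" the result also includes photos-backup/c.jpg; with "photos/" it includes only the two objects under that 'folder', nested one included. Passing the prefix with its trailing slash is the safer habit when the intent is to delete one folder.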
dilico