
Is it possible to mass rename objects on Google Cloud Storage using gsutil (or some other tool)? I am trying to figure out a way to rename a bunch of images from *.JPG to *.jpg.

joshhunt

4 Answers


Here is a native way to do this in bash, with a line-by-line explanation of the code below:

gsutil ls gs://bucket_name/*.JPG > src-rename-list.txt
sed 's/\.JPG/\.jpg/g' src-rename-list.txt > dest-rename-list.txt
paste -d ' ' src-rename-list.txt dest-rename-list.txt | sed -e 's/^/gsutil mv /' | while read line; do bash -c "$line"; done
rm src-rename-list.txt; rm dest-rename-list.txt

The solution builds two lists, one with the source names and one with the destination names (to be used in the "gsutil mv" command):

gsutil ls gs://bucket_name/*.JPG > src-rename-list.txt
sed 's/\.JPG/\.jpg/g' src-rename-list.txt > dest-rename-list.txt

The string "gsutil mv " and the two lists are then concatenated line by line with:

paste -d ' ' src-rename-list.txt dest-rename-list.txt | sed -e 's/^/gsutil mv /'
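
As a quick local illustration (hypothetical object names, throwaway file names), each emitted line is a complete command:

```shell
# Throwaway demo files with example object names
printf 'gs://b/one.JPG\ngs://b/two.JPG\n' > demo-src.txt
printf 'gs://b/one.jpg\ngs://b/two.jpg\n' > demo-dest.txt
# Join corresponding lines and prepend "gsutil mv "
paste -d ' ' demo-src.txt demo-dest.txt | sed -e 's/^/gsutil mv /'
# prints:
#   gsutil mv gs://b/one.JPG gs://b/one.jpg
#   gsutil mv gs://b/two.JPG gs://b/two.jpg
rm demo-src.txt demo-dest.txt
```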

This then runs each line in a while loop: while read line; do bash -c "$line"; done

Lastly, clean up and delete the files created:

rm src-rename-list.txt; rm dest-rename-list.txt

The above has been tested against a working Google Storage bucket.
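
An alternative sketch that avoids the temporary files entirely, using bash parameter expansion. The echo makes it a dry run; dropping it executes the moves, which assumes gsutil is installed and authenticated, and the object names here are illustrative:

```shell
# Dry run: print the mv command for each .JPG object.
# In practice, feed the loop from: gsutil ls 'gs://bucket_name/*.JPG'
for src in 'gs://bucket_name/photo1.JPG' 'gs://bucket_name/photo2.JPG'; do
  dst="${src%.JPG}.jpg"          # strip the .JPG suffix, append .jpg
  echo gsutil mv "$src" "$dst"   # drop `echo` to perform the move
done
# prints:
#   gsutil mv gs://bucket_name/photo1.JPG gs://bucket_name/photo1.jpg
#   gsutil mv gs://bucket_name/photo2.JPG gs://bucket_name/photo2.jpg
```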

jackotonye
  • liked the solution, only a small comment: instead of gsutil ls gs://bucket_name/*.JPG | sed 's/\.JPG/\.jpg/g' > dest-rename-list.txt it is simpler to do cat src-rename-list.txt | sed 's/\.JPG/\.jpg/g' > dest-rename-list.txt – RELW Jun 25 '20 at 15:33

https://cloud.google.com/storage/docs/gsutil/addlhelp/WildcardNames

gsutil supports URI wildcards

EDIT

From the gsutil 3.0 release notes:

As part of the bucket sub-directory support we changed the * wildcard to match only up to directory boundaries, and introduced the new ** wildcard...

Do you have directories under the bucket? If so, you may need to go into each directory, or use **:

gsutil -m mv gs://my_bucket/**.JPG gs://my_bucket/**.jpg

or

gsutil -m mv gs://my_bucket/mydir/*.JPG gs://my_bucket/mydir/*.jpg

EDIT
gsutil doesn't support wildcards in the destination so far (as of 4/12/'14), and neither does the API.

So at the moment you need to retrieve the list of all JPG files and rename each file.

Python example:

import subprocess

# List all .JPG objects, one URI per line (text=True decodes to str)
files = subprocess.check_output("gsutil ls gs://my_bucket/*.JPG",
                                shell=True, text=True)
for f in files.splitlines():
    # Replace the trailing "JPG" with "jpg" and move the object
    subprocess.call("gsutil mv %s %s" % (f, f[:-3] + "jpg"), shell=True)

Please note that this could take hours for a large number of files.

HayatoY
  • Hmmm doesn't seem to work, comes up with "CommandException: Destination (gs://my_bucket/*.jpg) must match exactly 1 URL" – joshhunt Nov 27 '14 at 10:25
  • I edited :) sorry I don't have environment to test now, if not work I would test some other ways – HayatoY Nov 27 '14 at 12:14
  • Nah the directories are right. I think the issue is that it doesn't do wildcard replacement? – joshhunt Nov 27 '14 at 22:46
  • Indeed it doesn't work. It seems you have to rename files one by one after retrieving the file list, however inefficient. And the GCS API doesn't seem to have a copy/rename function. – HayatoY Nov 28 '14 at 03:42

gsutil does not support a parallelized mass copy/rename in a single command.

You have two options:

  • use a Dataflow pipeline to do the operation, or
  • use GNU parallel to launch it using several processes

If you use GNU Parallel, it is better to deploy a new instance to do the mass copy/rename operation:

  • First: make a list of the files you want to copy/rename (a file with source and destination separated by a space or tab), like this:
gs://origin_bucket/path/file gs://dest_bucket/new_path/new_filename
  • Second: launch a new compute instance.
  • Third: log in to that instance and install GNU Parallel:
sudo apt install parallel
  • Fourth: authenticate with Google (gcloud auth login), because the compute service account might not have permission to move/rename the files:
gcloud auth login
  • Fifth: run the copy (gsutil cp) or move (gsutil mv) operation with parallel:
   parallel -j 20 --colsep ' ' gsutil mv {1} {2} :::: file_with_source_destination_uris.txt

This will run 20 parallel instances of the gsutil mv operation.
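
The pair file from the first step can be generated mechanically. A minimal sketch with illustrative URIs; in practice you would produce src-list.txt with gsutil ls 'gs://origin_bucket/path/*.JPG':

```shell
# Example source URIs standing in for the output of `gsutil ls`
printf '%s\n' \
  'gs://origin_bucket/path/IMG_001.JPG' \
  'gs://origin_bucket/path/IMG_002.JPG' > src-list.txt
# Derive each destination from its source and emit "src dst" pairs
awk '{dst = $0; sub(/\.JPG$/, ".jpg", dst); print $0 " " dst}' \
  src-list.txt > file_with_source_destination_uris.txt
```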

Iñigo González

Yes, it is possible:

Move/rename objects and/or subdirectories
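
In application code (e.g. on App Engine, as the comments below discuss), a rename is a copy followed by a delete, since objects are immutable. A sketch using the google-cloud-storage client library; the bucket/object names and the helper function are illustrative:

```python
def lowercase_jpg_ext(name):
    """Compute the destination name: *.JPG -> *.jpg (illustrative helper)."""
    return name[: -len(".JPG")] + ".jpg" if name.endswith(".JPG") else name

def rename_gcs_object(bucket_name, old_name, new_name):
    """Rename one object; rename_blob performs a copy followed by a delete."""
    from google.cloud import storage  # pip install google-cloud-storage
    client = storage.Client()  # requires authenticated credentials
    bucket = client.bucket(bucket_name)
    bucket.rename_blob(bucket.blob(old_name), new_name)

# Example (not run here):
# rename_gcs_object("my_bucket", "photos/a.JPG", lowercase_jpg_ext("photos/a.JPG"))
```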

Andrei Volgin
  • Seen that but not quite sure how it helps me mass rename objects? – joshhunt Nov 27 '14 at 09:31
  • @Andrei but I can't use gsutil + subprocess on my gae project – Avinash Raj Jul 28 '16 at 09:27
  • You don't need gsutil in a GAE project. You can simply retrieve a list of objects in your code and rename them. – Andrei Volgin Jul 28 '16 at 09:36
  • @AndreiVolgin Currently I'm using [gcs client library](https://github.com/GoogleCloudPlatform/appengine-gcs-client/tree/master/python) to copy,list,stat,delete files stored in gcs. I also need to implement renaming folders. That client lib don't have any methods for renaming. – Avinash Raj Jul 28 '16 at 09:38
  • So, what I'm going to do is: 1. create a new folder; 2. copy all the contents of the old folder into the new one; 3. delete the old folder. But it seems a long process.. – Avinash Raj Jul 28 '16 at 09:39
  • Cloud storage objects are immutable, so renaming and replacing is the same. – Andrei Volgin Jul 28 '16 at 09:40
  • Not sure if you can help. I tried this method and it didn't find the file with the prefix, it seems like it's looking to exactly match the path I put as opposed to a file with that prefix – user147529 Aug 04 '20 at 09:42
  • Can you provide an example? – Andrei Volgin Aug 04 '20 at 16:11