
What is the easiest approach to automatically download all export files onto a Windows system?

I need to download a full Google Workspace Data Export using Windows. The Google Workspace Data Export is similar to Google Takeout but for the whole organisation.

When the export files are generated, they can be downloaded one by one through the web interface, or in bulk using a gsutil command supplied by the same interface.

gsutil -m cp -r \
  "gs://takeout-export-.../20210716T081530Z/CustomerOwnedData/" \
  "gs://takeout-export-.../20210716T081530Z/Resource:\ -10235762353432345231/"
  ...50 more lines
  .

This command does not work out of the box on Windows.

So far I've done the following:

  • Removed all trailing "\" line continuations, turning it into a single-line statement (shown below).
  • Removed the whitespace escape "\ " inside the folder name, since the path is already quoted.
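
That leaves one long line of roughly this shape (paths truncated exactly as in the command above, with the final "." as the destination):

gsutil -m cp -r "gs://takeout-export-.../20210716T081530Z/CustomerOwnedData/" "gs://takeout-export-.../20210716T081530Z/Resource: -10235762353432345231/" ...50 more folders... .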

The remaining problem is that the folder names in the export contain ":", which Windows doesn't allow in file names.
I can download individual folders by specifying a new target folder name, but that has to be done by hand, folder by folder.
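
Done by hand that means something like this for every single folder (the path is the truncated placeholder from above; the local folder is created first and given a name without the ":"):

mkdir "Resource -10235762353432345231"
gsutil -m cp -r "gs://takeout-export-.../20210716T081530Z/Resource: -10235762353432345231/*" "Resource -10235762353432345231"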

I've tried to rewrite the command into one command for each folder:

gsutil -m cp -r "gs://takeout-export-.../20210716T081530Z/Resource: -10235762353432345231/" "Resource: -10235762353432345231/"

This works only for folders with a single file inside. Most folders contain two files, which results in the following:

CommandException: Destination URL must name a directory, bucket, or bucket
subdirectory for the multiple source form of the cp command.
CommandException: Destination URL must name a directory, bucket, or bucket
subdirectory for the multiple source form of the cp command.
CommandException: 2 files/objects could not be transferred.

Next, I tried to rename the "Resource: ..." folders directly in the bucket:

gsutil -m mv "gs://takeout-export-.../20210716T081530Z/Resource: -10235762353432345231/" "gs://takeout-export-.../20210716T081530Z/Resource -10235762353432345231/"

But this failed with:

AccessDeniedException: 403 ...@... does not have storage.objects.create access to the Google Cloud Storage object.

I guess I don't have access to modify the Data Export files.

What do I, as an administrator, need to know to get access to a Google Workspace Data Export?

hultqvist
  • I suspect the issue is that the GCS object prefix includes colons, and these may cause issues on Windows. [Windows](https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file) doesn't allow folder names to contain certain special characters, including ':'. You could rename the bucket folder (object prefix) to drop the ':' and try again. – Mousumi Roy Apr 20 '22 at 07:09

2 Answers


I've struggled with this as well and went through all the same steps you have. I wish Google would just change their naming scheme to be Windows-compatible. If you had your own paid Cloud Storage bucket you could copy the objects and rename the forbidden names in the process, but you can't do that inside the takeout bucket itself, since you can't write anything to it.
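
For a bucket you do own, that copy-and-rename would look roughly like this (the destination bucket name is a placeholder; the source path is the truncated one from the question):

gsutil -m cp -r "gs://takeout-export-.../20210716T081530Z/Resource: -10235762353432345231/*" "gs://your-own-bucket/Resource -10235762353432345231/"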

My solution ended up being to install a Linux distro via WSL2, download everything with gsutil there, rename the offending folders, and then copy the result into Windows-accessible storage.
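
Roughly, from inside the WSL2 distro (the paths, the user name and the rename one-liner are only illustrative):

# download the whole export into the Linux file system
gsutil -m cp -r "gs://takeout-export-.../20210716T081530Z/" ~/takeout/
# strip the ':' that Windows forbids from every downloaded name, deepest entries first
find ~/takeout -depth -name '*:*' -execdir bash -c 'mv "$1" "${1//:/}"' _ {} \;
# copy the cleaned tree onto the Windows side
cp -r ~/takeout "/mnt/c/Users/<you>/Downloads/takeout"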

Tom

It has been quite some time since I downloaded a Workspace backup via gsutil, but if I'm not mistaken the "Resource" folders hold the shared drive content.

I used the following command to download all the normal content:

gsutil -m rsync -r -x '^Resource*' gs://takeout-export-Unique-ID/Folder_Name/ "C:/Path_to_local_folder"

And the "Resource" content I did separately.

Nevertheless, I don't understand why Google doesn't simply provide a normal download link without the need for a separate tool, or at least use a naming scheme that doesn't cause trouble on the destination OS.

x0100