
I'm trying to retrieve the most recent version of an object uploaded before a certain time in a versioning-enabled GCS bucket. For example:

$ gsutil ls -la gs://my-versioned-bucket/
         7  2016-02-28T23:59:27Z  gs://my-versioned-bucket/file#1456707567816000  metageneration=1
         7  2016-02-29T01:00:11Z  gs://my-versioned-bucket/file#1456707611782000  metageneration=1
         7  2016-02-29T01:43:02Z  gs://my-versioned-bucket/file#1456710182089000  metageneration=1
TOTAL: 3 objects, 21 bytes (21 B)

Suppose I wanted to get the version that was live at the end of 2016-02-28 (namely generation #1456707567816000 in this case). Is there any better option than listing all versions and looping through them?
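This is the brute-force approach I'd rather avoid, sketched here with the google-cloud-storage Python client (an assumption on my part; the bucket and object names are the ones from the listing above):

from datetime import datetime, timezone
from google.cloud import storage

# Sketch only: list every generation of one object and keep the newest one
# created at or before the cutoff. Assumes default application credentials.
cutoff = datetime(2016, 2, 28, 23, 59, 59, tzinfo=timezone.utc)

client = storage.Client()
versions = [
    b for b in client.list_blobs("my-versioned-bucket", prefix="file", versions=True)
    if b.name == "file" and b.time_created <= cutoff
]
if versions:
    live_at_cutoff = max(versions, key=lambda b: b.time_created)
    # For the listing above this prints gs://my-versioned-bucket/file#1456707567816000.
    print(f"gs://{live_at_cutoff.bucket.name}/{live_at_cutoff.name}#{live_at_cutoff.generation}")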

Update with a real-world example:

I have a job that continuously regenerates a set of files (from an external source). Once it finishes, it starts again. An error in the external source started at time X and corrupts every file generated after that time. I know time X, but I don't know which generations of the files were corrupted. What's the easiest way to identify the most recent valid set of files (i.e., for each object, the latest generation older than X)?
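To make that concrete, this is what the brute-force recovery looks like today (again just a sketch, assuming the google-cloud-storage Python client): list every version of every object and keep, per object name, the newest generation created before X.

from datetime import datetime, timezone
from google.cloud import storage

# Sketch: reconstruct the bucket's state as of time X by picking, for each
# object, the newest generation created at or before X. This touches every
# version ever stored, which is exactly the cost I'd like to avoid.
X = datetime(2016, 2, 28, 23, 59, 59, tzinfo=timezone.utc)

client = storage.Client()
snapshot = {}  # object name -> Blob of the newest qualifying generation
for blob in client.list_blobs("my-versioned-bucket", versions=True):
    if blob.time_created <= X:
        current = snapshot.get(blob.name)
        if current is None or blob.time_created > current.time_created:
            snapshot[blob.name] = blob

for name, blob in sorted(snapshot.items()):
    print(f"gs://my-versioned-bucket/{name}#{blob.generation}  {blob.time_created.isoformat()}")

Each of the versioned URLs it prints could then be copied back over the live object (e.g. with gsutil cp), but the listing step still scales with the total number of versions in the bucket.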

dimo414
  • I think you'll have to loop through them. Are you looking for a version within a specific object name, or multiple object names? I'd suggest writing code with the API rather than using gsutil for something like this. – jterrace Feb 29 '16 at 17:22
  • I'm looking for the state of a set of files at a given point in time. Updated the question with an example. – dimo414 Feb 29 '16 at 18:34
  • 1
    You'll need to loop through the files. You can either write a script that parses the gsutil output or calls the API directly. – Travis Hobrla Mar 01 '16 at 21:51
  • Have you considered keeping track of each set by writing an "i'm done" file? – jterrace Mar 02 '16 at 16:47
  • It's not batched, it's continuous. As data comes in it gets stored in GCS. We use a versioned bucket intentionally to mitigate these sorts of data corruption issues, since they're out of our control. But now that we have this nicely snapshotted history of our data, how can we get it out again? We obviously could list all versions and manually search, but that's undesirable when the number of files and versions you're storing gets large. – dimo414 Mar 02 '16 at 16:53

0 Answers