I have a Redshift Spectrum external table backed by Glue with data in S3. To cut cost, S3 data in some partitions has been moved to Deep Archive storage class.
However, one of those partitions is now needed for queries. I've initiated an S3 ObjectRestore and twiddled my thumbs for 12 hours while S3 does its thing. In the console, the S3 data now shows as restored. I've also ensured the Glue partition where the data lives is present and correct.
However, when I query the partition where the data is restored, Redshift returns 0 rows. The partition shows in svv_external_partitions, and I am able to download the object from the S3 path there and read the file. It just seems that Redshift Spectrum is ignoring the restored objects, perhaps because although restored they are still classified as Deep Archive?
Does anyone know if this the expected behavior? I had anticipated that once restored, the data would be queryable in Redshift until the restoration period expired. I can find no mention of this type of case in Redshift or S3 docs. This posting has a similar question about Spectrum/Deep Archive, but does not ask about data that has been restored.
I also get a similar result when querying the table via Athena.
S3 Deep Archive data is restored, Glue partition is present and listed in Redshift's svv_external_partitions.
Querying that partition in Redshift:
Expected: Query returns same rows as I see when I download the restored file from S3 and read it manually.
Actual: Query returns successfully with 0 rows.