I'm attempting to decompress and read a .zst file from S3 programmatically (i.e. without downloading it and running a command-line decompressor on it).
Here's the code I'm running:
import boto3
import zstandard
import os
import io

AWS_S3_BUCKET = os.getenv("AWS_S3_BUCKET")
AWS_ACCESS_KEY_ID = os.getenv("AWS_ACCESS_KEY_ID")
AWS_SECRET_ACCESS_KEY = os.getenv("AWS_SECRET_ACCESS_KEY")

zstd = zstandard.ZstdDecompressor()

s3_client = boto3.client(
    "s3",
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
)

# Fetch the compressed object from S3
response = s3_client.get_object(Bucket=AWS_S3_BUCKET, Key="folder/example_file_name.zst")
status = response.get("ResponseMetadata", {}).get("HTTPStatusCode")

# Attempt 1: one-shot decompression
# decompressed = zstd.decompress(response.get("Body").read())
## OR
# Attempt 2: streaming decompression
# with zstd.stream_reader(io.BytesIO(response.get("Body").read())) as r:
#     decompressed = r.read()
So I'm trying each of the two approaches at the end, separated by the "## OR". The first one complains that it has no information about the length of the output, so I tried passing max_output_size=number_from_file_metadata, but it still fails with the same error:
ZstdError: error determining content size from frame header
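Concretely, the max_output_size attempt looked roughly like this; I used ContentLength from the S3 response as the number_from_file_metadata, but that's the compressed size, so the multiplier is just a guess on my part:

# One-shot decompression with an explicit output cap, since the frame header
# apparently doesn't record the decompressed size. ContentLength is the
# compressed size, so the multiplier is an arbitrary upper-bound guess.
raw = response["Body"].read()
decompressed = zstd.decompress(raw, max_output_size=response["ContentLength"] * 100)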
And then the stream_reader version (the "with..." statement) gives this error instead:
ZstdError: zstd decompress error: Unknown frame descriptor
As far as I can tell, the second error means that either the file isn't truly zstd-compressed, or it was compressed in "magicless" mode (i.e. without the standard frame magic number) and the decompressor isn't being told to expect that. I'm getting that from here: https://github.com/indygreg/python-zstandard/issues/79
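To test the first possibility, I'm checking the object's leading bytes (reusing raw from above) against the standard zstd frame magic; every normal zstd frame starts with the four bytes 28 B5 2F FD:

# Every standard zstd frame begins with the little-endian magic 0xFD2FB528,
# i.e. the bytes 28 B5 2F FD; a magicless frame (or a non-zstd file) won't.
if raw[:4] == b"\x28\xb5\x2f\xfd":
    print("standard zstd frame magic found")
else:
    print("no magic number; possibly magicless, or not zstd at all:", raw[:8].hex())

(Note that response["Body"] is a streaming body that can only be read once, which is why I read it into raw up front and reuse it everywhere.)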
But that issue is really unclear, and seemingly not many people have had this problem.
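In case the file really was compressed in magicless mode, this is what I'm planning to try next based on my reading of that issue: python-zstandard's ZstdDecompressor takes a format argument, and FORMAT_ZSTD1_MAGICLESS tells it not to expect the magic number (untested on my end):

# A decompressor configured for magicless frames, per the linked issue.
magicless = zstandard.ZstdDecompressor(format=zstandard.FORMAT_ZSTD1_MAGICLESS)
with magicless.stream_reader(io.BytesIO(raw)) as reader:
    decompressed = reader.read()

Any help very much appreciated.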