Following on from the question Splitting out a large file:
I would like to stream large gzipped files from an Amazon S3 bucket and process them with an awk command.
Sample file to process
...
{"captureTime": "1534303617.738","ua": "..."}
...
Script to optimize
aws s3 cp s3://path/to/file.gz - \
| gzip -d \
| awk -F'"' '{ date = strftime("%Y%m%d%H", $4); print > ("splitted." date) }'
gzip splitted.*
# make some visual checks here before copying to S3
for f in splitted.*.gz; do aws s3 cp "$f" s3://path/to/splitted/; done
Do you think I can wrap everything into a single pipeline, so that no intermediate files are written locally?
I could use the approach from Using gzip to compress files to transfer with aws command to gzip and copy on the fly, but being able to gzip from inside awk would be even better.
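Roughly what I have in mind is the untested sketch below, assuming gawk (for strftime and for keeping one output pipe open per hour) and reusing the same s3://path/to/splitted/ prefix as above:

aws s3 cp s3://path/to/file.gz - \
| gzip -d \
| gawk -F'"' '{
    date = strftime("%Y%m%d%H", $4)
    # one shell pipeline per hour: gzip the records and stream them straight to S3
    cmd = "gzip | aws s3 cp - s3://path/to/splitted/splitted." date ".gz"
    print | cmd
  }'

Each distinct hour keeps its pipe open until the input ends, so this assumes the number of hours per input file stays well below the open-file limit, and it gives up the manual check before upload.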
Thank you.