I want to encrypt and decrypt big text files (think 20M lines). The encryption service I am using can only encrypt payloads up to 64 KB. For the purposes of this question, assume we are stuck with this service.
My solution is to split the huge file into 64 KB chunks, encrypt them all in parallel, and put the encrypted parts in a tar.gz. Each part is numbered part-xxx so that I can restore the original file. At decryption time I unpack the archive, decrypt each part in parallel, and concatenate the results in order.
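For context, the encryption side is roughly the sketch below. The keyring, key, and location values are placeholders rather than my real setup, but the structure (split, parallel gcloud kms encrypt, tar) is what I do:

#!/usr/bin/env bash
set -euo pipefail

INPUT="$1"     # the big plaintext file
ARCHIVE="$2"   # output tar.gz (absolute path recommended)
WORKDIR=$(mktemp -d)
mkdir "$WORKDIR/enc"

# Split into 64 KB chunks named part-00000, part-00001, ...
# Five-digit suffixes leave room for the tens of thousands of parts
# a multi-GB file produces.
split -b 64k -d -a 5 "$INPUT" "$WORKDIR/part-"

# Encrypt every chunk in parallel; keyring/key/location are placeholders.
cd "$WORKDIR"
ls -1 -f part-* | xargs -I % -P 32 \
  gcloud kms encrypt \
    --location global --keyring my-keyring --key my-key \
    --plaintext-file % --ciphertext-file enc/%

# Bundle the encrypted chunks, keeping their part-* names.
tar -czf "$ARCHIVE" -C enc .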
The fun part: when I run that last stage on a big enough file, one of the following happens:
1. The tmux session dies and I get logged out. No logs, no nothing.
2. I get this:
/home/estergiadis/kms/decrypt.sh: line 45: /usr/bin/find: Argument list too long
/home/estergiadis/kms/decrypt.sh: line 46: /bin/rm: Argument list too long
I tried several solutions based on xargs with no luck. Here is the interesting code:
echo "Decrypting chunks in parallel."
# -1 -f in ls helped me go from scenario 1 to scenario 2 above.
# Makes sense since I don't need sorting at this stage.
ls -1 -f part-* | xargs -I % -P 32 bash -c "gcloud kms decrypt --ciphertext-file % --plaintext-file ${OUTPUT}.%"
# Best case scenario, we die here
find $OUTPUT.part-* | xargs cat > $OUTPUT
rm $OUTPUT.part-*
Even more interesting: when find and rm report a problem, I can go to the temp folder with all the parts, run the exact same commands by hand, and everything works.
In case it matters, all of this takes place on a RAM-mounted filesystem. However, RAM cannot possibly be the issue: I am on a machine with 256 GB of RAM, the files involved take up 1-2 GB, and htop never shows more than 10% usage.
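In case the exact environment matters, the RAM-backed work area is set up roughly like this; the mount point and size below are illustrative placeholders, not my actual values:

# Illustrative only: a tmpfs work area like the one I use.
sudo mkdir -p /mnt/ramdisk
sudo mount -t tmpfs -o size=16g tmpfs /mnt/ramdisk

# Decryption happens inside that mount: unpack the archive of
# encrypted part-* chunks there, then run decrypt.sh against them.
cd /mnt/ramdisk
tar -xzf /path/to/encrypted.tar.gz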