
This code works for a file myfile which fits in RAM:

import io
import Crypto.Random, Crypto.Cipher.AES   # pip install pycryptodome

nonce = Crypto.Random.new().read(16)
key = Crypto.Random.new().read(16)  # in reality, use a key derivation function etc.; off-topic here
cipher = Crypto.Cipher.AES.new(key, Crypto.Cipher.AES.MODE_GCM, nonce=nonce)

out = io.BytesIO()
with open('myfile', 'rb') as g:
    s = g.read()  # the whole file is loaded into memory at once
ciphertext, tag = cipher.encrypt_and_digest(s)
out.write(nonce)
out.write(ciphertext)
out.write(tag)

But how can a 64 GB file be encrypted with this technique?

Obviously, the g.read(...) call should use a smaller buffer size, e.g. 128 MB.

But then, how does it work for the crypto part? Should we keep a separate (ciphertext, tag) pair for each 128 MB chunk?

Or is it possible to have only one tag for the whole file?

Basj
  • I would use a much smaller buffer size, a `bytearray` sized something like 65536. You can experiment with different sizes to see if there is any performance difference. The encryption can be done in pieces. For each piece of data `d` you get a piece of the ciphertext by calling `cipher.encrypt(d)`. After the last piece you must call `cipher.digest()` to get the tag. – President James K. Polk Nov 22 '20 at 13:33
  • @PresidentJamesK.Polk Thanks, I think this is the answer, then! You can post it as an answer. About the tag: will `.digest()` apply to the whole file or only the last piece? – Basj Nov 22 '20 at 14:08
    `.digest()` applies to everything passed to `.update()` and `.encrypt()` since the cipher object was created, so yes, the whole file. – President James K. Polk Nov 22 '20 at 14:10
  • What do you think about the solution I posted @PresidentJamesK.Polk? Is it what you had in mind? Would you put the `tag` at the end (leading to [this problem](https://stackoverflow.com/questions/64959048/read-blocks-from-a-file-object-until-x-bytes-from-the-end) to stop reading at the end minus 16 bytes), or at the start, thus requiring multiple uses of `f.seek(...)`? – Basj Nov 22 '20 at 20:32

1 Answer


As mentioned in @PresidentJamesK.Polk's comment, this seems to be the solution:

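# `cipher` and `nonce` are created as in the question; 'myfile.enc' is a hypothetical output name
with open('myfile', 'rb') as g, open('myfile.enc', 'wb') as out:
    out.write(nonce)
    while True:
        block = g.read(65536)
        if not block:
            break
        out.write(cipher.encrypt(block))
    out.write(cipher.digest())  # 16-byte tag at the end of the file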

The only problem is that, when reading this file back for decryption, having to stop 16 bytes before the end is a bit annoying.

Or maybe one should do this:

out.write(nonce)
out.seek(16, 1)  # skip 16 bytes forward: placeholder for the tag
while True:
    ...
    ...
out.seek(16)
out.write(cipher.digest())  # write the tag at offset 16 of the output file

?

Basj
  • That should work fine. For stopping 16 bytes before the end, if reading from a file then you can use the solution you linked to but it seems simpler to me to avoid `ftell()` and just get the file size with `os.stat().st_size` and only read in `st_size - 16` bytes. Then read in the last 16 bytes for the tag. – President James K. Polk Nov 22 '20 at 22:07