Is this AES GCM file encryption good practice?

Question

I'm using this to encrypt a file, and then to decrypt a file, using AES-GCM:

(do pip install pycryptodome first if not installed yet)

import Crypto.Random, Crypto.Protocol.KDF, Crypto.Cipher.AES

def cipherAES_GCM(pwd, nonce):
    key = Crypto.Protocol.KDF.PBKDF2(pwd, nonce, count=100_000)
    return Crypto.Cipher.AES.new(key, Crypto.Cipher.AES.MODE_GCM, nonce=nonce)    

# encrypt
plaintext = b'HelloHelloHelloHelloHelloHelloHello'  # in reality, read from a file
key = b'mykey'
nonce = Crypto.Random.new().read(16)
c, tag = cipherAES_GCM(key, nonce).encrypt_and_digest(plaintext)
ciphertext = nonce + tag + c     # write ciphertext to disk as the "encrypted file"

# decrypt
nonce, tag, c = ciphertext[:16], ciphertext[16:32], ciphertext[32:]  # read from the "encrypted file" on disk
plain = cipherAES_GCM(key, nonce).decrypt_and_verify(c, tag).decode()
print(plain)  # HelloHelloHelloHelloHelloHelloHello

Is this considered a good encryption practice, and what the potential weaknesses of this file encryption implementation?

Remark: I have 10,000 files to encrypt. If each single time I encrypt a file, I call the KDF (with a high count value), this will be highly unefficient!
A better solution would be: call the KDF only once (with a nonce1), and then for each file do:

nonce2 = Crypto.Random.new().read(16)
cipher, tag = AES.new(key, AES.MODE_GCM, nonce=nonce2).encrypt_and_digest(plain)

But then does this mean I have to write nonce1 | nonce2 | ciphertext | tag to disk for each file? This adds an additional 16-byte nonce1 to each file...

Topaco · Accepted Answer · 2020-11-20T13:55:24.467

A suggestion for improving your code would be to apply a 12 bytes nonce for GCM. Currently a 16 bytes nonce is used and this should be changed, see here sec. Note, and here.

Crucial for the security of GCM is that no key/nonce pair is used more than once, here. Since in your code for each encryption a random nonce is generated, this issue is prevented.

Your code applies the nonce also as salt for the key derivation, which is in principle no security problem as this does not lead to multiple use of the same key/nonce pair, here.

However, a disadvantage from this is possibly that the salt length is determined by the nonce length. If this is not desired (i.e. if e.g. a larger salt should be used), an alternative approach would be to generate a random salt for each encryption to derive both the key and nonce via the KDF, here. In this scenario, the concatenated data salt | ciphertext | tag would then be passed to the recipient. Another alternative would be to completely separate nonce and key generation and to generate for each encryption both a random nonce and a random salt for key generation. In this case the concatenated data salt | nonce | ciphertext | tag would have to be passed to the recipient. Note that like the nonce and the tag, also the salt is no secret, so that it can be sent along with the ciphertext.

The code applies an iteration count of 100,000. Generally, the following applies: The iteration count should be as high as can be tolerated for your environment, while maintaining acceptable performance, here. If 100,000 meets this criterion for your environment then this is OK.

The concatenation order you use is nonce | tag | ciphertext. This is not a problem as long as both sides know this. Often by convention, the nonce | ciphertext | tag order is used (e.g. Java implicitly appends the tag to the ciphertext), which could also be used in the code if you want to stick to this convention.

It is also important that an up-to-date, maintained library is used, which is the case with PyCryptodome (unlike its predecessor, the legacy PyCrypto, which should not be used at all).

Edit:
The PBKDF2 implementation of PyCryptodome uses by default 16 bytes for the length of the generated key, which corresponds to AES-128. For the digest HMAC/SHA1 is applied by default. The posted code uses these standard parameters, none of which are insecure, but can of course be changed if necessary, here.
Note: Although SHA1 itself is insecure, this does not apply in the context of PBKDF2 or HMAC, here. However, to support the extinction of SHA1 from the ecosystem, SHA256 could be used.

Edit: (regarding the update of the question):

The use case presented in the edited question is the encryption of 10,000 files. The posted code is executed for each file, so that a corresponding number of keys are generated via the KDF which leads to a corresponding loss of perfomance. This is described by you as highly unefficient. However, it should not be forgotten that the current code focuses on security and less on performance. In my answer I pointed out that e.g. the iteration count is a parameter which allows tuning between performance and security within certain limits.

A PBKDF (password based key derivation function) allows to derive a key from a weak password. To keep the encryption secure, the derivation time is intentionally increased so that an attacker cannot crack the weak password faster than a strong key (ideally). If the derivation time is shortened (e.g. by decreasing the iteration count or by using the same key more than once) this generally leads to a security reduction. Or in short, a performance gain (by a faster PBKDF) generally reduces security. This results in a certain leeway for more performant (but weaker) solutions.

The more performant solution you suggest is the following: As before, a random nonce is generated for each file. But instead of encrypting each file with its own key, all files are encrypted with the same key. For this purpose, a random salt is generated once, with which this key is derived via the KDF. This does indeed mean a significant performance gain. However, this is automatically accompanied by a reduction in security: Should an attacker succeed in obtaining the key, the attacker can decrypt all files (and not just one as in the original scenario). However, this disadvantage is not a mandatory exclusion criterion if it is acceptable within the scope of your security requirements (which seems to be the case here).

The more performant solution requires that the information salt | nonce | ciphertext | tag must be sent to the recipient. The salt is important and must not be missing, because the recipient needs the salt to derive the key via the PBKDF. Once the recipient has determined the key, the ciphertext can be authenticated with the tag and decrypted using the nonce. If it has been agreed with the recipient that the same key will be used for each file, it is sufficient for the recipient to derive the key once via the PBKDF. Otherwise the key must be derived for each file.

If the salt with its 16 bytes is unwanted (since it is identical for all files in this approach), alternative architectures could be considered. For example, a hybrid scheme might be used: A random symmetric key is generated and exchanged using a public key infrastructure. Also here, all files can be encrypted with the same key or each file can be encrypted with its own key.

But for more specific suggestions for a design proposal, the use case should be described in more detail, e.g. regarding the files: How large are the files? Is processing in streams/chunks necessary? Or regarding the recipients: How many recipients are there? What is aligned with the recipients? etc.

Thank you for your very detailed and sourced answer, really really thank you. I'm going to read this thoroughly. — Basj, Nov 19 '20 at 22:36
A problem I had with my code: I have 10,000 files to crypt. Each single time I encrypt a file, I call the KDF (with a high `count` value), which is highly unefficient! A better solution would be: call the KDF only once (with `nonce1`), and then for each file do `nonce2 = Crypto.Random.new().read(16); Crypto.Cipher.AES.new(key, AES.MODE_GCM, nonce=nonce2).encrypt_and_digest(file_content)`. But then does this mean I have to write `nonce1 | nonce2 | ciphertext | tag` to disk for each file? This adds an additional 16-byte `nonce1` to each file... — Basj, Nov 19 '20 at 22:41
... Then when decrypting 10k files of the form `nonce1 | nonce2 | ciphertext | tag` (`nonce1`can potentially vary from one file to another), I have re-create a new key with `key = PBKDF2(pwd, nonce1, count=100_000)` for each file, and this is wasted time! — Basj, Nov 19 '20 at 23:10
I have tried to answer the new questions. Maybe it helps. I have the impression that you first of all need a security concept for your use case and only after that a concrete implementation becomes relevant. Just a guess, does not have to be true. — Topaco, Nov 20 '20 at 14:42

KMG · Answer 2 · 2020-11-19T13:24:05.320

1

This seems to be fine but I have a recommendation which is to not use same nonce for encryption and key derivation (nonce stands for key used only once using same nonce so you can pass the md5 hash of nonce to the encryption function instead if you dont want to use another nonce(IV). Second I think you can switch to cryptography if you are interested in better security . This is example code using cryptography module to encrypt which also has the advantage of encrypting using 128-bit key which is secure and it take care of the rest such as IV(nonces), decryption and verification(is done using HMAC). So all your code above can be summarized in this few lines which lead to less complexity so arguably more secure code.

from cryptography.fernet import Fernet
plaintext = b"hello world"
key = Fernet.generate_key()
ctx = Fernet(key)
ciphertext = ctx.encrypt(plaintext)
print(ciphertext)
decryption = ctx.decrypt(ciphertext)
print(decryption)

EDIT: Note that the nonce you use will also weaken up the key since the nonce is sent with ciphertext, now the salt used for PBKDF is pointless and now the attacker have to just guess your password(assuming using default count) which in this case is very simple one, brute forcing can take no longer than 26^5 tries(total of lowercase alphabets for total of length 5).

edited Nov 19 '20 at 13:24

answered Nov 19 '20 at 12:34

KMG

1,433
1
8
19

Thank you for your answer. What can be the problem if using the same nonce for encyrption and key derivation? PS: I'm already using `pycryptodome` in the original question. I'll have a test with Fernet too. – Basj Nov 19 '20 at 12:37
Also, in your code you don't use a key derivation, can you update to use a password? – Basj Nov 19 '20 at 12:39
Last remark: are you sure Fernet uses 256 bit? Here is said it's 128: https://crypto.stackexchange.com/questions/43120/why-is-fernet-only-aes-128-cbc – Basj Nov 19 '20 at 12:41
@Basj yes edited my answer(didnt pay attention). If you use the same nonce for 2 same messages(under same key) then the ciphertext produced will be the same. Which would result in encryption being deterministic rather than probabilistic read about ```ECB``` mode – KMG Nov 19 '20 at 12:41
Yes but here I don't reuse the same nonce for 2 messages, I reuse nonce for both key derivation and for encryption. Can there be problems in this? – Basj Nov 19 '20 at 12:43
1

@Basj see this link yes it used 256 https://cryptography.io/en/latest/fernet.html – KMG Nov 19 '20 at 12:43
@Basj you dont use the same nonce yet. Imagine 2 banks communicating using ECB mode 4 AES messages each 16 byte(bank_name, bank_reciever, account_reciever, amount) since same (key, IV) an attacker wiretapping can see the ```ciphertext``` that correspond to his name(since he made a legitimate transaction today). Then he replace all the ciphertext block (of other transaction account_name)with his version so all transactions go to his balance. ```GCM``` wont prevent this since the inserted block is a legitimate block that was recorded earlier hope you get the idea. – KMG Nov 19 '20 at 12:49
Fernet is 128-bit, see https://cryptography.io/en/latest/fernet.html#implementation. Can you update your answer and add this source as link? – Basj Nov 19 '20 at 12:57
@Basj edited the answer with two more edits take a look. – KMG Nov 19 '20 at 13:04
Thanks @KhaledGaber. But then why do they write "Fernet is built on top of a number of standard cryptographic primitives. Specifically it uses: AES in CBC mode with a 128-bit key for encryption;" [in the official doc](https://cryptography.io/en/latest/fernet.html#implementation)? – Basj Nov 19 '20 at 13:11
@Basj Ok i didn't see that you seems to be right I thought that Fernet uses 256bit as this is the value it takes. so 16 bytes for the key and 16 bytes for ```HMAC``` key. I thought it encrypt using something like ```GCM``` which turned out to just use ```CBC``` mode the rest for ```HMAC```. gonna edit the answer :). – KMG Nov 19 '20 at 13:21

Is this AES GCM file encryption good practice?

2 Answers2