I am trying to decrypt a p7m attachment using the CryptQueryObject / CryptDecryptMessage functions. For large files (30Mb), each call can take up to 30 seconds to execute. Outlook itself has no problem opening an encrypted message and showing its contents instantaneously.

I do realize CryptQueryObject is deprecated (but CryptDecryptMessage is not). It does not matter what parameters I use or whether the data is in a memory blob or a file. Below is the Delphi code (C++ would be very similar). If I pause the execution while either function is running, I can see from the call stack that the code is resizing or allocating a memory block. It is almost as if the code is building a buffer, reallocating it and copying the previous contents on each iteration.

Here is what the call looks like. There is nothing extraordinary about it, and no amount of playing with the parameters makes the calls any faster.

if not CryptQueryObject(CERT_QUERY_OBJECT_FILE,//CERT_QUERY_OBJECT_BLOB,
                        pointer(PWideChar(wFileName)),//@stCryptBlob,
                        CERT_QUERY_CONTENT_FLAG_PKCS7_UNSIGNED or CERT_QUERY_CONTENT_FLAG_PKCS7_SIGNED, //CERT_QUERY_CONTENT_FLAG_ALL, //CERT_QUERY_CONTENT_FLAG_PKCS7_SIGNED_EMBED,
                        CERT_QUERY_FORMAT_FLAG_ALL, //CERT_QUERY_FORMAT_FLAG_BINARY,
                        0,
                        nil, //@dwEncoding,    // X509_ASN_ENCODING | PKCS_7_ASN_ENCODING
                        @dwContentType, // a PKCS7 signed message embedded in a file    CERT_QUERY_CONTENT_PKCS7_SIGNED_EMBED = 10
                        nil, //@dwFormatType,  // the content is in binary format CERT_QUERY_FORMAT_BINARY = 1
                        @hStore,
                        nil, //@hMsg, // for dwContentType == CERT_QUERY_CONTENT_PKCS7_SIGNED,
                                      //        CERT_QUERY_CONTENT_PKCS7_UNSIGNED
                                      //        CERT_QUERY_CONTENT_PKCS7_SIGNED_EMBED
                        nil {@pvContext}) then RaiseLastWindowError('CryptQueryObject');

And here is the typical call stack when the function is executing. It is always memory/heap related.

:77dc8913 ntdll.memcpy + 0x33
:77d93526 ; 
:77d92867 ; 
:77d92763 ntdll.RtlReAllocateHeap + 0x43
:75d3a1f5 ; C:\WINDOWS\SysWOW64\KERNELBASE.dll
:754a869c MSASN1.ASN1EncSetError + 0x5c
:754a5bf9 ; C:\WINDOWS\SysWOW64\MSASN1.dll
:754a1b67 MSASN1.ASN1BERDecOctetString + 0x17
:769b485f ; C:\WINDOWS\SysWOW64\CRYPT32.dll
:769b42e5 ; C:\WINDOWS\SysWOW64\CRYPT32.dll
:754a2e3c ; C:\WINDOWS\SysWOW64\MSASN1.dll
:7690a697 ; C:\WINDOWS\SysWOW64\CRYPT32.dll
:769b3354 ; C:\WINDOWS\SysWOW64\CRYPT32.dll
:7694053a ; C:\WINDOWS\SysWOW64\CRYPT32.dll
:7692d71b ; C:\WINDOWS\SysWOW64\CRYPT32.dll
:7692d373 ; C:\WINDOWS\SysWOW64\CRYPT32.dll
uEncryptedMessage.DecodeP7M($356A3C50)
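
To make the suspected pattern concrete, here is a minimal C++ sketch of the copying cost it would imply. This is only a model: the actual growth strategy inside CRYPT32 / MSASN1 is not known, and the function names and the 4 KB increment are hypothetical. It assumes the worst case, where every resize moves the existing contents to a new block, and compares growing by a fixed increment with growing by doubling.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model, not the real CRYPT32/MSASN1 logic: count the bytes
// copied while a buffer grows to at least `total` bytes, assuming every
// resize moves the existing contents to a new block.

// Growth by a fixed increment: copies 0 + step + 2*step + ... bytes,
// which is quadratic in the final size.
std::uint64_t bytesCopiedFixedStep(std::uint64_t total, std::uint64_t step) {
    assert(step > 0);
    std::uint64_t copied = 0;
    std::uint64_t size = 0;
    while (size < total) {
        copied += size;   // existing contents are moved on this resize
        size += step;     // buffer grows by one fixed increment
    }
    return copied;
}

// Growth by doubling: copies initial + 2*initial + 4*initial + ... bytes,
// which stays below twice the final capacity (linear in the final size).
std::uint64_t bytesCopiedDoubling(std::uint64_t total, std::uint64_t initial) {
    assert(initial > 0);
    std::uint64_t copied = 0;
    std::uint64_t capacity = initial;
    while (capacity < total) {
        copied += capacity;  // existing contents are moved on this resize
        capacity *= 2;       // buffer capacity doubles
    }
    return copied;
}
```

Under these assumptions, `bytesCopiedFixedStep(30 * 1024 * 1024, 4096)` comes to roughly 120 GB of copying for a 30 MiB buffer, while `bytesCopiedDoubling` with the same starting size stays at about 33 MB. A gap of that order would be consistent with calls that take tens of seconds, although it does not prove that this is what the library does.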

Does anybody know why this happens?
Has anybody had luck processing large encrypted blobs using these functions?
Would I be better off using the new CNG functions? What would the sequence of calls look like then?

Dmitry Streblechenko
  • There is no question! – Delphi Coder Jul 02 '22 at 00:07
  • The question is how to make it faster? – Dmitry Streblechenko Jul 02 '22 at 00:50
  • "_30Mb_" strictly means `M` = [mega](https://en.wikipedia.org/wiki/Mega-) (1 * 1000 * 1000) and `b` = **bit** ([see Megabit](https://en.wikipedia.org/wiki/Megabit)) - that'd be 3,750,000 **bytes**. Whereas `Mi` = [mebi](https://en.wikipedia.org/wiki/Binary_prefix#IEC_prefixes) (1 * 1024 * 1024) and `B` = **byte**. Did you mean `30 MiB` instead of `3.75 MB`? – AmigoJack Jul 02 '22 at 08:33
  • Have you tried pre-allocating a buffer/buffers so they don't have to be reallocated? Maybe just making the buffers larger would help? – rossum Jul 02 '22 at 10:03
  • Cryptographic calculations are optimized for input-invariant execution times. This is generally orthogonal to raw performance. Whether you can get better performance with the CNG API is something you would need to evaluate. – IInspectable Jul 02 '22 at 13:59
  • @rossum - `CryptQueryObject` does not use an output buffer, it exists solely to extract various properties of the input blob and IMHO should not even try to decrypt the buffer: surely it does not need to decrypt the whole thing to figure out the type of encryption? For `CryptDecryptMessage`, I run it twice: once with a (0, null) output buffer to return the size of the output data, and the second time with an allocated buffer of the required length. It takes just as long in both cases. I can of course over-allocate the buffer and call it only once, but that will shave off about 30% of the time. – Dmitry Streblechenko Jul 02 '22 at 16:38
  • @IInspectable - I would love to use CNG, and that would be one of my questions: what are the CNG functions I need to use? I could not find any code sample that does something similar to what I do with `CryptQueryObject` / `CryptDecryptMessage`. – Dmitry Streblechenko Jul 02 '22 at 16:40
  • You know the size of the blob? Can't you use that to guess the output size? – Anders Jul 03 '22 at 10:42
  • @Anders - I can, but that would be a 1/3 reduction at best: `CryptQueryObject` does not use an output buffer, `CryptDecryptMessage` is called twice (once to find the size of the buffer and second time to actually decrypt), so I can get rid of the first call to `CryptDecryptMessage` - 60 seconds instead of 90 would be better, but still not acceptable. – Dmitry Streblechenko Jul 03 '22 at 17:00

0 Answers