Intel TSX: Why is RTM failing?

Question

I'm new to Intel TSX, so please correct me on any terminological or conceptual mistakes.

I'm trying to write a custom RSA engine using the PolarSSL library from here (I know it is old, but I found it easy to understand). I have the following code:

```c
result=-1;
unsigned block;

int key_len= 128;

while(result!=1){
    if ((block = _xbegin()) == _XBEGIN_STARTED) {
        if( rsa_pkcs1_decrypt( &rsa_polar, &myrand, NULL, RSA_PRIVATE, &key_len, from, decrypt_plaintext, sizeof(decrypt_plaintext) ) != 0 )
            exit(0);
        rsa_free(&rsa_polar);
        result=1;
        _xend();
    }else{
        printf("RTM 2: Transaction failed\n");
        printf("status is %ld\n", block);

    }
    printf("Block 2: Result is %d\n", result);
}
```

The code inside the RTM block doesn't work, although the same code works outside an RTM block. When I run it, I get the following output:

```
.
.
RTM 2: Transaction failed
status is 0
Block 2: Result is -1
.
.
```

Any help or suggestions on how to solve this?

Did your OS disable it for TAA side-channel security reasons? See [Are Intel TSX prefixes executed (safely) on AMD as NOP?](https://stackoverflow.com/a/61320559) answer and comments for some links. (I didn't look at your code yet, but if even an empty RTM transaction is aborting, it might be disabled.) — Peter Cordes, Sep 14 '20 at 21:51
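One way to run the check Peter suggests is sketched below, assuming x86-64 with GCC or Clang (it relies on their `<cpuid.h>` and on the `target("rtm")` function attribute, so no `-mrtm` flag is needed). It first reads the RTM feature bit, CPUID.(EAX=07H, ECX=0):EBX bit 11, and only then starts a few empty transactions, because XBEGIN raises an invalid-opcode fault on a CPU that does not report RTM. The names `cpu_reports_rtm`, `empty_transaction_commits` and `rtm_probe` are illustrative and not part of PolarSSL or any library.

```c
#include <assert.h>
#include <cpuid.h>      /* __get_cpuid_count (GCC/Clang, x86 only) */
#include <immintrin.h>  /* _xbegin, _xend, _XBEGIN_STARTED */
#include <stdbool.h>

/* True if CPUID.(EAX=07H,ECX=0):EBX bit 11 (RTM) is set. */
static bool cpu_reports_rtm(void)
{
    unsigned eax, ebx, ecx, edx;

    /* __get_cpuid_count returns 0 when leaf 7 is not available. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;
    return (ebx >> 11) & 1u;
}

/* Start `attempts` empty transactions and count how many commit.
   Only call this when cpu_reports_rtm() is true: XBEGIN faults otherwise.
   target("rtm") lets GCC/Clang emit XBEGIN/XEND without -mrtm. */
__attribute__((target("rtm")))
static int empty_transaction_commits(int attempts)
{
    int committed = 0;

    for (int i = 0; i < attempts; i++) {
        if (_xbegin() == _XBEGIN_STARTED) {
            _xend();            /* nothing to do: commit immediately */
            committed++;        /* runs after the commit, outside the transaction */
        }
    }
    return committed;
}

/* -1 if RTM is not reported, else the number of committed empty transactions. */
int rtm_probe(int attempts)
{
    assert(attempts > 0);
    if (!cpu_reports_rtm())
        return -1;
    return empty_transaction_commits(attempts);
}
```

A result of -1 means the feature bit is not reported (for example, TSX turned off by the kernel or hidden by a hypervisor). A result of 0 with the bit present would suggest that even empty transactions are being forced to abort. On a machine where another RTM block commits, as described in the next comment, one would expect a value close to `attempts`.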
I don't think the OS disabled it, because I have another RTM block right above this code segment with the same structure that calls a different function. That one works fine apart from occasional failures; it mostly succeeds on the first or second try. — perplex, Sep 14 '20 at 21:55
Ok, that does confirm you have usable RTM enabled on your machine, so it is the details of this transaction that are the problem. I'd worry that the functions you call might be doing too much work. I forget if it's ok to use a `lock`ed instruction inside a transaction; `rsa_free` might do that if it's a thread-safe allocator. — Peter Cordes, Sep 14 '20 at 22:00
What is `rsa_pkcs1_decrypt` doing? Does it make any syscall whatsoever? How long does it run for? Many things can cause a TSX transaction to abort, including context switches, FPU instructions, mixed access to XMM/YMM registers, `clflush`, `cpuid`, and so on. If the function is long and complicated enough, there is a very high chance that at least one of those things is happening. TSX should not be used in such scenarios, only for short critical sections of code. — Marco Bonelli, Sep 14 '20 at 22:32
According to section 16.3.5 of Volume 1 of the Intel Software Developer's Manual, an abort status code of 0 can occur, for example, when the CPUID instruction is encountered in an RTM region, as that abort does not satisfy the requirements for setting any of the EAX bits. — Andreas Wenzel, Sep 14 '20 at 22:33
@PeterCordes: If the reason for the abort were "too much work", then I believe bit 3 would have been set in the abort status code. However, all bits are clear. — Andreas Wenzel, Sep 14 '20 at 22:37
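To reason about the status code the way Andreas does, it helps to decode its bits. A minimal sketch, assuming the EAX layout the Intel SDM gives for the value `_xbegin()` returns on abort (bit 0 explicit XABORT, bit 1 retry may succeed, bit 2 conflict, bit 3 capacity, bit 4 debug breakpoint, bit 5 nested, bits 31:24 the XABORT argument). The `ABORT_*` names are local stand-ins for the `_XABORT_*` macros in `<immintrin.h>`, redefined so the decoder compiles on any platform; `describe_abort` is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* EAX abort-status layout from the Intel SDM. Local stand-ins for the
   _XABORT_* macros in <immintrin.h>, so this compiles on any platform. */
enum {
    ABORT_EXPLICIT = 1 << 0,  /* XABORT executed; bits 31:24 hold its argument */
    ABORT_RETRY    = 1 << 1,  /* transaction may succeed on retry */
    ABORT_CONFLICT = 1 << 2,  /* another logical processor hit our read/write set */
    ABORT_CAPACITY = 1 << 3,  /* an internal buffer overflowed */
    ABORT_DEBUG    = 1 << 4,  /* a debug breakpoint was hit */
    ABORT_NESTED   = 1 << 5,  /* abort occurred inside a nested transaction */
};
#define ABORT_CODE(s) (((s) >> 24) & 0xFFu)

/* Append s to the NUL-terminated string in buf (capacity n), truncating if full. */
static void append(char *buf, size_t n, const char *s)
{
    size_t len = strlen(buf);

    if (len + 1 < n)
        snprintf(buf + len, n - len, "%s", s);
}

/* Describe an _xbegin() abort status, e.g. "retry|conflict" or "no flags set". */
void describe_abort(unsigned status, char *buf, size_t n)
{
    static const struct { unsigned bit; const char *name; } flags[] = {
        { ABORT_EXPLICIT, "explicit" }, { ABORT_RETRY,    "retry"    },
        { ABORT_CONFLICT, "conflict" }, { ABORT_CAPACITY, "capacity" },
        { ABORT_DEBUG,    "debug"    }, { ABORT_NESTED,   "nested"   },
    };

    assert(buf != NULL && n > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof flags / sizeof flags[0]; i++) {
        if (status & flags[i].bit) {
            if (buf[0] != '\0')
                append(buf, n, "|");
            append(buf, n, flags[i].name);
        }
    }
    if (status & ABORT_EXPLICIT) {
        char code[16];
        snprintf(code, sizeof code, "(code=%u)", ABORT_CODE(status));
        append(buf, n, code);
    }
    if (buf[0] == '\0')
        append(buf, n, "no flags set");  /* cause not reported by any bit, e.g. CPUID */
}
```

In the question's `else` branch, calling `describe_abort(block, buf, sizeof buf)` and then `printf("status is %u (%s)\n", block, buf)` would print `no flags set` for the status 0 shown in the output above.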
Probably unrelated, but if `block` is of type `unsigned` you should printf it with `%u` not `%ld`. — Nate Eldredge, Sep 14 '20 at 22:45
@AndreasWenzel: I guess bit 3 would be set if the "quantity" of work was too much, but it would be cleared if the "quality" was inappropriate: system call, FPU, etc. In particular the companion call to `rsa_free` suggests that `rsa_pkcs1_decrypt` allocates memory or some other resource, which typically can involve a system call. It seems likely that this is indeed "too big" a job for a transaction, and will have to be protected with a mutex instead. — Nate Eldredge, Sep 14 '20 at 22:56
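Nate's mutex suggestion is usually combined with RTM in a lock-elision shape: try the transaction a bounded number of times, and fall back to a real lock once the hardware reports that retrying will not help. The sketch below assumes x86 with GCC or Clang and POSIX threads; `run_critical`, `try_rtm`, `fallback_active` and `add_one` are illustrative names. The transactional path reads `fallback_active`, so a thread that enters the mutex path writes a location in every running transaction's read set and aborts it, which keeps the two paths mutually exclusive. Unlike the `while(result!=1)` loop in the question, it cannot spin forever.

```c
#include <assert.h>
#include <immintrin.h>   /* _xbegin, _xend, _xabort, _XABORT_RETRY (x86, GCC/Clang) */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static pthread_mutex_t fallback_mutex = PTHREAD_MUTEX_INITIALIZER;
/* Nonzero while a thread is inside the mutex path. Transactions read it,
   so a thread entering the mutex path aborts them. */
static atomic_int fallback_active;

/* Try `work` transactionally up to max_tries times; true if one committed. */
__attribute__((target("rtm")))
static bool try_rtm(void (*work)(void *), void *arg, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        unsigned status = _xbegin();
        if (status == _XBEGIN_STARTED) {
            if (atomic_load_explicit(&fallback_active, memory_order_relaxed))
                _xabort(0xff);        /* mutex path is running: give up */
            work(arg);
            _xend();
            return true;
        }
        if (!(status & _XABORT_RETRY))
            break;                    /* hardware says retrying will not help */
    }
    return false;
}

/* Run `work` transactionally if use_rtm is true, otherwise (or after repeated
   aborts) under the mutex. Pass use_rtm = true only if CPUID reports RTM. */
void run_critical(bool use_rtm, void (*work)(void *), void *arg)
{
    assert(work != NULL);
    if (use_rtm && try_rtm(work, arg, 3))
        return;
    pthread_mutex_lock(&fallback_mutex);
    atomic_store(&fallback_active, 1);
    work(arg);
    atomic_store(&fallback_active, 0);
    pthread_mutex_unlock(&fallback_mutex);
}

/* Hypothetical example of transaction-friendly work: a single increment. */
static void add_one(void *p) { ++*(int *)p; }
```

For work as heavy as a whole `rsa_pkcs1_decrypt` call, the transactional path would be expected to abort nearly every time, so in practice this would run on the mutex, which matches the conclusion reached in the comments.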
@MarcoBonelli No, `rsa_pkcs1_decrypt` does not make any syscalls. I just checked: the total time for the loop is 0.002000 seconds. — perplex, Sep 14 '20 at 23:01
@perplex 2 ms is quite a long time; there's a high chance a context switch can happen in that interval. Anyway, the function is too heavy to be run in a single TSX transaction. It's not what TSX is meant for. — Marco Bonelli, Sep 14 '20 at 23:04
@AndreasWenzel As I said, I'm a novice, so I'm going to ask about that **CPUID** bit. I use **sched_setaffinity** to bind the code to CPU 1 (not sure whether that's worthwhile). Then I check CPUID after the operation fails and get this: _0x00000007 0x00: eax=0x00000000_ — perplex, Sep 14 '20 at 23:10
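For reference, pinning the calling thread as described above might look like the sketch below (Linux/glibc; `pin_to_cpu` is a hypothetical helper). Pinning only chooses which CPU the thread runs on; the thread can still be preempted or interrupted there, so it would not be expected to prevent the context-switch aborts Marco mentions. Note also that for CPUID leaf 7, sub-leaf 0, EAX only reports the highest supported sub-leaf; RTM support is bit 11 of EBX, and the abort reason is the value returned by `_xbegin()`, not anything CPUID reports afterwards.

```c
#define _GNU_SOURCE      /* needed for cpu_set_t, CPU_SET and sched_setaffinity */
#include <assert.h>
#include <sched.h>

/* Restrict the calling thread to a single CPU. Returns 0 on success, -1 on error. */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return -1;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* pid 0 means "the calling thread". */
    return sched_setaffinity(0, sizeof set, &set) == 0 ? 0 : -1;
}
```

Calling `pin_to_cpu(1)` before the transaction loop corresponds to what the comment describes; it fails with -1 if CPU 1 is not available to the process.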
@MarcoBonelli, thanks for the info. As I'm using a library, I can't change it (well, I don't want to). Can you think of any workaround? — perplex, Sep 14 '20 at 23:14
@perplex well... no, not really. If you want to do an entire round of RSA decryption, including memory allocation and deallocation, inside a TSX transaction... that's kind of a lost cause. — Marco Bonelli, Sep 14 '20 at 23:17
@perplex: "the total time for the loop is 0.002000 seconds" -- I believe a lock operation costs about 200 CPU cycles on a modern CPU, which corresponds to 67 nanoseconds for a 3 GHz CPU. The point of lock elision (TSX) is to shorten this time period. However, if your RTM region is, as you say, 2,000,000 nanoseconds long, then there is little point in trying to save these 67 nanoseconds. Nobody will care whether your function runs for 2,000,000 nanoseconds or 2,000,067 nanoseconds. TSX is intended for eliding locks that are held for a much shorter time period. — Andreas Wenzel, Sep 15 '20 at 00:26
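Andreas's estimate written out with the numbers from this thread (the 200-cycle lock cost and the 3 GHz clock are his assumptions, not measurements):

```latex
t_{\text{lock}} \approx \frac{200\ \text{cycles}}{3 \times 10^{9}\ \text{cycles/s}}
  \approx 6.67 \times 10^{-8}\ \text{s} \approx 67\ \text{ns}

t_{\text{region}} = 0.002\ \text{s} = 2\,000\,000\ \text{ns}

\frac{t_{\text{lock}}}{t_{\text{region}}} \approx \frac{66.7\ \text{ns}}{2\,000\,000\ \text{ns}}
  \approx 3.3 \times 10^{-5} \approx 0.0033\,\%
```

So even a perfectly elided lock would save only about 0.003% of the region's run time.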

0 Answers