I am trying to implement D. J. Bernstein's Poly1305 algorithm. While reading his C implementation, I could not work out which arithmetic trick the poly1305_init function uses in the part below to get good performance without exposing a timing side channel:

void poly1305_init(poly1305_context *ctx, const unsigned char key[32]) {
    poly1305_state_internal_t *st = (poly1305_state_internal_t *)ctx;

    /* r &= 0xffffffc0ffffffc0ffffffc0fffffff */
    st->r[0] = (U8TO32(&key[ 0])     ) & 0x3ffffff;
    st->r[1] = (U8TO32(&key[ 3]) >> 2) & 0x3ffff03;
    st->r[2] = (U8TO32(&key[ 6]) >> 4) & 0x3ffc0ff;
    st->r[3] = (U8TO32(&key[ 9]) >> 6) & 0x3f03fff;
    st->r[4] = (U8TO32(&key[12]) >> 8) & 0x00fffff;
    ......
    ......
}

typedef struct poly1305_state_internal_t {
    unsigned long r[5];
    unsigned long h[5];
    unsigned long pad[4];
    size_t leftover;
    unsigned char buffer[poly1305_block_size];
    unsigned char final;
} poly1305_state_internal_t;

typedef struct poly1305_context {
    size_t aligner;
    unsigned char opaque[136];
} poly1305_context;

I understand the rest of the code in this file. Can anyone help me understand the logic used here?

Maarten Bodewes
viji
  • Note: `poly1305_context` is not in this code nor the referenced code. A _complete_ explanation would need that. – chux - Reinstate Monica Dec 10 '18 at 06:07
  • @chux have added the structure of poly1305_context in the question – viji Dec 10 '18 at 06:25
  • Those loads each overlap by one effective bit (given the masks), which looks like they're used to do arithmetic on 32-bit hardware without overflow. If you don't have to check overflow, you get fast data-independent operations, and thus resist timing attacks. – Alex Reinking Dec 10 '18 at 08:47
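
One way to check what the shifts and masks in poly1305_init compute is to compare them against a slow reference. The sketch below assumes that U8TO32 is a little-endian 32-bit load (the hypothetical `load32_le` stands in for it) and that the clamp is the one named in the code comment, i.e. the top four bits of bytes 3, 7, 11, 15 and the low two bits of bytes 4, 8, 12 are cleared. The names `limbs_from_key`, `limbs_reference` and `check_limbs` are made up for illustration and are not part of the referenced code. Under those assumptions, the five values are simply the clamped 128-bit r cut into 26-bit pieces (bit offsets 0, 26, 52, 78, 104), with the clamp folded into each mask.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: stands in for U8TO32, taken here to be a little-endian load. */
static uint32_t load32_le(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* The five masked loads from the question, using only the first 16 key bytes. */
static void limbs_from_key(uint32_t r[5], const unsigned char key[16]) {
    r[0] = (load32_le(&key[ 0])     ) & 0x3ffffff; /* bits   0..25  */
    r[1] = (load32_le(&key[ 3]) >> 2) & 0x3ffff03; /* bits  26..51  */
    r[2] = (load32_le(&key[ 6]) >> 4) & 0x3ffc0ff; /* bits  52..77  */
    r[3] = (load32_le(&key[ 9]) >> 6) & 0x3f03fff; /* bits  78..103 */
    r[4] = (load32_le(&key[12]) >> 8) & 0x00fffff; /* bits 104..127 */
}

/* Slow reference: clamp r byte by byte, then pull out 26 bits at a time. */
static void limbs_reference(uint32_t r[5], const unsigned char key[16]) {
    unsigned char c[16];
    memcpy(c, key, 16);
    c[3] &= 15;  c[7] &= 15;  c[11] &= 15;  c[15] &= 15; /* clear top 4 bits */
    c[4] &= 252; c[8] &= 252; c[12] &= 252;              /* clear low 2 bits */
    for (int i = 0; i < 5; i++) {
        uint32_t limb = 0;
        for (int b = 0; b < 26; b++) {
            int bit = 26 * i + b;          /* bit index within the 128-bit r */
            if (bit < 128 && ((c[bit / 8] >> (bit % 8)) & 1))
                limb |= (uint32_t)1 << b;
        }
        r[i] = limb;
    }
}

/* Asserts that both constructions agree for the given 16 key bytes. */
static void check_limbs(const unsigned char key[16]) {
    uint32_t fast[5], ref[5];
    limbs_from_key(fast, key);
    limbs_reference(ref, key);
    for (int i = 0; i < 5; i++)
        assert(fast[i] == ref[i]);
}
```

Calling `check_limbs` on any 16 key bytes should pass without tripping an assertion. Each limb is at most 26 bits wide, so the product of two limbs fits in 52 bits, which leaves headroom to sum several such products in a 64-bit accumulator before carrying. That would explain why no data-dependent overflow checks are needed in the multiplication.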
