13

What is the fastest way to implement the following logic:

def xor(data, key):
    l = len(key)

    buff = ""
    for i in range(0, len(data)):
        buff += chr(ord(data[i]) ^ ord(key[i % l]))
    return buff

In my case the key is a 20-byte SHA-1 digest, and the data is some binary data between 20 bytes and a few (1, 2, 3) megabytes long.

UPDATE:

OK guys. Here's a 3.5 times faster implementation, which splits the data and key into chunks of 4, 2 or 1 bytes (in my case, most of the time it's a 4-byte unsigned integer):

import struct

def xor(data, key):
    index = len(data) % 4
    size = (4, 1, 2, 1)[index]
    type = ('L', 'B', 'H', 'B')[index]
    key_len = len(key) / size
    data_len = len(data) / size
    key_fmt = "<" + str(key_len) + type
    data_fmt = "<" + str(data_len) + type

    key_list = struct.unpack(key_fmt, key)
    data_list = struct.unpack(data_fmt, data)

    result = []
    for i in range(data_len):
        result.append (key_list[i % key_len] ^ data_list[i])

    return struct.pack(data_fmt, *result)

Uses a lot of memory, but in my case it's not a big deal.

Any ideas how to increase the speed a few more times? :-)
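For measuring the variants, a minimal timing harness along these lines can be used (Python 3 shown for convenience; `xor_naive` is simply the original loop restated for bytes, and the 1 MiB size is an arbitrary choice):

```python
import hashlib
import os
import timeit

# 1 MiB of random data and a 20-byte SHA-1 digest, matching the question
data = os.urandom(1 << 20)
key = hashlib.sha1(data).digest()

def xor_naive(data, key):
    # Python 3 restatement of the original loop: indexing bytes yields ints
    l = len(key)
    return bytes(b ^ key[i % l] for i, b in enumerate(data))

elapsed = timeit.timeit(lambda: xor_naive(data, key), number=1)
print("naive xor: %.3f s for 1 MiB" % elapsed)
```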

FINAL UPDATE:

OK, ok... numpy did the job. That's just blazing fast:

def xor(data, key):
    import numpy, math

    # key multiplication in order to match the data length
    key = (key*int(math.ceil(float(len(data))/float(len(key)))))[:len(data)]

    # Select the type size in bytes
    for i in (8, 4, 2, 1):
        if not len(data) % i: break

    # explicit-width unsigned dtypes; strings like '<L4' can raise
    # "data type not understood" errors on some platforms (see comments)
    dt = numpy.dtype('<u%d' % i)

    return numpy.bitwise_xor(numpy.fromstring(key, dtype=dt), numpy.fromstring(data, dtype=dt)).tostring()

The initial implementation needed 8 min 50 sec to process a gigabyte, the second around 2 min 30 sec, and the last one just... 0 min 10 sec.
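For later readers: `fromstring`/`tostring` were deprecated in newer numpy releases, and platform-dependent dtype strings can raise errors (see the comments), so a Python 3 sketch of the same idea might look like this:

```python
import math
import numpy

def xor(data, key):
    # repeat the key to match the data length, as in the original
    key = (key * int(math.ceil(float(len(data)) / float(len(key)))))[:len(data)]

    # pick the widest unsigned integer width that divides the data length
    for i in (8, 4, 2, 1):
        if not len(data) % i:
            break
    dt = numpy.dtype('<u%d' % i)

    # frombuffer/tobytes replace the deprecated fromstring/tostring
    return numpy.bitwise_xor(numpy.frombuffer(key, dtype=dt),
                             numpy.frombuffer(data, dtype=dt)).tobytes()
```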

Thanks to anyone who contributed ideas and code. You're great guys!

Nikolai Gorchilov
  • The "fastest"? Well, the way with the lowest runtime speed overhead is a C (or Cython, for the weak of heart) extension. –  Apr 20 '11 at 18:12
  • Create a dict of `key:ord(key)` + `val:ord(val)` (from `set(key) | set(data)`) to save many ord calls? Then use a list comprehension instead of string concatenation? – TryPyPy Apr 20 '11 at 18:55
  • There's quite a detailed look at this problem here: http://stackoverflow.com/questions/2119761/simple-python-challenge-fastest-bitwise-xor-on-data-buffers – Scott Griffiths Apr 20 '11 at 21:08
  • The one I just posted takes about 42% of the time of your current fastest one when xoring a 16M string, and substantially less memory. It also does not rely on numpy or inline assembly. – Omnifarious Apr 21 '11 at 02:19
  • Also, I have no clue what you intend to do with this, but this is a completely horrible method of doing encryption. – Omnifarious Apr 21 '11 at 02:23
  • I added a new version of my code that's much shorter and easier to understand. And yes, it really is a lot faster than the version you currently have as the fastest. Doesn't seem terribly intuitive, I know. – Omnifarious Apr 21 '11 at 05:47
  • Re: Your final (& prior) update -- you could generally speed these up by *always* processing as much data as possible using the largest chunk size (4 or 8 bytes) and handling any remainder using an appropriate smaller size (1 or 2 bytes). The later being such a small amount, it might not be worth optimizing. – martineau Apr 22 '11 at 20:50
  • @Nikolai Gorchilov I tried your code but getting a datatype error regarding ' – Amir Afianian Dec 31 '15 at 09:01
  • Your padding algorithm for extending the key is probably what is adding 9 seconds to your computation. I would optimize the code above as follows: `key = np.pad(key, (0, len(data) - len(key)), 'wrap')` – Nadeem Douba Nov 08 '19 at 16:58
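martineau's suggestion above (process the bulk with the largest chunk size and mop up the remainder with a smaller one) could be sketched roughly like this (Python 3; `xor_chunked` is a made-up name, not code from the thread):

```python
import struct

def xor_chunked(data, key):
    # repeat the key out to the data length so word boundaries line up
    key = (key * (len(data) // len(key) + 1))[:len(data)]

    nwords = len(data) // 8
    out = bytearray(len(data))

    # bulk: XOR 8-byte unsigned little-endian words at a time
    fmt = "<%dQ" % nwords
    struct.pack_into(fmt, out, 0,
                     *(a ^ b for a, b in zip(struct.unpack_from(fmt, data),
                                             struct.unpack_from(fmt, key))))

    # remainder: at most 7 trailing bytes, handled one at a time
    for i in range(nwords * 8, len(data)):
        out[i] = data[i] ^ key[i]
    return bytes(out)
```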

7 Answers

1

Disclaimer: As other posters have said, this is a really bad way to encrypt files. This article demonstrates how to reverse this kind of obfuscation trivially.

First, a simple XOR algorithm (both snippets need `struct` and `operator`):

import operator
import struct

def xor(a,b,_xor8k=lambda a,b:struct.pack("!1000Q",*map(operator.xor,
                    struct.unpack("!1000Q",a),
                    struct.unpack("!1000Q",b)))
        ):
    if len(a)<=8000:
        s="!%iQ%iB"%divmod(len(a),8)
        return struct.pack(s,*map(operator.xor,
            struct.unpack(s,a),
            struct.unpack(s,b)))
    a=bytearray(a)
    for i in range(8000,len(a),8000):
        a[i-8000:i]=_xor8k(
            a[i-8000:i],
            b[i-8000:i])
    a[i:]=xor(a[i:],b[i:])
    return str(a)

Second, the wrapping XOR algorithm:

def xor_wrap(data,key,_struct8k=struct.Struct("!1000Q")):
    l=len(key)
    data=bytearray(data)#pack_into below needs a writable buffer
    modulo=0#offset used to access the repeated key
    if len(data)>=8000:
        keyrpt=key*((7999+2*l)//l)#this buffer is accessed with whatever offset is required for a given 8k block
        #this expression should create at most 1 more copy of the key than is needed
        offset=-8000#initial offset, set to zero on first loop iteration
        for offset in range(0,len(data)-7999,8000):
            _struct8k.pack_into(data,offset,*map(operator.xor,
                _struct8k.unpack_from(data,offset),
                _struct8k.unpack_from(keyrpt,modulo)))
            modulo+=8000;modulo%=l
        offset+=8000
    else:offset=0;keyrpt=key*(len(data)//l+1)#simple calculation guaranteed to be enough
    rest=len(data)-offset
    srest=struct.Struct("!%iQ%iB"%divmod(rest,8))
    srest.pack_into(data,offset,*map(operator.xor,
        srest.unpack_from(data,offset),
        srest.unpack_from(keyrpt,modulo)))
    return data
1

If len(data) is large, you might see a significant improvement from xrange. Actually, you can replace the range function entirely with enumerate. You might also benefit from using a list instead of appending to a string.

def xor(data, key):
    l = len(key)
    buff = []
    for idx, val in enumerate(data):
        buff.append(chr(ord(val) ^ ord(key[idx % l])))
    return ''.join(buff)

I haven't timed it, but off the top of my head I'd expect that to be a bit faster for large amounts of data. Make sure you measure every change.

If profiling suggests that the call to ord() actually takes time, you can run it on all the values in key ahead of time to save a call in the loop.

You could also turn that for loop into a plain old list comprehension, but it will negatively impact readability. Regardless, try it and see if it's way faster.
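For what it's worth, under Python 3 iterating over `bytes` already yields integers, so the whole loop collapses to something like this sketch (`itertools.cycle` stands in for the modulo indexing; the name `xor_comp` is mine):

```python
from itertools import cycle

def xor_comp(data, key):
    # zip against an endlessly repeating key; no ord()/chr() needed in Python 3
    return bytes(d ^ k for d, k in zip(data, cycle(key)))
```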

nmichaels
  • No tangible gain with your implementation. But calculating `ord()` for key in advance speeds it up by 10% – Nikolai Gorchilov Apr 20 '11 at 22:17
  • @Nikolai: Well that hardly seems worth it. How about the list comprehension? `''.join([chr(ord(val) ^ ord_key[idx % l]) for idx, val in enumerate(data)])` – nmichaels Apr 20 '11 at 22:25
1

Not tested. Don't know if it's faster. Supposing that len(mystring) is a multiple of 4:

import struct

def xor(hash,mystring):
    s = struct.Struct("<L")

    v1 = memoryview(hash)

    tab1 = []
    for i in range(5):
        tab1.append(s.unpack_from(v1,i*4)[0])

    v2 = memoryview(mystring)
    tab2 = []
    for i in range(len(mystring)/4):
        tab2.append(s.unpack_from(v2,i*4)[0])
    tab3 = []
    try:
        for i in range(len(mystring)/20+1):
            for j in range(5):
               tab3.append(s.pack(tab1[j]^tab2[5*i+j]))
    except IndexError:
        pass
    return "".join(tab3)
Xavier Combelle
1

This code should work in Python 2.6+ including Py3k.

from binascii import hexlify as _hexlify
from binascii import unhexlify as _unhexlify


def packl(lnum, padmultiple=0):
    """Packs the lnum (which must be convertable to a long) into a
    byte string 0 padded to a multiple of padmultiple bytes in size. 0
    means no padding whatsoever, so that packing 0 result in an empty
    string.  The resulting byte string is the big-endian two's
    complement representation of the passed in long."""

    if lnum == 0:
        return b'\0' * padmultiple
    elif lnum < 0:
        raise ValueError("Can only convert non-negative numbers.")
    s = hex(lnum)[2:]
    s = s.rstrip('L')
    if len(s) & 1:
        s = '0' + s
    s = _unhexlify(s)
    if (padmultiple != 1) and (padmultiple != 0):
        filled_so_far = len(s) % padmultiple
        if filled_so_far != 0:
            s = b'\0' * (padmultiple - filled_so_far) + s
    return s

def unpackl(bytestr):
    """Treats a byte string as a sequence of base 256 digits
    representing an unsigned integer in big-endian format and converts
    that representation into a Python integer."""

    return int(_hexlify(bytestr), 16) if len(bytestr) > 0 else 0

def xor(data, key):
    dlen = len(data)
    klen = len(key)
    if dlen > klen:
        key = key * ((dlen + klen - 1) // klen)
    key = key[:dlen]
    result = packl(unpackl(data) ^ unpackl(key))
    if len(result) < dlen:
         result = b'\0' * (dlen - len(result)) + result
    return result

This will also work in Python 2.7 and 3.x. It has the advantage of being a lot simpler than the previous one while doing basically the same thing in approximately the same amount of time:

from binascii import hexlify as _hexlify
from binascii import unhexlify as _unhexlify

def xor(data, key):
    dlen = len(data)
    klen = len(key)
    if dlen > klen:
        key = key * ((dlen + klen - 1) // klen)
    key = key[:dlen]
    data = int(_hexlify(data), 16)
    key = int(_hexlify(key), 16)
    result = (data ^ key) | (1 << (dlen * 8 + 7))
    # Python 2.6/2.7 only lines (comment out in Python 3.x)
    result = memoryview(hex(result))
    result = (result[4:-1] if result[-1] == 'L' else result[4:])
    # Python 3.x line
    #result = memoryview(hex(result).encode('ascii'))[4:]
    result = _unhexlify(result)
    return result
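In Python 3-only code, `int.from_bytes`/`int.to_bytes` make the same big-integer trick shorter still. A rough sketch (`xor_bigint` is a made-up name):

```python
def xor_bigint(data, key):
    dlen = len(data)
    klen = len(key)
    if dlen > klen:
        key = key * ((dlen + klen - 1) // klen)
    key = key[:dlen]
    # XOR the buffers as one huge integer; to_bytes restores leading zeros
    n = int.from_bytes(data, 'big') ^ int.from_bytes(key, 'big')
    return n.to_bytes(dlen, 'big')
```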
Omnifarious
  • Seems like memoryview is available in python 2.7+ only. Mine is 2.6.5 – Nikolai Gorchilov Apr 21 '11 at 06:54
  • @Nikolai Gorchilov: Oh, oops. :-) You can replace `memoryview` with `buffer` in Python 2.6. Alternatively, the first version that has `packl` and `unpackl` doesn't use `memoryview`. – Omnifarious Apr 21 '11 at 13:56
0

Here's a version that uses only Python built-ins and standard modules, and it seems very fast -- although I haven't compared it to your numpy version. It uses a couple of optimized conversion functions from the Python Cryptography Toolkit, as indicated.

# Part of the Python Cryptography Toolkit
# found here:
# http://www.google.com/codesearch/p?hl=en#Y_gnTlD6ECg/trunk/src/gdata/Crypto/Util/number.py&q=lang:python%20%22def%20long_to_bytes%22&sa=N&cd=1&ct=rc

# Improved conversion functions contributed by Barry Warsaw, after
# careful benchmarking

import struct

def long_to_bytes(n, blocksize=0):
    """long_to_bytes(n:long, blocksize:int) : string
    Convert a long integer to a byte string.

    If optional blocksize is given and greater than zero, pad the front of the
    byte string with binary zeros so that the length is a multiple of
    blocksize.
    """
    # after much testing, this algorithm was deemed to be the fastest
    s = ''
    n = long(n)
    pack = struct.pack
    while n > 0:
        s = pack('>I', n & 0xffffffffL) + s
        n = n >> 32
    # strip off leading zeros
    for i in range(len(s)):
        if s[i] != '\000':
            break
    else:
        # only happens when n == 0
        s = '\000'
        i = 0
    s = s[i:]
    # add back some pad bytes.  this could be done more efficiently w.r.t. the
    # de-padding being done above, but sigh...
    if blocksize > 0 and len(s) % blocksize:
        s = (blocksize - len(s) % blocksize) * '\000' + s
    return s

def bytes_to_long(s):
    """bytes_to_long(string) : long
    Convert a byte string to a long integer.

    This is (essentially) the inverse of long_to_bytes().
    """
    acc = 0L
    unpack = struct.unpack
    length = len(s)
    if length % 4:
        extra = (4 - length % 4)
        s = '\000' * extra + s
        length = length + extra
    for i in range(0, length, 4):
        acc = (acc << 32) + unpack('>I', s[i:i+4])[0]
    return acc


# original code in SO question
def xor_orig(data, key):
    l = len(key)

    buff = ""
    for i in range(0, len(data)):
        buff += chr(ord(data[i]) ^ ord(key[i % l]))
    return buff

# faster pure python version
def xor_new(data, key):
    import math

    # key multiplication in order to match the data length
    key = (key*int( math.ceil(float(len(data))/float(len(key)))))[:len(data)]

    # convert key and data to long integers
    key_as_long = bytes_to_long(key)
    data_as_long = bytes_to_long(data)

    # xor the numbers together and convert the result back to a byte string
    return long_to_bytes(data_as_long ^ key_as_long)

if __name__=='__main__':
    import random
    import sha

    TEST_DATA_LEN = 100000

    data = ''.join(chr(random.randint(0, 255)) for i in xrange(TEST_DATA_LEN))
    key = sha.new(data).digest()

    assert xor_new(data, key) == xor_orig(data, key)
    print 'done'
martineau
0

Following up on my comment on the question above, you can process large files rather quickly if you stick to numpy for the key padding and bitwise XOR'ing, like so:

import numpy as np

# ...

def xor(key, data):
    data = np.fromstring(data, dtype=np.byte)
    key = np.fromstring(key, dtype=np.byte)

    # Pad the key to match the data length
    key = np.pad(key, (0, len(data) - len(key)), 'wrap')

    # bitwise_xor returns a numpy array; tostring() turns it back into bytes
    return np.bitwise_xor(key, data).tostring()
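A quick round-trip sanity check of the same idea (using `frombuffer`/`tobytes`, my substitutions for the since-deprecated `fromstring`/`tostring`; the sample data and key are arbitrary):

```python
import numpy as np

data = bytes(range(256)) * 10
key = bytes(range(1, 21))                 # stand-in for a 20-byte SHA-1 digest

d = np.frombuffer(data, dtype=np.uint8)
k = np.pad(np.frombuffer(key, dtype=np.uint8), (0, len(d) - len(key)), 'wrap')

enc = np.bitwise_xor(k, d).tobytes()      # XOR once to "encrypt"
dec = np.bitwise_xor(k, np.frombuffer(enc, dtype=np.uint8)).tobytes()
```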

Nadeem Douba
-1

What you have is already as fast as you can get in Python.

If you really need it faster, implement it in C.

orlp