Today I started a project whose goal is to optimize the generation of random numbers. I want to wipe several hard drives using the Mersenne Twister PRNG, but I can only produce around 200 MB/s of random data across 8 hard drives, so about 25 MB/s each. Is there a way to optimize this code using AVX, or SSE (for legacy reasons)? Can this be done without having studied computer science? I am a C programmer with only a few years of experience and no formal computer science background.

Could someone explain how to move this forward? Which parts of the code could be improved? Can someone give an example of what an enhanced version might look like? Can someone recommend good books on this subject?

/*
   A C-program for MT19937, with initialization improved 2002/1/26.
   Coded by Takuji Nishimura and Makoto Matsumoto.
   Before using, initialize the state by using init_genrand(seed)  
   or init_by_array(init_key, key_length).
   Copyright (C) 1997 - 2002, Makoto Matsumoto and Takuji Nishimura,
   All rights reserved.                          
   Copyright (C) 2005, Mutsuo Saito,
   All rights reserved.                          
   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:
     1. Redistributions of source code must retain the above copyright
        notice, this list of conditions and the following disclaimer.
     2. Redistributions in binary form must reproduce the above copyright
        notice, this list of conditions and the following disclaimer in the
        documentation and/or other materials provided with the distribution.
     3. The names of its contributors may not be used to endorse or promote 
        products derived from this software without specific prior written 
        permission.
   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE COPYRIGHT OWNER OR
   CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
   Any feedback is very welcome.
   http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html
   email: m-mat @ math.sci.hiroshima-u.ac.jp (remove space)
*/

#include <stdio.h>
#include "mt19937ar.h"

/* Period parameters */  
#define N 624
#define M 397
#define MATRIX_A 0x9908b0dfUL   /* constant vector a */
#define UPPER_MASK 0x80000000UL /* most significant w-r bits */
#define LOWER_MASK 0x7fffffffUL /* least significant r bits */

static unsigned long mt[N]; /* the array for the state vector  */
static int mti=N+1; /* mti==N+1 means mt[N] is not initialized */

/* initializes mt[N] with a seed */
void init_genrand(unsigned long s)
{
    mt[0]= s & 0xffffffffUL;
    for (mti=1; mti<N; mti++) {
        mt[mti] = 
        (1812433253UL * (mt[mti-1] ^ (mt[mti-1] >> 30)) + mti); 
        /* See Knuth TAOCP Vol2. 3rd Ed. P.106 for multiplier. */
        /* In the previous versions, MSBs of the seed affect   */
        /* only MSBs of the array mt[].                        */
        /* 2002/01/09 modified by Makoto Matsumoto             */
        mt[mti] &= 0xffffffffUL;
        /* for >32 bit machines */
    }
}

/* initialize by an array with array-length */
/* init_key is the array for initializing keys */
/* key_length is its length */
/* slight change for C++, 2004/2/26 */
void init_by_array(unsigned long init_key[], int key_length)
{
    int i, j, k;
    init_genrand(19650218UL);
    i=1; j=0;
    k = (N>key_length ? N : key_length);
    for (; k; k--) {
        mt[i] = (mt[i] ^ ((mt[i-1] ^ (mt[i-1] >> 30)) * 1664525UL))
          + init_key[j] + j; /* non linear */
        mt[i] &= 0xffffffffUL; /* for WORDSIZE > 32 machines */
        i++; j++;
        if (i>=N) { mt[0] = mt[N-1]; i=1; }
        if (j>=key_length) j=0;
    }
    for (k=N-1; k; k--) {
        mt[i] = (mt[i] ^ ((mt[i-1] ^ (mt[i-1] >> 30)) * 1566083941UL))
          - i; /* non linear */
        mt[i] &= 0xffffffffUL; /* for WORDSIZE > 32 machines */
        i++;
        if (i>=N) { mt[0] = mt[N-1]; i=1; }
    }

    mt[0] = 0x80000000UL; /* MSB is 1; assuring non-zero initial array */ 
}

/* generates a random number on [0,0xffffffff]-interval */
unsigned long genrand_int32(void)
{
    unsigned long y;
    static unsigned long mag01[2]={0x0UL, MATRIX_A};
    /* mag01[x] = x * MATRIX_A  for x=0,1 */

    if (mti >= N) { /* generate N words at one time */
        int kk;

        if (mti == N+1)   /* if init_genrand() has not been called, */
            init_genrand(5489UL); /* a default initial seed is used */

        for (kk=0;kk<N-M;kk++) {
            y = (mt[kk]&UPPER_MASK)|(mt[kk+1]&LOWER_MASK);
            mt[kk] = mt[kk+M] ^ (y >> 1) ^ mag01[y & 0x1UL];
        }
        for (;kk<N-1;kk++) {
            y = (mt[kk]&UPPER_MASK)|(mt[kk+1]&LOWER_MASK);
            mt[kk] = mt[kk+(M-N)] ^ (y >> 1) ^ mag01[y & 0x1UL];
        }
        y = (mt[N-1]&UPPER_MASK)|(mt[0]&LOWER_MASK);
        mt[N-1] = mt[M-1] ^ (y >> 1) ^ mag01[y & 0x1UL];

        mti = 0;
    }

    y = mt[mti++];

    /* Tempering */
    y ^= (y >> 11);
    y ^= (y << 7) & 0x9d2c5680UL;
    y ^= (y << 15) & 0xefc60000UL;
    y ^= (y >> 18);

    return y;
}
  • Overall, this is too broad a question. However, some observations. First, it seems like overkill to use a Mersenne Twister to generate data for wiping hard drives. The ability to recover data from erased drives was something perhaps possible with special equipment on old drives, but I doubt it is feasible with modern drives. (One would have to check.) Even if it is possible, wiping the drive with all zeros, then all ones, then one or a few simple patterns should suffice to obliterate traces of previous data. How would using high-quality pseudo-random data help? – Eric Postpischil Feb 06 '20 at 13:04
  • Actually there are already utilities approved for this purpose. Things like [GNU Shred](https://www.gnu.org/software/coreutils/manual/html_node/shred-invocation.html) – Mgetz Feb 06 '20 at 13:06
  • Second, when the output format is unimportant (bits can be in any order; they do not need to conform to a protocol), the best way to use SIMD instructions may be to rearrange the algorithm to work in parallel bitwise. For example, with 512-bit SIMD, implement the algorithm as 512 independent computations, each one implemented with primitive bitwise AND, OR, XOR, and NOT operations. With this, shift operations are free. However, it may be quite tedious. – Eric Postpischil Feb 06 '20 at 13:08
  • Third, a less ambitious approach is to implement the algorithm with more general SIMD instructions. But even that requires a considerable amount of explanation and documentation. Stack Overflow is not the place to teach somebody to use AVX or SSE. – Eric Postpischil Feb 06 '20 at 13:08
  • If you have a great need to destroy hard drives securely, physical destruction may be a better, cheaper, and faster approach. Searching the web for “hard drive destruction” reveals a number of services for that, as well as [WikiHow instructions](https://www.wikihow.com/Destroy-a-Hard-Drive). – Eric Postpischil Feb 06 '20 at 13:10
  • I think your whole approach is misguided due to reasons given by others, but if you're looking for speedy PRNGs, consider some of the options available at http://prng.di.unimi.it rather than trying to invent your own or tweak prior work. Even [brilliant mathematicians such as von Neumann have found this task daunting](https://en.wikipedia.org/wiki/John_von_Neumann#Computing). – pjs Feb 06 '20 at 20:34
  • You can use the same random data for all 8 hard drives in parallel, e.g. `tee` it onto all 8 drives instead of distributing it. And yes, as @EricPostpischil says, MT is a ridiculously slow PRNG. You don't need anywhere near that quality of randomness. SIMD xorshift+ would be 100% fine, and that's *very* cheap to run multiple generators in parallel in a SIMD vector. e.g. see [What's the fastest way to generate a 1 GB text file containing random digits?](https://unix.stackexchange.com/a/324520) (which takes random binary vectors and processes them into ASCII text, still at ~13 GB/s) – Peter Cordes Feb 07 '20 at 05:40
  • @EricPostpischil: And yes, data recovery from *modern* magnetic hard drives is a myth. Overwriting once with zeros is fine for most people, overwriting with a random pattern once is more than fine. [How can I reliably erase all information on a hard drive?](https://security.stackexchange.com/q/5749) has some links. The NIST-approved method still involves multiple overwrites. Also, drive firmwares know how to erase themselves with an ATA secure-erase command, so you don't have to generate the randomness. And that works on SSDs, where the flash remapping layer can defeat this method. – Peter Cordes Feb 07 '20 at 05:44
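
The xorshift+ suggestion in the comments above could look roughly like the following. This is a sketch under stated assumptions, not tuned code: it steps four independent xorshift128+ generators (with the commonly published shift constants 23, 18, 5) in a plain loop that a compiler may or may not auto-vectorize, depending on the compiler and flags. The names `xs128p_state`, `xs128p_seed`, `xs128p_fill` and the lane count `XS_LANES` are made up for illustration, and splitmix64 is used only to expand one seed into lane states. xorshift128+ is not cryptographically secure.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define XS_LANES 4  /* assumed number of independent generators */

typedef struct {
    uint64_t a[XS_LANES];  /* first state word of each lane  */
    uint64_t b[XS_LANES];  /* second state word of each lane */
} xs128p_state;

/* splitmix64: used here only to derive lane states from one seed */
static uint64_t splitmix64(uint64_t *x)
{
    uint64_t z = (*x += 0x9e3779b97f4a7c15ULL);
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

static void xs128p_seed(xs128p_state *st, uint64_t seed)
{
    for (int l = 0; l < XS_LANES; l++) {
        st->a[l] = splitmix64(&seed);
        st->b[l] = splitmix64(&seed) | 1u;  /* a lane state must not be all zero */
    }
}

/* Advance every lane once and write XS_LANES 64-bit outputs. */
static void xs128p_step(xs128p_state *st, uint64_t *out)
{
    for (int l = 0; l < XS_LANES; l++) {
        uint64_t x = st->a[l];
        const uint64_t y = st->b[l];
        out[l] = x + y;
        st->a[l] = y;
        x ^= x << 23;
        st->b[l] = x ^ y ^ (x >> 18) ^ (y >> 5);
    }
}

/* Fill out[0..nwords-1] with pseudo-random 64-bit words. */
static void xs128p_fill(xs128p_state *st, uint64_t *out, size_t nwords)
{
    size_t n = 0;
    uint64_t tail[XS_LANES];

    for (; n + XS_LANES <= nwords; n += XS_LANES)
        xs128p_step(st, out + n);
    if (n < nwords) {  /* partial last batch */
        xs128p_step(st, tail);
        memcpy(out + n, tail, (nwords - n) * sizeof tail[0]);
    }
}
```

A caller would seed once with `xs128p_seed`, fill a large buffer with `xs128p_fill`, and write that same buffer to each of the 8 drives, in line with the `tee` suggestion, so the generator runs once per block rather than once per drive.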

1 Answer

I think the Mersenne Twister PRNG is not suitable for your purpose because it is not cryptographically secure. If a cryptographer can guess or recover a piece of the MT output sequence (for example, from a part of the disk that was initially filled with zeros), they can recover the generator state and then reproduce your entire sequence by re-running the PRNG from that state.
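
To illustrate why known outputs expose the state: the tempering step at the end of `genrand_int32` is an invertible bit transformation, so every 32-bit output maps back to exactly one word of `mt[]`, and N = 624 consecutive outputs give the full state. The sketch below shows the tempering and one way to invert it; the function names `mt_temper` and `mt_untemper` are my own for illustration and are not part of the reference code.

```c
#include <stdint.h>

/* Same tempering as in genrand_int32(), on a 32-bit word. */
static uint32_t mt_temper(uint32_t y)
{
    y ^= y >> 11;
    y ^= (y << 7) & 0x9d2c5680U;
    y ^= (y << 15) & 0xefc60000U;
    y ^= y >> 18;
    return y;
}

/* Inverse of mt_temper(): undo the four steps in reverse order. */
static uint32_t mt_untemper(uint32_t y)
{
    uint32_t x;
    int k;

    y ^= y >> 18;                  /* shift of 18 >= 16: applying it again undoes it */
    y ^= (y << 15) & 0xefc60000U;  /* one pass suffices for this shift and mask */

    /* y = x ^ ((x << 7) & mask): each pass fixes 7 more low bits */
    x = y;
    for (k = 0; k < 4; k++)
        x = y ^ ((x << 7) & 0x9d2c5680U);
    y = x;

    /* y = x ^ (x >> 11): each pass fixes 11 more high bits */
    x = y;
    for (k = 0; k < 2; k++)
        x = y ^ (x >> 11);
    return x;
}
```

Applying `mt_untemper` to 624 consecutive outputs and copying the results into `mt[]` (with `mti` set to N) would let an observer continue the sequence from that point.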

I suggest using a modification of the RC4 PRNG that works with an array `uint16_t [1 << 16]` rather than the original byte-oriented RC4 array `uint8_t [1 << 8]`. This modification gives you two benefits:

  • Each computation step yields 2 bytes instead of one, as in the original. In other words, you get roughly a 2x performance improvement, which matters for you.
  • The period of this generator would be extremely long; I estimate it at 2^(2^14).

To protect against attacks on the KSA (the major RC4 vulnerability), I suggest an "empty initial run" of 2^18 steps before use. In other words, read and discard the first 2^18 uint16_t values from the PRNG.

You can also modify the KSA to make it more resistant to attack.
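
A minimal sketch of the modification described above, assuming the standard RC4 key schedule and output step carried over unchanged to 16-bit words. The names (`rc4w_state`, `rc4w_init`, `rc4w_next`, `rc4w_fill`) and the choice of a key made of `uint16_t` words are assumptions for illustration. This variant has not been reviewed as a cipher, so treat it as a demonstration of the idea rather than a vetted design.

```c
#include <stddef.h>
#include <stdint.h>

#define RC4W_SIZE 65536u      /* 1 << 16 table entries */
#define RC4W_DROP (1u << 18)  /* outputs discarded after keying */

typedef struct {
    uint16_t s[RC4W_SIZE];    /* permutation of 0..65535 */
    uint16_t i, j;            /* indices wrap naturally at 16 bits */
} rc4w_state;

static void rc4w_swap(uint16_t *a, uint16_t *b)
{
    uint16_t t = *a;
    *a = *b;
    *b = t;
}

/* One output step: the RC4 PRGA on 16-bit words. */
static uint16_t rc4w_next(rc4w_state *st)
{
    st->i = (uint16_t)(st->i + 1u);
    st->j = (uint16_t)(st->j + st->s[st->i]);
    rc4w_swap(&st->s[st->i], &st->s[st->j]);
    return st->s[(uint16_t)(st->s[st->i] + st->s[st->j])];
}

/* Key schedule (KSA) followed by the "empty initial run".
   key_len must be greater than zero. */
static void rc4w_init(rc4w_state *st, const uint16_t *key, size_t key_len)
{
    uint32_t n;
    uint16_t j = 0;

    for (n = 0; n < RC4W_SIZE; n++)
        st->s[n] = (uint16_t)n;
    for (n = 0; n < RC4W_SIZE; n++) {
        j = (uint16_t)(j + st->s[n] + key[n % key_len]);
        rc4w_swap(&st->s[n], &st->s[j]);
    }
    st->i = 0;
    st->j = 0;
    for (n = 0; n < RC4W_DROP; n++)
        (void)rc4w_next(st);  /* read and drop the first 2^18 values */
}

/* Fill out[0..count-1] with generator output, 2 bytes per step. */
static void rc4w_fill(rc4w_state *st, uint16_t *out, size_t count)
{
    size_t n;
    for (n = 0; n < count; n++)
        out[n] = rc4w_next(st);
}
```

The state occupies 128 KiB, so it is better kept in static or heap storage than on a small stack. Whether the expected 2x speedup materializes depends on the machine, since the larger table no longer fits in the smallest cache levels on many CPUs.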

olegarch