
I use the dieharder tool with ASCII-format input files and the results are OK, but now I want to use binary files instead. When I converted my random data to a BIN file as described below, no bias is seen in any of the tests. The documentation mentions a raw binary input format (I am running on an Ubuntu machine), but what should such a file look like? My file content is as follows:

(UINT32 as bitstream in file) 0001110000111111000011101110111001111001010000000101110111111011111010011111001011111100100001 ...

call program as: dieharder -g 201 -f <myFile.bin> -a

Some samples of my input values (decimal value, then its 32-bit binary form):

473894638 00011100001111110000111011101110

2034261499 01111001010000000101110111111011

3925015684 11101001111100101111110010000100

...

All p-values remain at 0.00000 when I use that binary-format file.

TyeolRik
Ruby

1 Answer


I am curious how you wrote the .bin file. My guess is that you wrote the binary digits as ASCII characters, which is not the file_input_raw format that Dieharder expects. You should write the file as raw bytes, not as ASCII text. I hope this post helps; if not, please leave a comment :)

I tested several files generated with MT19937 (Mersenne Twister) to find out what a proper input file looks like.

When you write a binary file for Dieharder, keep the following two things in mind.

  1. Remove the header (6 lines, from #====... to numbit: 32).
  2. Convert the integers from ASCII text to little-endian bytes (4 bytes per 32-bit integer).

Dieharder Test example

The data below comes from MT19937 (32-bit, not 64-bit) implemented in Go, with seed = 0, generating 10,000,000 integers.

(Decimal) ASCII example

#==================================================================
# generator MT19937  seed = 0
#==================================================================
type: d
count: 10000000
numbit: 32
2357136044
2546248239
3071714933
3626093760
...

Binary file example

This is what I see when I open the file with the VS Code Hex Editor:

AC 0A 7F 8C 2F AA C4 97 75 A6 16 B7 C0 CC 21 D8
43 B3 4E 9A FB 52 A2 DB C3 76 7D 8B 67 7D E5 D8
09 A4 74 6C D3 DE A1 9F 15 51 59 A5 F2 D6 66 62
24 B7 05 70 57 3A 2B 4C 46 3C 4B E4 D8 BD 84 0E
58 9A B2 F6 8C CD CC 45 3A 39 29 62 C1 42 48 7A
E6 7D AE CA 27 4A EA CF 57 A8 65 87 AE C8 DF 7A
58 5E 6B 91 51 8B 8D 64 A5 E6 F3 EC 19 42 09 D6
...

The first value is 2357136044 = 0x8C7F0AAC, and the file starts with the 4 bytes AC 0A 7F 8C. That shows two things: there is no header, and the byte order is little-endian.
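
This byte order can be checked with a few lines of Go. The sketch below is only an illustration (the helper names encodeLE and encodeBE are made up here); it encodes the first value both ways so the result can be compared with the hex dump.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeLE returns v as 4 bytes, least significant byte first.
func encodeLE(v uint32) []byte {
	b := make([]byte, 4)
	binary.LittleEndian.PutUint32(b, v)
	return b
}

// encodeBE returns v as 4 bytes, most significant byte first.
func encodeBE(v uint32) []byte {
	b := make([]byte, 4)
	binary.BigEndian.PutUint32(b, v)
	return b
}

func main() {
	v := uint32(2357136044)
	fmt.Printf("value:         0x%08X\n", v)       // value:         0x8C7F0AAC
	fmt.Printf("little-endian: % X\n", encodeLE(v)) // little-endian: AC 0A 7F 8C
	fmt.Printf("big-endian:    % X\n", encodeBE(v)) // big-endian:    8C 7F 0A AC
}
```

Only the little-endian line matches the first four bytes of the file.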

Code in Golang

I know the code below may not help you directly. As far as I know, there is no official pure MT19937 generator in Go, so I ported one myself from the pseudocode on the wiki (Go 1.17.1).

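// Needs the imports "bufio", "encoding/binary", "log" and "os".
littleEndianFile, err := os.Create("./MT19937_LittleEndian.bin")
if err != nil {
    log.Fatal(err)
}
defer littleEndianFile.Close()
littleEndianFileBuffer := bufio.NewWriter(littleEndianFile)
littleEndianByte := make([]byte, 4)
// NewMT19937 is my own port of MT19937 (not shown here).
test := NewMT19937(0)
for i := 0; i < 10000000; i++ {
    // Write each 32-bit output as 4 little-endian bytes, with no header.
    newInt32 := test.NextUint32()
    binary.LittleEndian.PutUint32(littleEndianByte, newInt32)
    littleEndianFileBuffer.Write(littleEndianByte)
}
if err := littleEndianFileBuffer.Flush(); err != nil {
    log.Fatal(err)
}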

Result - (Decimal) ASCII example

> dieharder -a -g 202 -f ./generated/MT19937_10000000.dat
#=============================================================================#
#            dieharder version 3.31.1 Copyright 2003 Robert G. Brown          #
#=============================================================================#
   rng_name    |           filename             |rands/second|
     file_input|./generated/MT19937_10000000.dat|  7.79e+06  |
#=============================================================================#
        test_name   |ntup| tsamples |psamples|  p-value |Assessment
#=============================================================================#
   diehard_birthdays|   0|       100|     100|0.63638992|  PASSED  
      diehard_operm5|   0|   1000000|     100|0.00012670|   WEAK   
  diehard_rank_32x32|   0|     40000|     100|0.93085433|  PASSED  
    diehard_rank_6x8|   0|    100000|     100|0.07088597|  PASSED  
   diehard_bitstream|   0|   2097152|     100|0.10456387|  PASSED  

Result - Little-Endian example

> dieharder -a -g 201 -f ./generated/MT19937_10000000_LittleEndian.bin
#=============================================================================#
#            dieharder version 3.31.1 Copyright 2003 Robert G. Brown          #
#=============================================================================#
   rng_name    |           filename             |rands/second|
 file_input_raw|./generated/MT19937_10000000_LittleEndian.bin|  5.60e+07  |
#=============================================================================#
        test_name   |ntup| tsamples |psamples|  p-value |Assessment
#=============================================================================#
   diehard_birthdays|   0|       100|     100|0.63638992|  PASSED  
      diehard_operm5|   0|   1000000|     100|0.00012670|   WEAK   
  diehard_rank_32x32|   0|     40000|     100|0.93085433|  PASSED  
    diehard_rank_6x8|   0|    100000|     100|0.07088597|  PASSED  
   diehard_bitstream|   0|   2097152|     100|0.10456387|  PASSED  

You can see that the two tests above (decimal ASCII and little-endian binary) give the same results (p-values).

Result - Big-Endian example

> dieharder -a -g 201 -f ./generated/MT19937_10000000_BigEndian.bin
#=============================================================================#
#            dieharder version 3.31.1 Copyright 2003 Robert G. Brown          #
#=============================================================================#
   rng_name    |           filename             |rands/second|
 file_input_raw|./generated/MT19937_10000000_BigEndian.bin|  5.65e+07  |
#=============================================================================#
        test_name   |ntup| tsamples |psamples|  p-value |Assessment
#=============================================================================#
   diehard_birthdays|   0|       100|     100|0.46325487|  PASSED  
      diehard_operm5|   0|   1000000|     100|0.00000093|  FAILED  
  diehard_rank_32x32|   0|     40000|     100|0.93085433|  PASSED  
    diehard_rank_6x8|   0|    100000|     100|0.27138035|  PASSED  
   diehard_bitstream|   0|   2097152|     100|0.75581067|  PASSED  
        diehard_opso|   0|   2097152|     100|0.25961325|  PASSED  
        diehard_oqso|   0|   2097152|     100|0.00025268|   WEAK   

However, the big-endian file gives different p-values for some tests. That indicates that a proper Dieharder raw input file should be written as little-endian binary.

Conclusion and Comments

I suspect you wrote the binary digits as ASCII characters. If you can read the data in a normal text editor such as Windows Notepad, it was written as ASCII text and is not a proper input file. Write it as little-endian binary instead. The test results in this post show that little-endian is correct and that file_input_raw does not need a header.
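
If the file really is text made of '0' and '1' characters, it can be packed into raw bytes. The following is a rough sketch under two assumptions that should be checked against the actual data: every 32 characters form one UINT32 with the first character as the most significant bit (as in the sample rows of the question), and whitespace between groups can be ignored. The function name packBits is made up for this example.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"io"
	"os"
)

// packBits turns a string of '0'/'1' characters into raw bytes. Every 32
// characters form one uint32 (first character = most significant bit),
// which is emitted as 4 little-endian bytes. Whitespace is ignored and a
// trailing group of fewer than 32 bits is dropped.
func packBits(bits string) ([]byte, error) {
	out := make([]byte, 0, len(bits)/8)
	buf := make([]byte, 4)
	var word uint32
	nbits := 0
	for _, c := range bits {
		switch c {
		case '0', '1':
			word = word<<1 | uint32(c-'0')
			nbits++
		case ' ', '\n', '\r', '\t':
			continue
		default:
			return nil, fmt.Errorf("unexpected character %q", c)
		}
		if nbits == 32 {
			binary.LittleEndian.PutUint32(buf, word)
			out = append(out, buf...)
			word, nbits = 0, 0
		}
	}
	return out, nil
}

func main() {
	// Reads the '0'/'1' text on stdin and writes raw bytes to stdout.
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, "read error:", err)
		os.Exit(1)
	}
	out, err := packBits(string(data))
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	if _, err := os.Stdout.Write(out); err != nil {
		fmt.Fprintln(os.Stderr, "write error:", err)
		os.Exit(1)
	}
}
```

For the first sample in the question, 00011100001111110000111011101110 (473894638 = 0x1C3F0EEE), this produces the bytes EE 0E 3F 1C. The sketch reads the whole input into memory, so a very large file would be better processed in chunks.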

I am not sure whether little-endian versus big-endian makes a difference when interpreting the test results. In NIST SP800-22, the statistical randomness tests amount to counting the number of 0s and 1s, or checking for patterns such as '0101' or '001100'. I think it makes no difference to the underlying question, namely whether the generator is random or not.

Still, I recommend writing the binary file in little-endian. We don't know whether the test author had a deeper reason for it, so we simply follow the "proper" usage. :)

TyeolRik
  • I'm curious how you converted the ASCII file to little-endian. Can you share? – Ender May 19 '22 at 04:12
  • @Ender Hello, you can see the converted example above. I am sorry, but I don't understand what is confusing you. Could you explain what you want? – TyeolRik May 20 '22 at 06:28
  • Hi @TyeolRik, I meant: if I have a file that contains the decimal numbers, how do I convert it to little-endian? Using the decimal (ASCII) file is so painful with Dieharder. – Ender May 21 '22 at 14:21
  • As in the example above, 2357136044 in decimal is **0x8C7F0AAC** in hex (a 32-bit integer = 4 bytes). In the binary file you can see that it starts with AC 0A 7F 8C, which is the little-endian form of 0x8C7F0AAC. Now look at the next number: 2F AA C4 97 (= 0x97C4AA2F), which is 2546248239 in decimal. The thing to be aware of is that there is no **header** in the binary file for Dieharder. It contains only 32-bit integers. If this is not clear, please ask me again :) – TyeolRik May 22 '22 at 17:36
  • Also, testing your PRNGs in Dieharder with a binary file is not recommended (at least, that is my guess). You should pipe the generator output directly into Dieharder instead (on Linux). You may get some insight from [my github repository](https://github.com/TyeolRik/RandomTests). It contains examples and test results for some well-known PRNGs. – TyeolRik May 22 '22 at 17:40
  • Hi @TyeolRik, it's super useful. I understood the concept of your findings. I'm struggling with the conversion of the integers from ASCII to little-endian bytes. E.g., I have an ASCII file that contains the list of integers; do you have a tool that can convert those numbers to little-endian byte format? – Ender phan Jun 13 '22 at 04:12
  • @Enderphan Well, I think it is easy to convert. Just delete the first 6 lines, which are the header, then read the file line by line and convert each number. Theoretically, there should be no difference whether you write the file in little-endian or big-endian, but I recommend little-endian :) – TyeolRik Jun 14 '22 at 05:07