
I am working on characterizing an SSD drive to determine max TBW / life expectancy.

Currently I am using Bash to generate 500 MB files with random (non-zero) content:

dd if=<(openssl enc -aes-128-cbc -pass pass:"$(dd if=/dev/urandom bs=128 count=1 2>/dev/null | base64)" -nosalt < /dev/zero) of=/media/m2_adv3d/abc${stamp1} bs=1MB count=500 iflag=fullblock&

Note: ${stamp1} is a timestamp used to ensure unique file names.

I am looking to accomplish the same result in Python but have not found an efficient way to generate the files quickly.

Looking for suggestions.

Thanks!


Update

I have been experimenting with the following and seem to have achieved a 2-second write; the files appear to be random and different:

import os

newfile = open("testfile.001", "wb")     # binary mode so the bytes from os.urandom can be written
newfile.write(os.urandom(500000000))     # generate a 500 MB file of random content
newfile.close()

I'm a little skeptical that this is truly good enough to stress an SSD. Basically I am going to loop this indefinitely; once the drive is full, delete the oldest file and write a new one, collecting SMART data every 500 files written to trend the aging (a rough sketch of that loop is below).
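Roughly like this (assuming Python 3.7+, the /media/m2_adv3d mount from the dd command above, smartmontools installed, and a placeholder /dev/sda device node; sizes and thresholds are only illustrative):

import os
import shutil
import subprocess
import time

TARGET_DIR = "/media/m2_adv3d"      # mount point of the SSD under test
FILE_SIZE = 500 * 1000 * 1000       # 500 MB per file, matching the dd command
MIN_FREE = 2 * FILE_SIZE            # start deleting old files below this much free space
SMART_EVERY = 500                   # collect SMART data every 500 files written
DEVICE = "/dev/sda"                 # placeholder device node for smartctl

files_written = 0
while True:
    # once the drive is nearly full, drop the oldest test file(s)
    while shutil.disk_usage(TARGET_DIR).free < MIN_FREE:
        test_files = [os.path.join(TARGET_DIR, f)
                      for f in os.listdir(TARGET_DIR) if f.startswith("abc")]
        if not test_files:
            raise RuntimeError("drive full but no test files left to delete")
        os.remove(min(test_files, key=os.path.getmtime))

    # write a new 500 MB file of random content with a unique, timestamped name
    name = os.path.join(TARGET_DIR, "abc%d" % time.time_ns())
    with open(name, "wb") as f:
        f.write(os.urandom(FILE_SIZE))

    files_written += 1
    if files_written % SMART_EVERY == 0:
        # append the SMART attribute table to a log for trending (smartctl usually needs root)
        with open("smart_log.txt", "ab") as log:
            log.write(subprocess.run(["smartctl", "-A", DEVICE],
                                     capture_output=True).stdout)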

Thoughts?

Thanks,

Dan.

    Perhaps if you edited the question to show the code you would like speeded up people will suggest improvements. Hard to answer without seeing the existing code. – holdenweb Feb 28 '19 at 16:22
    Thank you for the feedback @holdenweb; updated with code. – Dan G Feb 28 '19 at 17:16
  • One thought: since the IO operation is bound to take time, a threaded or asynchronous solution that allows a new random block to be generated while the last one is being written might speed things up. – holdenweb Mar 01 '19 at 12:42
  • @holdenweb ; thank you for the suggestions. Tried threading and took a performance hit ... while I seem to be able to consistently write 500MB files at 3 ~ 5 seconds a piece (linear); when I attempt to do two in parallel using threads, I am hitting between 10 ~ 17 seconds ... more towards the 17 seconds. Will post the code for reference and close this one off. Thanks! – Dan G Mar 05 '19 at 23:17
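For reference, the threaded variant discussed above was roughly a producer/consumer setup: one thread generates random blocks into a queue while the main thread drains them to disk. This is a sketch with placeholder names and sizes, not the exact benchmark code, and in my tests it came in slower than the plain linear writes:

import os
import queue
import threading

BLOCK_SIZE = 1000 * 1000        # 1 MB random blocks
BLOCKS_PER_FILE = 500           # 500 blocks -> 500 MB per file
buf = queue.Queue(maxsize=8)    # small hand-off buffer between generator and writer

def generate():
    # producer: keep a few random blocks ready ahead of the writer
    for _ in range(BLOCKS_PER_FILE):
        buf.put(os.urandom(BLOCK_SIZE))
    buf.put(None)               # sentinel: no more blocks

def write_file(path):
    # consumer: drain the queue to disk
    with open(path, "wb") as f:
        while True:
            block = buf.get()
            if block is None:
                break
            f.write(block)

gen = threading.Thread(target=generate)
gen.start()
write_file("testfile.002")      # placeholder file name
gen.join()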

2 Answers


The os.urandom option works best for generating large random files.
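If holding a full 500 MB buffer in memory is a concern, the same thing can be done in chunks; this is a minimal sketch (chunk size and file name are placeholders):

import os

CHUNK = 4 * 1024 * 1024             # 4 MiB of fresh random data per write
TOTAL = 500 * 1000 * 1000           # 500 MB file, matching the question

with open("testfile.001", "wb") as f:
    written = 0
    while written < TOTAL:
        n = min(CHUNK, TOTAL - written)
        f.write(os.urandom(n))      # random data is effectively incompressible
        written += n

Either way the data stays effectively incompressible, so a compressing controller cannot shortcut the writes.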

– Dan G

You could try something as easy as this.

import pandas as pd
import numpy as np

rows = 100000
cols = 10000

table_size = [rows, cols]       # 100,000 x 10,000 table

path = "testfile.csv"           # placeholder output path
x = np.ones(table_size)         # note: all ones (not random), ~8 GB in memory as float64
pd.DataFrame(x).to_csv(path)    # write the table out as CSV

You can adjust the table size to make the file larger or smaller. I am not sure whether this is more or less efficient than what you are already trying.