
We've been measuring the bandwidth of two external HDDs in class using gettimeofday().

Surprisingly, even after several repeats (three measurements per execution, executed three times), we found that writing a 2500 MB file was faster than writing a smaller one on both HDDs.

[Figure: bandwidth chart]

This is our C code. It is called from a Python script that generates the charts.

//argv[1] = path, argv[2] = size in MB (2500 in this case)

#include <stdio.h>
#include <stdlib.h>   /* atoi(), malloc() */
#include <sys/time.h>
#include <time.h>
#include <unistd.h>
#include <fcntl.h>


struct timeval tv0;
struct timeval tv1;

int main(int argc, char *argv[]){
    unsigned long size=atoi(argv[2])*1000L*1000L;
    int f = open(argv[1], O_CREAT|O_WRONLY|O_TRUNC, 0777);
    char * array = malloc(size);
    gettimeofday(&tv0, 0); //START TIME
    write(f, array, size);
    fdatasync(f);
    close(f);
    gettimeofday(&tv1, 0); // END TIME 
    double seconds = (((double)tv1.tv_sec*1000000.0 + (double)tv1.tv_usec) - ((double)tv0.tv_sec*1000000.0 + (double)tv0.tv_usec))/1000000.0;
    printf("%f",seconds);
}

The teacher didn't know, so I'm asking here: is there a reason why this might happen?


1 Answer


Your benchmark is severely flawed:

  • It assumes without checking that all of its function calls succeed without errors.
  • It assumes that on success, write() will have written the full number of bytes specified to it, which it is by no means guaranteed to do.

Either of those could easily invalidate your benchmark results if your assumptions turn out not to be satisfied, and at least the second is fairly likely to turn out that way.

Note in particular that write() returns the number of bytes written, as an ssize_t. ssize_t is a signed integer type whose specific width is system dependent. If yours is 32 bits in size, then write() cannot write all of a 2500 MB buffer in a single call, because that's more bytes than a signed 32-bit integer can represent (the limit being a bit over 2100 MB).
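
For illustration, a checked version of the write might look something like this (just a sketch, reusing the f, array, and size variables from the question's code):

    /* Sketch: write the whole buffer, checking every call. */
    size_t remaining = size;
    const char *p = array;
    while (remaining > 0) {
        ssize_t n = write(f, p, remaining);
        if (n < 0) {
            perror("write");          /* report and abort the benchmark */
            return 1;
        }
        p += n;                       /* write() may write fewer bytes than asked */
        remaining -= (size_t) n;
    }

With such a loop in place, a short write can no longer silently masquerade as a fast 2500 MB write.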

Additionally, your program assumes that it can successfully allocate very large blocks of memory, which may easily turn out not to be the case. If this assumption fails, though, you will probably be rewarded with a crash.
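
Guarding against that is a small addition (again, only a sketch of how the question's malloc call could be checked):

    char *array = malloc(size);
    if (array == NULL) {
        perror("malloc");             /* allocation failed: nothing valid to write */
        return 1;
    }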

  • Hi John, thanks for your answer. We acknowledge these mistakes and will run some more tests taking that into account. Also, just to clarify, our system is 64-bit and has 4 GB of RAM – agsergi Apr 04 '16 at 20:09
  • @agsergi, that the OS is 64-bit has no direct bearing on the size of `ssize_t`, and that it has 4GB of RAM does not ensure that a process can allocate a 2.5GB block. – John Bollinger Apr 04 '16 at 20:15
  • @agsergi, moreover, I observe that if I rescale your results based on the assumption that the presumed 2500 MB write in truth wrote only 2147483648 bytes (the maximum representable value for a signed, 32-bit integer) then the anomalous result comes right in line with the other results. Note, too, that the size of the write could easily be limited in practice to such a value regardless of the actual size of `ssize_t`. – John Bollinger Apr 04 '16 at 20:18
  • @John Bollinger The 32-bit truncation of `2500MB` is very telling. Note that `atoi(argv[2])*1000L*1000L` is a problem too. Should be `atoi(argv[2])*1000UL*1000UL` or, even better, `size_t size = (size_t) atoi(argv[2])*1000*1000;`. – chux - Reinstate Monica Apr 04 '16 at 20:35
  • @john-bollinger That makes a lot of sense, indeed. I suppose it would be more responsible to write the file in smaller chunks. Thank you very much for the insight. – agsergi Apr 04 '16 at 20:46