
This question might be a bit weird, but I wonder if there is a way NOT to use the cache in C++.

I'm doing some tests. In this test, I load 2 GB (512 matrices of 4 MB each) into memory, then do some correlations among them and calculate performance.

When I run the code the first time, the running time is t1+x seconds; on the second run, the total time is t2+x seconds, where t1 and t2 are the loading times of the 2 GB of matrices and t1 > t2 (approximately t1 = 20 s, t2 = 5 s). My assumption is that the cache is used on the second run. (I don't know of any other reason that would decrease the loading time like that.)

My problem with this is that, since the loading times are not consistent, the results are deceptive in some cases. So I want a consistent I/O time, and the only thing that comes to mind is not to use the cache, if there is a way.

Is there a way to standardize my I/O time?

I'm using Windows 7 x64 and working in Visual Studio 2010; my machine has 32 GB of RAM.

TEST RESULTS: I compared the average loading times of a 4 MB binary file under five options: the first run with my original code, the second run with the original code, using FILE_FLAG_NO_BUFFERING, and the first and second runs using the cache as Roy Longbottom suggested.

1st run         : 39.1 ms  
2nd run         : 10.4 ms
no_buffer       : 127.8 ms
cache_1st run   : 27.4 ms
cache_2nd run   : 19.2 ms

My original read code is as follows:

// `size` is the matrix dimension; it is defined elsewhere in the original code
void readNoise(string fpath, Mat& data){
    FILE* fp = fopen(fpath.c_str(), "rb");
    if (!fp) { perror("fopen"); return; }

    float* buffer = new float[size];
    for (int i = 0; i < size; ++i) {
        fread(buffer, sizeof(float), size, fp);   // read one row of floats
        for (int j = 0; j < size; ++j) {
            data.at<float>(i, j) = buffer[j];
        }
    }
    fclose(fp);
    delete[] buffer;   // was free(buffer): memory from new[] must be released with delete[]
}

I noticed a mistake in my code, which was the dynamic allocation: when I changed the dynamic allocation to a static allocation, the running time of the readNoise method became the same as Roy Longbottom's cached version.
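For reference, the static-allocation change described above would look roughly like this (SIZE is a hypothetical compile-time constant standing in for the matrix dimension; this is a sketch, not the exact code used):

const int SIZE = 1024;        // hypothetical compile-time dimension (4 MB = 1024x1024 floats)
static float buffer[SIZE];    // one static row buffer, allocated once for the program's
                              // lifetime instead of new float[size] on every call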

The difference between the two runs decreased, but the question remains the same: how do I standardize the running times of both the first and the second run?

smttsp
  • When you say "cache" do you mean "filesystem cache"? Since you made reference to I/O. – Brian Bi Sep 17 '14 at 20:30
  • Presuming you mean file system cache, yes there may be a way--but all the ways of which I'm aware are non-portable, so you'll need to tell us what OS you're working with before we can (probably) help much. – Jerry Coffin Sep 17 '14 at 20:33
  • I think so, I don't know the details of caches. – smttsp Sep 17 '14 at 20:34
  • Sometimes when performance checking I ignore the first run. – Galik Sep 17 '14 at 20:40
  • Also, if you are loading the data into memory why bother even timing the loading time? – Galik Sep 17 '14 at 20:42
  • I think so, I don't know the details of caches, but I know that in two consecutive runs I got two different times, which makes me think it is because of the cache (undergraduate knowledge: if the hit rate is high, I/O time will decrease). – smttsp Sep 17 '14 at 20:42
  • @Galik, I compare two different methods, so I need both the correlation (processing) time and the I/O time. – smttsp Sep 17 '14 at 20:43
  • @JerryCoffin, I added the environment details, I'm using VS10 on Windows 7. – smttsp Sep 17 '14 at 20:47
  • That being the case, you might open the file with `CreateFile` and specify `FILE_FLAG_NO_BUFFERING`, and see if that doesn't at least help you get more consistent results. You can use `_open_osfhandle` and `_fdopen` to get a C-style `FILE *` so you can read with something like `fread` instead of using `ReadFile` directly (a rough sketch follows these comments). – Jerry Coffin Sep 17 '14 at 20:58
  • @JerryCoffin, I think the second way is something like this: http://stackoverflow.com/a/7369662/2021883 – smttsp Sep 17 '14 at 21:06
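A rough sketch of what Jerry Coffin's suggestion might look like (the function name and error handling are illustrative, and note Roy Longbottom's caveat below: FILE_FLAG_NO_BUFFERING requires sector-aligned reads, which stdio's internal buffering may not satisfy):

#include <windows.h>
#include <io.h>
#include <fcntl.h>
#include <cstdio>

// Open a file with the OS file cache bypassed and wrap the handle in a
// C-style FILE* so existing fread-based code can be reused.
FILE* openUnbuffered(const char* path)
{
    HANDLE h = CreateFileA(path, GENERIC_READ,
                           FILE_SHARE_READ, NULL, OPEN_EXISTING,
                           FILE_ATTRIBUTE_NORMAL | FILE_FLAG_NO_BUFFERING, NULL);
    if (h == INVALID_HANDLE_VALUE) return NULL;

    // Convert the Win32 handle to a CRT file descriptor, then to a FILE*
    int fd = _open_osfhandle(reinterpret_cast<intptr_t>(h), _O_RDONLY | _O_BINARY);
    if (fd == -1) { CloseHandle(h); return NULL; }

    FILE* fp = _fdopen(fd, "rb");
    if (!fp) _close(fd);   // closing the descriptor also closes the handle
    return fp;
}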

2 Answers


Benchmarking, and micro-benchmarking in particular, is quite complex, and there are many ways you may inadvertently gather false performance data. You should look into micro-benchmarking libraries such as google/benchmark and use one of them to perform your tests.

As you can see from your example, external factors such as the file system cache may cause the timing of individual runs to vary greatly.
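As an illustration, a minimal google/benchmark harness for the 4 MB read might look like this ("noise.bin" is a placeholder file name; the library's warm-up and repetition make steady-state effects such as the file cache visible in the statistics rather than hidden in a single run):

#include <benchmark/benchmark.h>
#include <cstdio>
#include <vector>

// Time a 4 MB binary read end to end: open, read, close.
static void BM_ReadNoise(benchmark::State& state)
{
    std::vector<float> buffer(1024 * 1024);   // 4 MB of floats
    for (auto _ : state) {
        std::FILE* fp = std::fopen("noise.bin", "rb");
        if (!fp) { state.SkipWithError("fopen failed"); break; }
        std::size_t n = std::fread(buffer.data(), sizeof(float), buffer.size(), fp);
        benchmark::DoNotOptimize(n);          // keep the read from being optimized away
        std::fclose(fp);
    }
}
BENCHMARK(BM_ReadNoise);
BENCHMARK_MAIN();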

caskey
  • Could you explain `you may inadvertently gather false performance data` a bit further? – smttsp Sep 17 '14 at 20:39
  • I added a mention of the I/O cache, but that's not the only thing that can cause problems. Even your disk drive has its own cache. Also, different instruction paths, out-of-order execution, speculative execution and so on can be going on inside the CPU. – caskey Sep 17 '14 at 20:42
  • One thing to think about is lazy loading/instantiating shared libraries. The first run may involve hidden basic initialization costs. EDIT: Also lazy static initializations. – Galik Sep 17 '14 at 20:47
  • I think the first run works normally, as the I/O speed is 100 MB/s; the second run is 400 MB/s, which is suspiciously high. – smttsp Sep 17 '14 at 20:50

Following is the code I use for my Windows drivespeed32 benchmark (free stuff - Google for drivespeed32), followed by results for 2000 MB files on Windows 7 and cached speeds for a smaller file. The code for the Linux version is also shown.

if (useCache)
{
    // Cached read: goes through the Windows file cache, with a sequential-scan hint
    hFile = CreateFile(testFile, GENERIC_READ,
         FILE_SHARE_READ | FILE_SHARE_WRITE,
         NULL, OPEN_EXISTING,
         FILE_ATTRIBUTE_NORMAL | FILE_FLAG_SEQUENTIAL_SCAN, NULL);
}
else
{
    // Uncached read: FILE_FLAG_NO_BUFFERING bypasses the file cache
    hFile = CreateFile(testFile, GENERIC_READ,
         FILE_SHARE_READ | FILE_SHARE_WRITE,
         NULL, OPEN_EXISTING,
         FILE_ATTRIBUTE_NORMAL | FILE_FLAG_SEQUENTIAL_SCAN
                            | FILE_FLAG_NO_BUFFERING, NULL);
}
if (hFile == INVALID_HANDLE_VALUE)
{
    SetCurrentDirectory(currentDir);
    printf (" Cannot open data file for reading\n\n");
    fprintf (outfile, " Cannot open data file for reading\n\n");
    fclose(outfile);
    printf(" Press Enter\n");
    g  = getchar();
    return 0;
}
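With FILE_FLAG_NO_BUFFERING, reads must use a sector-aligned buffer and sizes that are multiples of the sector size. A minimal sketch of a conforming read loop (the 1 MB block matches the 1048576-byte blocks the benchmark uses; VirtualAlloc returns page-aligned memory, which satisfies the alignment requirement):

const DWORD blockSize = 1048576;   // 2048 x 512-byte sectors
BYTE* block = (BYTE*)VirtualAlloc(NULL, blockSize,
                                  MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
DWORD bytesRead = 0;
do
{
    if (!ReadFile(hFile, block, blockSize, &bytesRead, NULL))
        break;                       // read error
    // ... use bytesRead bytes of block ...
} while (bytesRead == blockSize);    // a short read means end of file
VirtualFree(block, 0, MEM_RELEASE);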

 The benchmark is intended for smaller files like 8, 16, or 32 MB, so with a 2000 MB file it times out after one set.

 2000 MB File         1          2          3          4          5
 Writing MB/sec      85.51      85.40      85.64      83.79      83.19
 Reading MB/sec      84.34      85.77      85.60      85.88      85.15

 Running Time Too Long At 246 Seconds - No More File Sizes
 ---------------------------------------------------------------------
 8 MB Cached File      1          2          3          4          5
 Writing MB/sec    1650.43    1432.86    1536.61    1504.16    1481.58
 Reading MB/sec    2225.53    2361.99    2271.81    2235.04    2316.13


Linux Version

if (useCache)
{
    handle = open(testFile, O_RDONLY);              // normal cached read
}
else
{
    handle = open(testFile, O_RDONLY | O_DIRECT);   // O_DIRECT bypasses the page cache
}
if (handle == -1)
{
    printf (" Cannot open data file for reading\n\n");
    fprintf (outfile, " Cannot open data file for reading\n\n");
    fclose(outfile);
    printf(" Press Enter\n");
    g  = getchar();
    return 0;
}
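O_DIRECT has the same alignment requirement on Linux: the buffer, the transfer size, and the file offset must be aligned to the logical sector size. A rough sketch of an aligned read (the 4096-byte alignment and 1 MB block are illustrative):

#include <stdlib.h>
#include <unistd.h>

char* block = NULL;
if (posix_memalign((void**)&block, 4096, 1048576) == 0)   // sector-aligned 1 MB buffer
{
    ssize_t n = read(handle, block, 1048576);   /* aligned buffer and size */
    /* ... use n bytes of block ... */
    free(block);
}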
Roy Longbottom
  • I think I'm not hitting the same cache level, because on the second run my I/O speed becomes around 400 MB/s, whereas the first one is 100 MB/s. – smttsp Sep 18 '14 at 06:04
  • The first run could leave some of the data in the file cache, but some would still be read from disk. My cached test uses only 8 MB, which stays in the cache, so the results represent memory speed. – Roy Longbottom Sep 18 '14 at 08:37
  • Could you have a look at my test results in the question? I did some tests and added the results – smttsp Sep 18 '14 at 08:38
  • I'm not sure what you are doing, but I don't think you can read the same file multiple times when changing from buffering to no buffering. When buffered, it might still be in the file cache, and an unbuffered read request might read it from there. Next, 4 MB is not large enough to avoid access overheads. At 4 MB, my benchmark indicates between 50 and 90 MB/second (44-80 ms). Also, with FILE_FLAG_NO_BUFFERING, you need to read the file in blocks that are multiples of the disk sector size. My benchmark uses 1048576 (2048 x 512) - large blocks for best performance. – Roy Longbottom Sep 18 '14 at 14:49