
I have created a shared library in which I declare an initialized 4 KB static array (1024 ints). Program1 reads the entire array several times and measures the time of each full pass. Every pass takes almost the same time (nearly 17000 ticks).

Now, if I access the same static array from a second program, Program2, the access time there is roughly half of what Program1 measured. After Program2 has accessed the array, if I access it again from Program1 as before, the access time is also roughly half of the original, that is, about the same as in Program2.

Can anyone explain why this is happening?

In case 1, Program1 accesses the array multiple times before Program2 touches it. So *why is the access time not lower on the 2nd access*?

In case 2, why is the access time lower in Program2? And after Program2 has accessed the array, why does the access time in Program1 become lower as well?

Here is my shared library:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stdint.h>

static int DATA[1024]={1,2,3,4,.....1024};

inline void foo(void)
{
    int j;        
    int k=0;
    for(j=0;j<1024;j++)
    {
       k=DATA[j];         
    }
    k+=0;
}
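
As written, `foo()` never uses `k`, so a compiler with optimization enabled is free to drop the loop altogether, and the timing would then measure nothing. A minimal sketch of a variant that forces every read, assuming the array holds 1..1024 as the elided initializer suggests; `init_data` and `foo_checked` are hypothetical names, not part of the original library:

```c
#include <stddef.h>

#define DATA_LEN 1024

/* Stand-in for the library's initialized array. It is filled at run time
   here because the original initializer list is elided. */
static int DATA[DATA_LEN];

/* Hypothetical helper: store 1..DATA_LEN, matching the assumed contents. */
void init_data(void)
{
    for (size_t j = 0; j < DATA_LEN; j++)
        DATA[j] = (int)j + 1;
}

/* Hypothetical variant of foo(): reads every element through a
   volatile-qualified pointer so the loads cannot be optimized away,
   and returns the sum so the caller can verify the pass happened. */
long foo_checked(void)
{
    const volatile int *p = DATA;
    long sum = 0;

    for (size_t j = 0; j < DATA_LEN; j++)
        sum += p[j];
    return sum;
}
```

Returning the sum gives the caller something to check (the values 1..1024 add up to 524800), and the `volatile`-qualified pointer keeps each load in the generated code.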

Program1 :

    foo(); // warm-up call, not timed, so the first-touch page fault is not measured

    start=rdtsc();
    foo();
    end=rdtsc();
    temp=end-start;
    total_time=temp;

    printf("Inside Program1, Before IPC, Time taken 1 = %llu\n",total_time); 

    start=rdtsc();
    foo();
    end=rdtsc();
    temp=end-start;
    total_time=temp;

    printf("Inside Program1, Before IPC, Time taken 2 = %llu\n",total_time); 



 // USE IPC here , to receive signal

    start=rdtsc();
    foo();
    end=rdtsc();
    temp=end-start;
    total_time=temp;

    printf("Inside Program1, after IPC, Time taken 1 = %llu\n",total_time); 

    // USE IPC here , to send signal
    // USE IPC here , to receive signal

    start=rdtsc();
    foo();
    end=rdtsc();
    temp=end-start;
    total_time=temp;

    printf("Inside Program1, after IPC, Time taken 2 = %llu\n",total_time); 

Program2 :

    foo(); // warm-up call, not timed, so the first-touch page fault is not measured


    // USE IPC here , to send signal 

    start=rdtsc();
    foo();
    end=rdtsc();
    temp=end-start;
    total_time=temp;

    printf("Inside Program2, after IPC, Time taken 1 = %llu\n",total_time);


    start=rdtsc();
    foo();
    end=rdtsc();
    temp=end-start;
    total_time=temp;

    printf("Inside Program2, after IPC, Time taken 2 = %llu\n",total_time);


    // USE IPC here , to receive signal


    start=rdtsc();
    foo();
    end=rdtsc();
    temp=end-start;
    total_time=temp;

   printf("Inside Program2, after IPC, Time taken 3 = %llu\n",total_time);
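
Both programs call `rdtsc()`, whose definition is not shown. A sketch of one possible pair of helpers, assuming x86-64 with GCC or Clang; `tsc_start` and `tsc_stop` are hypothetical names, and the `clock_gettime` fallback for other CPUs is an assumption added only so the block compiles elsewhere (it returns nanoseconds, not ticks):

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime in the fallback */
#endif
#include <stdint.h>

#if defined(__x86_64__)
#include <x86intrin.h>   /* __rdtsc, __rdtscp, _mm_lfence */

/* Hypothetical helper: read the time-stamp counter at the start of a
   measurement. RDTSC is not a serializing instruction, so the fences
   keep earlier and later instructions from drifting across the read. */
uint64_t tsc_start(void)
{
    _mm_lfence();
    uint64_t t = __rdtsc();
    _mm_lfence();
    return t;
}

/* Hypothetical helper: read the counter at the end of a measurement.
   RDTSCP waits for earlier instructions to execute before reading. */
uint64_t tsc_stop(void)
{
    unsigned int aux;                 /* receives IA32_TSC_AUX, unused */
    uint64_t t = __rdtscp(&aux);
    _mm_lfence();                     /* keep later code after the read */
    return t;
}
#else
#include <time.h>

/* Fallback (assumption): monotonic clock in nanoseconds. */
static uint64_t monotonic_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
}

uint64_t tsc_start(void) { return monotonic_ns(); }
uint64_t tsc_stop(void)  { return monotonic_ns(); }
#endif
```

With helpers like these, each measurement would read `start = tsc_start(); foo(); end = tsc_stop();`, which keeps the loop's loads from being reordered around the counter reads.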

Output Program1

Inside Program1, Before IPC, Time taken 1 =17200
Inside Program1, Before IPC, Time taken 2 =17200 // Why not lower than previous 
**Inside Program1, after IPC, Time taken 1 = 8504** // why lower ? 
**Inside Program1, after IPC, Time taken 2 = 7489** // why lower ? 

Output Program2

Inside Program2, after IPC, Time taken 1 = 7500
Inside Program2, after IPC, Time taken 2 = 7600
**Inside Program2, after IPC, Time taken 3 = 8500** // Here access time increased as compared to previous 
bholanath
  • Is this simply effects of caching (at the chip level)? – Travis Griggs Feb 10 '14 at 19:36
  • Yeah, that's what it sounds like to me too. http://en.wikipedia.org/wiki/Locality_of_reference – Robert Harvey Feb 10 '14 at 19:52
  • @RobertHarvey Thanks for answering. If this is locality of reference, why does it not apply to the 2nd access in Program1 (after Program1's 1st access, but before Program2's 1st access)? Locality of reference should also apply to repeated accesses within Program1. To my surprise, if I execute ONLY Program1 5 times, I get almost the same 17000 ticks each time. Once Program2 accesses the array, both Program1 and Program2 show the new, lower access time (about 7500-8000). How would locality of reference explain this? – bholanath Feb 10 '14 at 20:49
  • @TravisGriggs I also suspect a cache effect, but with ONLY Program1 running it does not show up (locality of reference), whereas after Program2 accesses the array the access time drops drastically. I don't know WHY this does not happen while I access the same data structure multiple times inside Program1. – bholanath Feb 10 '14 at 20:53
  • @RobertHarvey, see my other post, where I get the same kind of result using an infinite loop vs. sleep, without IPC. I do not get a lower access time if I use sleep(100), whereas with while(1) I do. http://stackoverflow.com/questions/21676633/difference-between-use-of-while-and-sleep-to-put-program-into-sleep-mode – bholanath Feb 10 '14 at 20:56
  • @bholanath: Well, the other answerer there has a point: you can't really predict cache locality without having intimate knowledge of what's going on under the hood. – Robert Harvey Feb 10 '14 at 20:57
  • @RobertHarvey I have already disabled the other cores, hyper-threading, and ASLR. – bholanath Feb 10 '14 at 21:22
  • Perhaps you should disable the processor cache as well. :) – Robert Harvey Feb 10 '14 at 21:23
  • You do all realize that each process has its own data, *including initialized and uninitialized data in shared libraries*? The code that does the access may very well be the same (except the `inline` makes me suspicious about what the original code is). If the data were `const` (and in a read-only section), then all processes would access the same copy. But there should be no cache effects even then, as only read access is possible. I suspect compiler options or other compile-time effects (address randomization?) are the cause here. – Nominal Animal Feb 10 '14 at 22:23
  • Is the `inline` `foo` code in a .h file or in a .c file? Why isn't `foo` completely optimized out? Nothing is declared `volatile`, and there are no stores to anything visible outside the function. What optimization level are you using? Have you disassembled the binary? – pat Feb 17 '14 at 17:27
  • @pat The foo code is in foo.c, and I disabled all optimization. I didn't understand "Have you disassembled the binary?" What does it mean? – bholanath Feb 18 '14 at 06:53
  • Normally, an `inline` function would be placed in a .h file. I'm surprised the function even appears in the object file (it doesn't when I compile an `inline` function). The disassembly will show how the compiler actually implemented your function. You can either disassemble the image, or compile with the `-S` option to generate assembly language. – pat Feb 18 '14 at 08:47
  • Another simple reason for this would be that the on-demand CPU governor kicks in after a while (or even the "Turbo Boost" feature on Intel). – TheCodeArtist Apr 03 '14 at 16:33
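
To check whether a warm-up effect such as frequency scaling is involved, it may help to take many samples instead of one or two and look at how they are distributed. A rough sketch, assuming POSIX `clock_gettime` in place of `rdtsc()` for portability; `touch_array`, `now_ns`, and `median_time_ns` are hypothetical names, and the zero-filled array is only a stand-in for the library's data:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime */
#endif
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

static int DATA[1024];   /* stand-in for the shared library's array */

/* Workload: read every element through a volatile-qualified pointer. */
void touch_array(void)
{
    const volatile int *p = DATA;
    for (size_t j = 0; j < 1024; j++)
        (void)p[j];
}

/* Monotonic time in nanoseconds (swapped in for rdtsc()). */
uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
}

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Call fn `warmup` times untimed, then time it `runs` times.
   On return, out[0..runs-1] holds the samples sorted ascending,
   and the function returns the median sample. */
uint64_t median_time_ns(void (*fn)(void), int warmup, int runs, uint64_t *out)
{
    for (int i = 0; i < warmup; i++)
        fn();
    for (int i = 0; i < runs; i++) {
        uint64_t t0 = now_ns();
        fn();
        out[i] = now_ns() - t0;
    }
    qsort(out, (size_t)runs, sizeof out[0], cmp_u64);
    return out[runs / 2];
}
```

A call such as `median_time_ns(touch_array, 1000, 101, samples)` can be repeated with different `warmup` values. If the median keeps falling as `warmup` grows, that would point toward a warm-up effect in the process itself rather than anything Program2 does.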
