0

I'm having and odd problem with a multithreaded program of wich I will report only part of the code. When I try to run it I receive a segmentation fault error. Using gdb and valingrind I was able to find out that the problem is when I try to dereference info, such as in for(i=0; i<info->subm_n; i++). The strangest thing is that if I make the cast info=(c_args*)a after the strncpy(), it goes in segmentation fault only when collector's thread exits. I'm using a 64 bit OS and I've read that this sometimes can make problems while casting to void* in pthread_create(), I don't even know if that's the case. Anyone has any idea? P.S. System calls with capital letters are just redefinition of the functions in witch I also test for fails

typedef struct collector_arguments{
  int subm_n;
  int chronon;
   planet_t *p;
}c_args;


static void* collector(void* a) {
  int fd_skt,fd_sincro,tmp,i=0;
   c_args *info;
   struct sockaddr_un sa;
   info=(c_args*) a;

  strncpy(sa.sun_path,"visual.sck" ,MAXPATH);
  sa.sun_family=AF_UNIX;

  if((fd_sincro=open("SINCRO",O_RDWR))==-1) {
      perror("collector unable to open SINCRO fifo");fflush(stdout);
      pthread_exit(&errno);
   }
  for(i=0; i<info->subm_n; i++) {
    if (read(fd_sincro,&tmp,sizeof(int))==-1){
         perror ("collector Unable to read");fflush(stdout);
         pthread_exit(&errno);
    }
    fd_skt=Socket(AF_UNIX,SOCK_STREAM,0);
    while (connect(fd_skt,(struct sockaddr*)&sa, sizeof(sa)) == -1 ) {
        if ( errno == ENOENT )  sleep(1);
        else {
            perror ("client unable to connect to socket");fflush(stdout);
            pthread_exit (&errno);
        }
    }
    Write(fd_skt,&i,sizeof(int));
    Close(fd_skt);
  }
  Close(fd_sincro);
  pthread_exit((void*) 0);
}




static pthread_mutex_t fifo_mtx = PTHREAD_MUTEX_INITIALIZER;

static void* dispatcher(void* a) {
coordinate *c;
wator_t* w;
int i,j,fifo;
pthread_t tid_collector;

c_args *info=malloc (sizeof(c_args));
w=(wator_t*) a; 
c=(coordinate*) malloc(sizeof(coordinate));
c->numr=2;
c->numc=2;
while ( ((w->plan->nrow / c->numr) * (w->plan->ncol / c->numc))>NWORK_DEF && (w->plan->nrow > 2*c->numr) && (w->plan->ncol > 2*c->numc) ){
    if ( (w->plan->nrow / c->numr) >= (w->plan->ncol / c->numc) )       
        c->numr=c->numr*2;
    else 
        c->numc=c->numc*2;
    }


if ((w->plan->nrow % c->numr)==0) i=(w->plan->nrow / c->numr);
else i=(w->plan->nrow / c->numr)+1;
if ((w->plan->ncol % c->numc)==0) j=(w->plan->ncol / c->numc);
else j=(w->plan->ncol / c->numc)+1;
info->subm_n=i*j;
info->chronon=0;
info->p=w->plan;
while(1){
    reset_updated(w);
    (info->chronon)++;

    Pt_create( &tid_collector, NULL,&collector,(void*) info);

    for(i=0; i< w->plan->nrow; i+=c->numr)
        for(j=0; j< w->plan->ncol; j+=c->numc){     
            if((fifo=open("FIFO",O_WRONLY))==-1){
                perror("dispatcher unable to open FIFO");fflush(stdout);
                pthread_exit(&errno);
                }
            c->rowi=i;
            c->coli=j;
            Write(fifo, c, sizeof(*c));
            Close(fifo);
            }
    i=( (i/c->numr) * (j/c->numc) );
    Pt_join( tid_collector,NULL);
    }
return NULL;
}
Mario The Spoon
  • 4,799
  • 1
  • 24
  • 36
IXion
  • 11
  • 2
  • One thing that I notice at first is that you are using pthread_t tid_collector; and then passing it to every pthread_create. Each time you do this, you destroy that thread identifier and can no longer pthread_join that particular thread, only the latest created thread. You should instead make an array of pthread_t and increment the index passed to pthread_create on each iteration. – Matthew Darnell Jun 30 '15 at 15:23
  • I made an error copying the code, after the join the while ends, in this way only one collector is created and joined for every times the while block is executed, it is not intended to create multiple collectors at the same time. – IXion Jun 30 '15 at 15:28
  • For the code as shown the number of opening/closing braces does not match. Please fix this, as well this "strange" and at least inconsistent indention. – alk Jun 30 '15 at 16:37
  • It seems that some operation corrupts stack. Try to watch, e.g., for the `info` variable during execution under gdb. – Tsyvarev Jun 30 '15 at 16:59
  • @user2676680, he joins the created thread at the end of each iteration of the loop, so he does not need to remember more than one thread identifier at a time. – John Bollinger Jun 30 '15 at 17:34
  • It is suspicious to call `pthread_exit(&errno)` because (1) `errno` may be a macro, and (2) it is thread-local. This probably does not explain your segfault, however. – John Bollinger Jun 30 '15 at 17:37
  • @ John Bollinger, you're right. I didn't review it closely enough. – Matthew Darnell Jun 30 '15 at 17:39
  • Since you dynamically allocate memory for the collector args only once and then re-use that allocated space for multiple threads, and the segfault is tied to access to that structure, I am inclined to guess that something is freeing it prematurely. I don't see anything in the code you presented that would do so, but you have not provided a complete, reproducible example. – John Bollinger Jun 30 '15 at 17:43
  • is w->plan acutally allocated? since it is cast from a void* it is hard to tell... – Mario The Spoon Jun 30 '15 at 17:43
  • Why *do* you dynamically allocate the collector args, anyway? It looks like you could just allocate the structure (itself, not a pointer to it) on the stack as a local variable of function `dispatcher()`. That would help with the memory leak you have if in fact you never free the args structure (and it is not evident in your code that you ever do free it, although I have speculated that in fact you free it prematurely). – John Bollinger Jun 30 '15 at 17:47
  • Allocating c_args on the stack seemed to patch the problem, even if honestly I still can't understand why it doesn't worked the way I did it :/ Now I'm having a broken pipe problem when the collector tries to write on the socket, but il try to fix that on my own ^.^ thanks! – IXion Jul 01 '15 at 22:05
  • You need to reduce this to an MVCE for yourself more than us, but still an MVCE. – kfsone Jul 04 '15 at 15:20

3 Answers3

2

strncpy(sa.sun_path,"visual.sck" ,MAXPATH);

What is MAXPATH?

Don't forget that strncpy() will zero fill up to MAXPATH chars.

On linux sun_path is defined as 108 characters long so if MAXPATH is greater than that (or whatever value is used on your system) then you are in the realms of undefined behaviour which - with this type of error - normally means memory corruption leading eventually to a seg fault:

#define UNIX_PATH_MAX   108

struct sockaddr_un {
    __kernel_sa_family_t sun_family; /* AF_UNIX */
    char sun_path[UNIX_PATH_MAX];   /* pathname */
};
Dipstick
  • 9,854
  • 2
  • 30
  • 30
  • The question is valid, but none of this is a valid *answer*. Also, `strncpy()` does not zero fill anything. – John Bollinger Jun 30 '15 at 17:42
  • 'If the length of src is less than n, strncpy() writes additional null bytes to dest to ensure that a total of n bytes are written.' – Dipstick Jun 30 '15 at 17:43
  • there is a `#define MAXPATH 200` that I've not shown, but I don't get how that could be the problem – IXion Jun 30 '15 at 17:45
  • My bad, `strncpy()` does pad. But this is still not an answer, though it could be a lead on one. – John Bollinger Jun 30 '15 at 17:50
  • 1
    So, an actual answer would involve adding something along the lines of, "You are overwriting the bounds of your `sockaddr_un` structure, and thereby producing undefined behavior. This very well may explain your segfault completely, but it almost certainly explains why behavior differs depending on the point at which you execute the `strncpy()`." – John Bollinger Jun 30 '15 at 17:59
  • Seems like this explained the strange behavior of the `strncpy()` but not fixed the others problems, anyway thank you =D – IXion Jul 01 '15 at 21:54
-1

Not the answer,but what is this code for:

for(i=0; i<info->subm_n; i++) {
    if (read(fd_sincro,&tmp,sizeof(int))==-1){
         perror ("collector Unable to read");fflush(stdout);
         pthread_exit(&errno);
    }

why don'Tyou fseek and then read just one byte? Or read at least 4k (or even better blocksize bytes and then acess accordignly...

Mario The Spoon
  • 4,799
  • 1
  • 24
  • 36
-1

If you think its a memory leak, corrupt stack issue etc. Why not try a memory debugger such as valgrind ( see http://http://valgrind.org/ ).

It will atleast tell you if you are suffering from memory issues, writting beyond the end of a array etc.

Dave
  • 577
  • 6
  • 25
  • If you think this is an answer, why not try reading the help center's section on writing answers (see http://stackoverflow.com/help/how-to-answer ). It will tell you why you are being downvoted. – John Bollinger Jun 30 '15 at 18:03
  • Also, the OP already mentions that he used Valgrind. That's how he localized the problem. – John Bollinger Jun 30 '15 at 18:04
  • D'oh, totaly missed the orginal comment, that what you get for been dyslexia! – Dave Jun 30 '15 at 18:11