
I use epoll in edge-triggered mode. To avoid starvation, the code reads at most MAX_FREAD_LENGTH bytes at a time from one socket and later assembles the fragments until an EOL occurs. I noticed that epoll gets stuck when MAX_FREAD_LENGTH is small, although I think it should work for any read block size. It mostly works with 512 bytes, but it sometimes hangs (meaning no further EPOLLIN event arrives). If I increase MAX_FREAD_LENGTH it becomes more stable. How can I fix this issue?

Many thanks for considering my question!

epoll initialization:

int res;
epFd = epoll_create(EPOLL_SIZE);
event.data.fd = serverFd;
event.events = EPOLLIN|EPOLLET;
res=epoll_ctl(epFd, EPOLL_CTL_ADD, serverFd, &event);
if (res == -1){
  perror ("epoll_ctl error: ");
  return EH_ERROR;
}
events = calloc (MAX_EVENTS, sizeof event);

register net event:

while (TRUE){
  int nfds;
  do{
    nfds = epoll_wait(epFd, events, MAX_EVENTS, -1);
  } while (nfds < 0 && errno == EINTR);

  int i = 0;
  for (;i<nfds;i++){
    if ( (events[i].data.fd == serverFd) && (events[i].events & EPOLLIN)){
      if ((clientFd = accept(serverFd,(struct sockaddr *) &clientAddr, &clientLen)) < 0){
        char log[255];
        snprintf(log,sizeof log,"dispatch_net_event: Socket accept failed: %s",strerror(errno));
        logger->log(LOGG_ERROR,log);
        continue;   /* no valid client descriptor to register */
      }
      if(newclient(clientFd)!=EH_ERROR){
        /* client created */
        setnonblocking(clientFd,NONBLOCKING);
        event.data.fd = clientFd;
        event.events = EPOLLIN |EPOLLET;
        if(epoll_ctl(epFd, EPOLL_CTL_ADD, clientFd, &event)<0){
          fprintf(stderr,"Epoll insertion error (fd=%d): ",clientFd);
          return EH_ERROR;
        }
        continue;
      }
      else{
        logger->log(LOGG_ERROR,"Client creation error");
        continue;
      }
    }
    else{
     dispatch_event(events[i].data.fd,NET_EVENT);
    }
  }
}

handle a net event:

#define SMTP_MAX_LINE_LENGTH MAX_FREAD_LENGTH
ssize_t count;
char buf[SMTP_MAX_LINE_LENGTH];

memset(buf,'\0', SMTP_MAX_LINE_LENGTH);
count = read (fd, buf,MAX_FREAD_LENGTH );

if (count==-1){
  if (errno == EAGAIN)
    return KEEP_IT;
  else if (errno == EWOULDBLOCK)
    return KEEP_IT;
  else{
    char log[255];
    sprintf(log,"handle_net_event: Read error: %s",strerror(errno));
    logger->log(LOGG_ERROR,log);
  }
}
else{   /* count > 0 there are data in the buffer */
  /* assemble possible partial lines, TRUE if line is whole (end with \r\n)*/
  whole=assemble_line(count,client,&buf[0]);
}
 /* process the line */

EDIT:

I forgot to mention that the epoll loop runs in a separate thread from the other parts of the program.

Fabricator
  • I see where you return `KEEP_IT` but not where you handle that case. Are you forgetting that part? – John Zwinck Mar 22 '13 at 12:06
  • KEEP_IT is returned to the central message queue (where dispatch_event sends the network event). It means the connection is still alive and must not be closed. – Fabricator Mar 22 '13 at 12:11
  • I'm not seeing the code path for when you get some data and then need to call read again (since you're using edge-triggered mode). Do you call read more than once per epoll_wait in any case? – John Zwinck Mar 22 '13 at 12:22
  • No. The 'register net event' code sends a NET_EVENT to the central message queue; the core loop processes the queued events and calls the appropriate handler. In this case handle_net_event() reads at most MAX_FREAD_LENGTH bytes from the specified socket. So the epoll loop runs in a separate thread and informs the core program about EPOLLIN events. – Fabricator Mar 22 '13 at 12:31

1 Answer


I think you're using EPOLLET incorrectly. Try without EPOLLET (i.e. using level-triggered mode) and see if that works. If so, it means your fundamental problem is that you are not doing what edge-triggered mode demands, which is to continue to read from any ready descriptor until you get EAGAIN.
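A minimal sketch of that read-until-EAGAIN rule, under stated assumptions: `drain_fd`, `set_nonblocking`, and the `fragment_cb` callback are hypothetical names, not taken from the question's code. It still reads in small chunks (the role MAX_FREAD_LENGTH plays in the question), but it keeps calling read() until the kernel reports EAGAIN, and only then is it safe to go back to epoll_wait() in edge-triggered mode.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical callback: receives each fragment as it is read. */
typedef void (*fragment_cb)(const char *data, size_t len, void *ctx);

/* Put a descriptor into non-blocking mode; returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Read from a non-blocking fd in chunks of at most `chunk` bytes until
 * read() fails with EAGAIN/EWOULDBLOCK or the peer closes the connection.
 * Returns the total number of bytes read, or -1 on a real error.
 * *eof is set to 1 if end-of-file was seen, otherwise 0.
 */
ssize_t drain_fd(int fd, size_t chunk, fragment_cb cb, void *ctx, int *eof)
{
    char buf[4096];
    ssize_t total = 0;

    assert(chunk > 0 && chunk <= sizeof buf);
    *eof = 0;

    for (;;) {
        ssize_t n = read(fd, buf, chunk);
        if (n > 0) {
            if (cb)
                cb(buf, (size_t)n, ctx);   /* hand the fragment to the caller */
            total += n;
            continue;                      /* there may be more data: keep reading */
        }
        if (n == 0) {                      /* peer closed the connection */
            *eof = 1;
            return total;
        }
        if (errno == EINTR)
            continue;                      /* interrupted by a signal: retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;                  /* drained: safe to wait in epoll again */
        return -1;                         /* genuine read error */
    }
}
```

In the question's handler, the callback would be the place to pass each fragment to something like assemble_line(). If starvation is a concern, one option is to cap the number of iterations per call and re-queue the descriptor yourself when the cap is hit, because epoll will not report it again until new data arrives.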

John Zwinck
  • Then I think you should boil down your code to something small that we can try at home. Otherwise it's hard to help much more right now. – John Zwinck Mar 23 '13 at 02:57
  • Thanks, John! In the end you were right: edge-triggered mode was the cause of the problem. – Fabricator Apr 03 '13 at 09:00
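The hang discussed in this thread can be reproduced in isolation. The following is a small sketch, assuming Linux and using a pipe instead of a socket; `bytes_seen_with_single_reads` is a hypothetical helper written for illustration, not part of the original program. It performs exactly one bounded read per epoll_wait() wakeup, as the question's handler does, and reports how many bytes were consumed before epoll stopped reporting the descriptor.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Writes `payload` bytes into a pipe, registers the read end with
 * EPOLLIN | extra_flags, then does ONE read of at most `chunk` bytes per
 * epoll_wait() wakeup. Returns the number of bytes consumed before
 * epoll_wait() stops reporting the descriptor, or -1 on setup failure.
 */
long bytes_seen_with_single_reads(uint32_t extra_flags, size_t payload, size_t chunk)
{
    char data[512] = {0};
    char buf[512];
    struct epoll_event ev = {0};
    struct epoll_event out;
    long total = 0;
    int p[2];
    int ep;

    assert(payload <= sizeof data && chunk > 0 && chunk <= sizeof buf);

    if (pipe(p) == -1)
        return -1;
    ep = epoll_create1(0);
    if (ep == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    ev.events = EPOLLIN | extra_flags;   /* pass EPOLLET here for edge-triggered */
    ev.data.fd = p[0];
    if (epoll_ctl(ep, EPOLL_CTL_ADD, p[0], &ev) == -1 ||
        write(p[1], data, payload) != (ssize_t)payload)
        total = -1;

    /* Timeout 0: poll without blocking, so a "hang" shows up as a return of 0. */
    while (total >= 0 && epoll_wait(ep, &out, 1, 0) == 1) {
        ssize_t n = read(p[0], buf, chunk);   /* single bounded read per event */
        if (n <= 0)
            break;
        total += n;
    }

    close(ep);
    close(p[0]);
    close(p[1]);
    return total;
}
```

With a 100-byte payload and 16-byte reads, passing 0 as `extra_flags` (level-triggered) consumes all 100 bytes, because epoll keeps reporting the descriptor while data remains. Passing EPOLLET stops after the first 16 bytes: no new data arrives, so no new edge is generated and the remaining bytes sit unread. That matches the observation that a larger MAX_FREAD_LENGTH only makes the hang less likely rather than fixing it.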