I'm running a program on a Linux development board. It works fine when the CPU load is low, but when the load peaks, it takes much longer.

Here is the setup: two programs run on the board. The consumer application has multiple threads, each of which calls func1 to request some info from the producer process. The producer is a daemon process that feeds the info back to the consumer process.

The sample code looks like this:

Consumer:

static int send_to_service(msg_t *cmd)
{
    int cnt = send(fd, cmd, sizeof(msg_t), MSG_WAITALL);
    if (cnt != sizeof(msg_t)) {
        log(L_ERROR, "send failed");
        return -1;
    }
    return 0;
}


static int func1(int aaa)
{
    msg_t msg = {...};
    msg.a = ...;
    msg.b = ...;    
    gettimeofday(time1, NULL);
    send_to_service(&msg);

    ...
}

Producer:

while(1) {

    gettimeofday(time2, NULL);
    int ret = select(maxfd+1, &fd_list, NULL, NULL, NULL);  
    gettimeofday(time3, NULL);
    for (int fd = 0; fd <= maxfd; fd++) {
        if (!FD_ISSET(fd, &fd_list))
            continue;
        ...
    }
}
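
One thing worth checking in a loop like the producer's: select() overwrites the fd_set it is given, so the set has to be rebuilt or copied before every call. Below is a minimal sketch of that pattern, assuming a master set that is kept separately. wait_readable is a hypothetical helper name, not part of the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: block until at least one descriptor in *master is
 * readable. select() modifies the set passed to it, so the master set is
 * copied into *ready first and stays untouched for the next iteration.
 * Returns whatever select() returns (number of ready fds, or -1 on error). */
int wait_readable(const fd_set *master, int maxfd, fd_set *ready)
{
    assert(master != NULL && ready != NULL);
    *ready = *master;                                  /* fresh copy each call */
    return select(maxfd + 1, ready, NULL, NULL, NULL); /* no timeout: block */
}
```

The caller would then test each descriptor with FD_ISSET(fd, &ready), as in the loop above.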

The time difference between time1 and time3 can exceed 30 ms when the CPU load is high. It does not happen all the time, only once in a while.

Earlier I tried a single-process design in which the consumer calls the driver and gets the info directly. That worked well. Now I have to add another process that needs the same info, so I have to use a daemon process to feed both processes. The performance is not as good as with the single-process method, even when there is only one consumer.

The system runs Linux version 4.14.74. I'm not sure about the socket type and network; both consumer processes run on the same system and wait for image info. I just used the system-provided send, recv and select.

Johnzy
  • Why exactly are you surprised that processes work slower when the system is under heavy load? – Ctx Dec 13 '19 at 10:02
  • You already have [the same question](https://stackoverflow.com/questions/59319968/socket-communication-cost-longer-than-expected) posted. Please note that you can edit a question so there is no need to post a new one just to make changes. – kaylum Dec 13 '19 at 10:20
  • @kaylum I must have clicked post by mistake at a certain point, I deleted the other one – Johnzy Dec 13 '19 at 10:56
  • Not related to your problem, but see [Why is there no flag like MSG_WAITALL for send?](https://stackoverflow.com/q/44240934/10622916) – Bodo Dec 13 '19 at 10:56
  • @Ctx I would like to see if there is a way to optimize it so that the latency can be improved – Johnzy Dec 13 '19 at 10:57
  • @Johnzy If this is your question, then [edit] your question and add this. Also add details about the socket type, network connection, OS, ... – Bodo Dec 13 '19 at 11:00
  • I tried the single-process way earlier, where the consumer calls the driver and gets the info itself. That worked well. Now I have to introduce another process that needs the same info, so I have to use a daemon process to feed both processes. The performance is not as good as with the single-process method, even when there is only one consumer. – Johnzy Dec 13 '19 at 11:02
  • @Johnzy You can play around with the scheduler (see `nice(2)`, `sched_setscheduler(2)` and some more commands) to redistribute the priorities amongst the processes. `SCHED_FIFO` with a low priority value for the producer might be able to decrease the latency a bit. However, when the process sleeps in the `select()` syscall, there is no way to guarantee a maximum latency. – Ctx Dec 13 '19 at 11:19
  • @Johnzy Another method could be to set the CPU affinity of the processes in such a way that one CPU core is dedicated to the latency-critical process and the other processes run on other cores (see `sched_setaffinity(2)`) – Ctx Dec 13 '19 at 11:20
  • @Bodo, wait... did I use `send` with "MSG_WAITALL" incorrectly here? If so, why does it compile just fine? – Johnzy Dec 13 '19 at 12:38
  • @Ctx Is there a way to adjust the sleep time in select? If not, any other function acts like select that can? – Johnzy Dec 13 '19 at 13:02
  • It compiles just fine because the `flags` argument is just an integer, and so is `MSG_WAITALL`, and the C compiler doesn't have clue one about the semantics of `send()` other than its calling sequence. It may also execute correctly, if inapplicable flags are just ignored. That doesn't make it correct. The point, as per the question in the link, is that `send()` doesn't have a MSG_WAITALL flag because that's what it already does, as defined by POSIX. – user207421 Dec 13 '19 at 13:17
  • Related, `send` returns a `ssize_t`, not an `int`. See the [`send(2)` man page](https://linux.die.net/man/2/send). The header file for `ssize_t` is `<sys/types.h>`. The same applies to most networking functions, like `recv`. – jww Dec 13 '19 at 13:20
  • @Johnzy Not directly. When a sleeping process reaches a wakeup condition, it becomes eligible for scheduling. When the kernel actually runs the process can only be controlled indirectly, for example with the scheduling parameters I mentioned above. – Ctx Dec 13 '19 at 17:00
  • @user207421 Thanks for the explanation, it is clear now. :) – Johnzy Dec 14 '19 at 03:22
  • @Ctx If I can not control the sleep time of select, how about I make select awake all the time, and add a sleep function with a certain amount of time after it. Does it sound like a solution? – Johnzy Dec 14 '19 at 07:39
  • @Johnzy You should [edit] your question and write all clarification, background information, additional questions... there. What type of socket do you use? You seem to have a misunderstanding about `select`. It does not sleep unconditionally. If any file descriptor is ready for a `read` operation, it will return immediately, otherwise it will wait for this condition. Apart from this the execution of the processes depends on the scheduling settings and the system load. – Bodo Dec 16 '19 at 10:29
  • Hey guys, I took the advice from @Ctx and another co-worker: I changed the thread's scheduling priority with `sched_setscheduler(2)` and changed my socket type to PF_UNIX. With that, the problem is pretty much solved. I had been using the wrong socket type, "AF_INET", earlier. Thanks for all the help :) – Johnzy Dec 16 '19 at 12:43
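
The two changes reported in the last comment can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code that was actually used. The helper names (use_fifo_scheduling, send_all, make_local_pair) are hypothetical, the lowest SCHED_FIFO priority is picked only as an example value, and socketpair() stands in for the bind/connect setup on a `struct sockaddr_un` path that two unrelated processes would need. send_all also stores the result of `send` in a `ssize_t`, as noted in the comments, and loops defensively in case of a short or interrupted send.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: ask for SCHED_FIFO at the lowest real-time priority
 * for the calling thread (pid 0). Returns 0 on success, -1 with errno set;
 * without CAP_SYS_NICE this typically fails with EPERM. */
int use_fifo_scheduling(void)
{
    struct sched_param param;
    memset(&param, 0, sizeof param);
    param.sched_priority = sched_get_priority_min(SCHED_FIFO);
    return sched_setscheduler(0, SCHED_FIFO, &param);
}

/* Hypothetical helper: send exactly len bytes, retrying after short sends
 * and EINTR. Returns 0 on success, -1 on error (errno is set by send). */
int send_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    assert(buf != NULL || len == 0);
    while (len > 0) {
        /* MSG_NOSIGNAL: report a closed peer as EPIPE instead of SIGPIPE. */
        ssize_t n = send(fd, p, len, MSG_NOSIGNAL);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: a connected pair of local (AF_UNIX) stream sockets.
 * Returns 0 on success, -1 on error. */
int make_local_pair(int sv[2])
{
    return socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
}
```

A caller would invoke use_fifo_scheduling() once at startup in the latency-critical thread and treat a failure as non-fatal, then use send_all() wherever a whole msg_t has to go out. Both descriptors should be released with close() when no longer needed.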
