
I am working on a testing tool for nvme-cli (written in C, runs on Linux).

I want to repeat an NVMe command 'r' times using 't' threads.

The code below repeats a command across several threads, but the parallel execution time is much higher than the serial execution time.

From what I have observed, the cause is the ioctl() system call made from err = nvme_identify(fd, 0, 1, data); that is, nvme_identify() in turn calls ioctl().

So, is ioctl() blocking for NVMe?

Also, is there any way to reduce the execution time with threading?

int repeat_cmd(int fd, void *data, int nsid, int cmd, int rc, int flags,
               struct repeatfields *rf, int threadcount)
{
    pthread_t tid[threadcount];
    struct my_struct1 my_struct[threadcount];
    int j;

    switch (cmd) {
    case 1:
        /* Launch all worker threads first ... */
        for (j = 0; j < threadcount; j++) {
            my_struct[j].fd = fd;
            my_struct[j].data = data;    /* every thread shares this one buffer */
            my_struct[j].flags = flags;
            my_struct[j].rf = *rf;
            my_struct[j].rcount = rc / threadcount;    /* integer division */
            /* pass this thread's own slot: index j, not i */
            pthread_create(&tid[j], NULL, ThreadFun_id_ctrl, (void *)&my_struct[j]);
        }
        /* ... then wait for all of them to finish. */
        for (j = 0; j < threadcount; j++)
            pthread_join(tid[j], NULL);
        break;
    }
    return 0;
}
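
A side note on rcount = rc/threadcount in the loop above: the division is an integer division, so when rc is not a multiple of threadcount the threads together issue fewer than rc commands, which skews any comparison with a serial run of rc commands. A minimal sketch of one way to hand out the remainder follows; share_of is a hypothetical helper name, not part of nvme-cli.

```c
#include <assert.h>

/* Hypothetical helper (not from nvme-cli): number of iterations that thread
 * `idx` should run so the per-thread counts add up to exactly `total`. */
static int share_of(int total, int nthreads, int idx)
{
    assert(nthreads > 0 && idx >= 0 && idx < nthreads);

    int base  = total / nthreads;   /* what every thread gets */
    int extra = total % nthreads;   /* leftover iterations */

    /* The first `extra` threads each take one leftover iteration. */
    return base + (idx < extra ? 1 : 0);
}
```

In the loop this would read my_struct[j].rcount = share_of(rc, threadcount, j); for rc = 10 and threadcount = 3 the threads would run 4, 3 and 3 iterations.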

The thread function is as follows :

void *ThreadFun_id_ctrl(void *val)
{
    int err, j;
    struct my_struct1 *my_struct = (struct my_struct1 *)val;
    int fd = my_struct->fd;
    void *data = my_struct->data;
    struct repeatfields rf = my_struct->rf;
    int flags = my_struct->flags;
    int rcount = my_struct->rcount;

    printf("Printing count = %d\n", rcount);

    for (j = 0; j < rcount; j++) {
        /* Issue Identify Controller; nvme_identify() in turn calls ioctl(). */
        err = nvme_identify(fd, 0, 1, data);
        if (!err) {
            if (rf.fmt == BINARY)
                d_raw((unsigned char *)&rf.ctrl, sizeof(rf.ctrl));
            else if (rf.fmt == JSON)
                json_nvme_id_ctrl(data, flags, 0);
            else {
                printf("NVME Identify Controller:\n");
                __show_nvme_id_ctrl(data, flags, 0);
            }
        } else if (err > 0) {
            fprintf(stderr, "NVMe Status:%s(%x)\n",
                    nvme_status_to_string(err), err);
        } else {
            perror("identify controller");
        }
        /* syscall() returns long, so print it with %ld */
        printf("Printing from Thread id = %ld\n", (long)syscall(SYS_gettid));
    }
    return NULL;
}
Arjun G S
  • You don't want that `pthread_join` in the loop. That waits for each thread to end before starting the next. You're just paying the thread-creation overhead for nothing. Do the joins in a second `for` loop after launching the worker threads. – lockcmpxchg8b Dec 05 '17 at 04:54
  • Okay, thank you, but the execution time is still greater than with serial execution. – Arjun G S Dec 05 '17 at 06:25
  • @lockcmpxchg8b Your suggestion has reduced the extra delays, thank you. But my issue (parallel execution time is high compared to serial execution) is not yet solved. – Arjun G S Dec 05 '17 at 06:29
  • When you say 'serial', you mean if you take this exact code, but replace the `pthread_create` call with a call to `ThreadFun_id_ctrl` directly (and comment out the `pthread_join` calls), that takes longer than this current threaded version? (or is the serial version significantly different? I'm trying to make sure that the serial and threaded version make the same number of calls to nvme_***) – lockcmpxchg8b Dec 05 '17 at 06:46
  • @lockcmpxchg8b In the serial version, I don't need `my_struct` and its assignment statements, and instead of `ThreadFun_id_ctrl()` we have only the contents of the for loop in `ThreadFun_id_ctrl()`, i.e. from `err = nvme_identify(fd, 0, 1, data);` – Arjun G S Dec 05 '17 at 11:22
  • @ArjunGS You have to look at it from our point of view. We don't know what your serial code looks like. The reasons for your two versions having different delays could be something else entirely than using threads vs not using threads. So if you need people to figure out why one variant of your code behaves differently than another variant, you need to post the code for both. – nos Dec 21 '17 at 08:13
  • @nos I have ended up with a different problem altogether while looking for a solution to this one. If possible, please look at the question below. https://stackoverflow.com/questions/47918060/is-there-any-way-for-ioctl-in-linux-to-specify-submission-queue-id-for-a-nvme – Arjun G S Dec 21 '17 at 09:22
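
The comments above ask for a serial and a threaded run that make the same number of calls. A minimal sketch of such a like-for-like timing harness follows. Everything in it is an assumption for illustration: do_one_command is a hypothetical stand-in that merely sleeps for 1 ms in place of nvme_identify() and its output, and worker, time_serial and time_threaded are made-up names. Compile with -pthread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-in for one nvme_identify() call plus its printing:
 * it only sleeps for 1 ms. Swap in the real command to measure it. */
static void do_one_command(void)
{
    struct timespec ts = { 0, 1000000 };
    nanosleep(&ts, NULL);
}

struct worker_arg { int rcount; };

/* Same loop body for both runs, so the two timings are comparable. */
static void *worker(void *val)
{
    struct worker_arg *arg = val;
    for (int j = 0; j < arg->rcount; j++)
        do_one_command();
    return NULL;
}

/* Wall-clock time in seconds from a monotonic clock. */
static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Serial baseline: call the worker directly, with no threads involved. */
static double time_serial(int rc)
{
    struct worker_arg arg = { rc };
    double t0 = now_sec();
    worker(&arg);
    return now_sec() - t0;
}

/* Threaded run: create every thread first, then join them all. */
static double time_threaded(int rc, int threadcount)
{
    assert(threadcount > 0);
    pthread_t tid[threadcount];
    struct worker_arg args[threadcount];
    int started = 0;

    double t0 = now_sec();
    for (int j = 0; j < threadcount; j++) {
        /* Spread rc over the threads, including the remainder. */
        args[j].rcount = rc / threadcount + (j < rc % threadcount ? 1 : 0);
        if (pthread_create(&tid[j], NULL, worker, &args[j]) != 0)
            break;              /* stop launching; join what was started */
        started++;
    }
    for (int j = 0; j < started; j++)
        pthread_join(tid[j], NULL);
    return now_sec() - t0;
}
```

Calling time_serial(rc) and time_threaded(rc, t) with the same rc and printing both values gives the comparison. A sleeping stand-in parallelizes almost perfectly, so it only validates the harness; whether the real command scales depends on the driver and controller, which this sketch does not model. The real loop also prints through stdio, whose streams are locked per call under POSIX, so output-heavy iterations may serialize on stdout regardless of the ioctl().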

0 Answers