I have a testing utility that uses Linux aio_write and aio_read. The utility wraps my static library and tests it. The library is a multi-threaded black box.

Until now it worked fine. But we recently made a big change to the black box, and the testing utility now fails as soon as it submits the first I/O. That I/O returns errno 22 (EINVAL).

According to the aio_write man page, this error is returned when one or more of the following fields is invalid: aio_offset, aio_reqprio, aio_nbytes. I ran the utility under gdb and checked their values, along with all the other fields of the struct aiocb * input parameter. My conclusion is that the input parameters are all valid.
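
For reference, here is a minimal sketch of submitting a single write and telling a submission failure (aio_write itself returns -1) apart from a completion failure (reported later by aio_error). The helper name `write_async` is hypothetical and is not part of the testing utility; the sketch assumes a file descriptor that is open for writing.

```c
#include <aio.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: submit one asynchronous write and wait for it.
   Returns 0 on success, otherwise an errno-style error code. */
int write_async(int fd, const void *buf, size_t len, off_t offset)
{
    struct aiocb cb;
    const struct aiocb *list[1] = { &cb };
    int err;

    memset(&cb, 0, sizeof cb);           /* unused fields must be zeroed */
    cb.aio_fildes = fd;
    cb.aio_buf = (void *)buf;            /* aio_buf is not const-qualified */
    cb.aio_nbytes = len;
    cb.aio_offset = offset;
    cb.aio_reqprio = 0;
    cb.aio_sigevent.sigev_notify = SIGEV_NONE;

    /* Submission failure: the request was never queued. */
    if (aio_write(&cb) == -1)
        return errno;

    /* Wait until the request is no longer in progress. */
    while (aio_error(&cb) == EINPROGRESS)
        aio_suspend(list, 1, NULL);

    /* Completion failure: the request was queued but the write failed. */
    err = aio_error(&cb);
    if (err != 0)
        return err;

    /* Treat a short write as an I/O error in this sketch. */
    return aio_return(&cb) == (ssize_t)len ? 0 : EIO;
}
```

In the situation described here, the failure is of the first kind: aio_write returns -1 immediately and errno is EINVAL.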

I suspect something has changed in the way threads work inside the black box, and that this is what causes the issue (I cannot find any other explanation).

What I am really trying to understand is: which scenarios cause aio_write() to return an EINVAL error code?

Just to clarify: when I replace the black box with an older version and use the same testing utility, it works fine. (I also checked the input arguments there and saw that they match the input arguments of the failing version.)

yanger
  • Try to run your program under [`strace`](http://linux.die.net/man/1/strace). It traces all system calls, so it will help you understand where things have gone wrong. – myaut Apr 27 '15 at 08:18
  • Please post your solution as an answer then, and accept it – myaut Apr 27 '15 at 12:28
  • We were able to isolate the problem and fix it, BUT we do not understand the reason. The issue was the addition of a static thread-local array. Our black box uses this technique, defining thread-local variables in several places. I fixed it by reducing the size of an existing array and giving that space to the new variable. For example, the old code `static __thread new old_[1024];` had to become `static __thread new old_[512];` (if `old_` is an array of size 1024 I get the error) together with `static __thread new new_[512];`. We cannot explain this behavior. Can someone clarify this? – yanger Apr 27 '15 at 12:33
  • My partner found an explanation: https://sourceware.org/bugzilla/show_bug.cgi?id=11787 It seems that aio_write creates a thread with a small stack size, and the TLS size is subtracted from that stack. In our case the TLS size was large, so the thread creation inside aio_write failed. In that case EINVAL is returned. – yanger Apr 27 '15 at 12:59
  • You have questions, you have answers, you even have your conclusions, you don't need us!!! **You have not posted even a single line of code.** How do you think we are going to help you? – Luis Colorado Apr 28 '15 at 04:58
  • I had a question. Later my team and I found a solution, so I posted it here as a contribution to the community, in case someone else uses large TLS with aio_write and wonders why EINVAL is returned as the error code. I do not have relevant code to show, nor do I think any is needed in this case. – yanger Apr 28 '15 at 06:22

1 Answer

You can take a look at the AIO implementation in the Linux source code, in the file linux-kernel-source/fs/aio.c

Sadly, there are a lot of places where -EINVAL is returned. As @myaut mentioned in his comment, I recommend using strace. Another option would be to modify the code, recompile it, and check where it fails.

Jose Palma