I am trying to understand how non-blocking network IO works in Node.js/libuv. I have already found out that file IO is done on libuv worker threads (that is, in a background thread). However, various sources state that network IO is done in a non-blocking fashion using system calls like epoll, kqueue, etc. (depending on the operating system).

Now I am wondering whether this means that the actual IO part (`read()`) is still done on the main thread, and is thus blocking, even if e.g. epoll is used. As I understand it, epoll only notifies about available events but does not actually do the read/write. At least in the examples I found (e.g. http://davmac.org/davpage/linux/async-io.html), epoll is always used in combination with the read system call, which is a blocking IO operation.

In other words, if libuv uses a single thread and epoll to get a notification when data is available to read, is the subsequent read operation executed on the main thread, and thus potentially blocking other operations (think of network requests) on the main thread?

user826955
  • Actually, epoll has two modes, and generally we use the mode that notifies the user only once when data arrives, so we need to put the socket in non-blocking mode. – Stargateur May 29 '18 at 07:51
  • Socket in non-blocking mode means that `read` syscall will not block when no data is available. So the actual IO is still not done somewhere in the background? – user826955 May 29 '18 at 08:06
  • epoll/poll/select are all used to perform an IO operation on an fd that is ready, so that the operation does not block. With poll/epoll/select you can do asynchronous operations on a single thread with blocking operations. – Tyker May 29 '18 at 08:33
  • @user826955 "So the actual IO is still not done somewhere in the background?" What do you mean? `read` is the actual IO. – Stargateur May 29 '18 at 09:10
  • OK, as I tried to explain above, my understanding (which might be wrong) is that the return value of `epoll` will indicate whether or not data is available, and a subsequent `read()` will actually read the bytes from the socket. Now, related to `libuv`/`Node.js`, how is this asynchronous if it's not done in a separate thread? – user826955 May 29 '18 at 09:35
  • @user826955 There is no truly asynchronous system in the world; everything is synchronous. But you can transform a synchronous system into something that "looks" asynchronous. This is the purpose of many libraries. Your question is too broad. One way to do asynchronous IO is to use epoll; another is to use threads. – Stargateur May 29 '18 at 11:00
  • Okay, right, but my question is *how exactly* is *"to use epoll"* making something async? That's what I don't understand; I'm trying to understand the technical details. Example: someone is shoving 2GB of data onto the socket, and the server process is now calling `epoll_*`. This will indicate available data, and the subsequent `read()` call will read 2GB of data. This task of reading 2GB of data is done on the main thread, and thus in no way asynchronous, right? – user826955 May 29 '18 at 11:15
  • That is why I tell you there is no truly asynchronous system. One way to solve this issue would be to read a maximum number of bytes on each connection and to loop over the connections until no data remains. That way each client can be served even if one client tries to monopolize the server. As I said, your question is too broad: you are asking how to use epoll, which requires a book. Worse, you require us to define what asynchronous is, which requires a bigger book. – Stargateur May 29 '18 at 12:05
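
The idea in the last comment (read at most a fixed number of bytes per connection, then move on to the next ready connection) can be sketched as follows. This is only an illustration in Python using the standard `selectors` module, not how libuv itself is written; the names `CHUNK`, `serve_fairly` and `demo` are made up for the example, and `socket.socketpair()` stands in for real client connections.

```python
import selectors
import socket

CHUNK = 1024  # assumed per-event read budget (hypothetical value)


def serve_fairly(connections):
    """Drain several non-blocking sockets, at most CHUNK bytes per readiness event.

    Returns a list of (name, bytes_read) tuples in the order the reads happened.
    """
    sel = selectors.DefaultSelector()  # epoll on Linux, kqueue on BSD/macOS
    for name, sock in connections.items():
        sock.setblocking(False)
        sel.register(sock, selectors.EVENT_READ, data=name)

    order = []
    open_count = len(connections)
    while open_count:
        for key, _ in sel.select():  # waits until at least one socket is ready
            try:
                # Copies bytes the kernel has already buffered; never waits.
                data = key.fileobj.recv(CHUNK)
            except BlockingIOError:
                continue  # spurious wakeup: nothing to read after all
            if data:
                order.append((key.data, len(data)))
            else:  # empty result means the peer closed the connection
                sel.unregister(key.fileobj)
                open_count -= 1
    sel.close()
    return order


def demo():
    big_tx, big_rx = socket.socketpair()
    small_tx, small_rx = socket.socketpair()
    big_tx.sendall(b"x" * (4 * CHUNK))  # the "greedy" client
    small_tx.sendall(b"hi")             # a small request from another client
    big_tx.close()
    small_tx.close()
    order = serve_fairly({"big": big_rx, "small": small_rx})
    big_rx.close()
    small_rx.close()
    return order
```

Calling `demo()` returns the sequence of reads. Because each read is capped at `CHUNK` bytes, the small request is handled before the large transfer has been fully drained, even though everything runs on one thread.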

1 Answer

File descriptors referring to regular files are always reported as ready for read/write by poll/select (epoll does not accept them at all: `epoll_ctl` fails with `EPERM`). However, read/write on them may still block while the data is read from or written to storage. This is why file I/O must be done in a separate thread.
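
The readiness claim for regular files can be checked with a small experiment. The sketch below assumes a Unix-like system (on Windows, `select` only accepts sockets); `regular_file_readiness` is a name invented for this example.

```python
import select
import tempfile


def regular_file_readiness():
    """Ask select() about a regular file; returns (readable, writable) flags."""
    with tempfile.TemporaryFile() as f:
        f.write(b"some bytes")
        f.flush()
        # A timeout of 0 means "check once, do not wait".
        readable, writable, _ = select.select([f], [f], [], 0)
        return (f in readable, f in writable)
```

On Linux this returns `(True, True)` regardless of whether the file contents are cached in memory, so the readiness report says nothing about whether a following `read` will have to wait for the disk.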

Non-blocking send/recv on sockets (and non-blocking read/write on pipes), by contrast, are truly non-blocking and hence can be done in the I/O thread without risk of blocking the thread.
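
A minimal sketch of what "truly non-blocking" means for a socket, written in Python for brevity (the function name `recv_on_empty_socket` is hypothetical): when nothing is queued, `recv` on a non-blocking socket fails immediately instead of waiting.

```python
import socket


def recv_on_empty_socket():
    """Call recv() on a non-blocking socket that has no data queued."""
    a, b = socket.socketpair()
    try:
        b.setblocking(False)
        try:
            b.recv(1024)
        except BlockingIOError:  # EAGAIN/EWOULDBLOCK at the syscall level
            return "would block"  # returned immediately; the thread stays free
        return "got data"
    finally:
        a.close()
        b.close()
```

Here `recv_on_empty_socket()` returns `"would block"`. An event loop treats that outcome as "try again after the next readiness notification" and carries on with other work.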

Maxim Egorushkin
  • So does this mean that when calling `recv`, the available data has already been transferred/buffered by *something* (OS?) – user826955 May 29 '18 at 11:17
  • @user826955 It does, yes. That data resides in the kernel socket buffer and needs to be copied into the process address space, which is what non-blocking `recv` does. – Maxim Egorushkin May 29 '18 at 11:19
  • But copying into process address space would basically be the same task as copying file contents from file IO into process memory, right? In other words, it uses resources/CPU time, and is not done instantly/atomically. I am really just trying to understand how asynchronous network IO differs from asynchronous file IO in `libuv`, since according to their docs only file IO uses dedicated background threads for the IO work. – user826955 May 29 '18 at 11:21
  • @user826955 Nope, the file data resides elsewhere, on the storage device, whereas socket data has already been received by the kernel. – Maxim Egorushkin May 29 '18 at 11:23
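
The point that socket data is already sitting in a kernel buffer by the time `recv` is called can be illustrated with a short sketch. It assumes a local `socket.socketpair()` in place of a real network connection, and `recv_buffered` is a name made up for the example.

```python
import select
import socket


def recv_buffered():
    """Show that recv() only copies bytes the kernel has already queued."""
    tx, rx = socket.socketpair()
    try:
        rx.setblocking(False)
        tx.sendall(b"hello")  # the kernel queues the bytes for rx
        tx.close()            # the sender is gone before anyone reads
        # Readiness notification: the data is already in the receive buffer.
        ready, _, _ = select.select([rx], [], [], 1.0)
        if rx not in ready:
            return None
        return rx.recv(1024)  # a memory copy from kernel to user space
    finally:
        rx.close()
```

`recv_buffered()` returns `b"hello"` even though the sending side was closed before the read happened: the transfer itself was completed by the kernel, and `recv` only had to copy the result.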