
I am not an OS/systems programmer (though the field has been attracting me a lot :) ), so I am not sure whether this is a good question, and it may not be clear enough (it is really more than one question).

Is it possible/feasible/practical to tune the Linux kernel (and other system programs), via configuration or a rewrite, for optimal performance, so that they utilize the hardware to the maximum (if not 100%)?

My question is not specific to the OS, but I think OS optimization could be a huge help.

One thing I can think of is controlling the number of threads and distributing the load equally among them. With multicore (1/2/4/6/8-core) processors now common, I think software should (somehow?) detect the number of cores, either at install time or at invocation, AND distribute load evenly among them. Without software that fully utilizes the hardware, that hardware's power is wasted, in my opinion.
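The core-detection idea described above can be sketched in a few lines of Python. This is only an illustration: the `square` function is a stand-in for a real unit of work, and for CPU-bound Python code you would use processes rather than threads (threads are shown here for brevity).

```python
import os
from concurrent.futures import ThreadPoolExecutor

def square(n):
    """Stand-in for one unit of work."""
    return n * n

cores = os.cpu_count() or 1   # detect the core count at invocation
with ThreadPoolExecutor(max_workers=cores) as pool:
    # The pool spreads the work items across the workers.
    results = list(pool.map(square, range(8)))

print(cores, results)
```

`Executor.map` preserves input order, so `results` comes back as `[0, 1, 4, 9, 16, 25, 36, 49]` regardless of how the items were scheduled across workers.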

There are other ways too. For example, say I have a quad-core machine whose 5400 rpm SATA HDD is the bottleneck; if the software can recognize that and minimize disk reads/writes, by increasing caches and using asynchronous/delayed reads and writes, it will help performance.
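The "minimize disk writes" idea can be sketched with a user-space buffer that coalesces many small logical writes into far fewer, larger requests to the slow disk. This is a hedged illustration; a real application would tune the buffer size to its workload.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    # 1000 small logical writes, coalesced by a 64 KiB user-space buffer,
    # so the disk sees only a handful of large requests.
    with open(path, "wb", buffering=64 * 1024) as f:
        for _ in range(1000):
            f.write(b"record\n")   # lands in the buffer, not on the disk
        # the buffer is flushed to the OS once, on close
    size = os.path.getsize(path)
    print(size)   # 7000
finally:
    os.close(fd)
    os.unlink(path)
```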

I also want to know whether, with the OpenCL and CUDA libraries, intelligently using the processing units in GPUs can help substantially.

I haven't written (or even read) any serious program (apart from my work, which is web-related, client-server stuff), but I will surely give this a try. Does what I think/assume and what I ask make any sense? Or am I going mad?

talonmies
0xc0de

2 Answers


As for adapting disk I/O with caching, this already partially happens in modern operating systems. Linux, for example, will cache the contents of all files you access in RAM (and in some cases will even prefer to keep those over actual application data when free RAM gets low), as long as there is sufficient space.

However, applications expect data to be safe and persistent once it has been written to disk (at least after they have made sure of it, for example by calling fsync). The operating system has to guarantee that it actually is, so it has to wait for the drive to physically write the data. If the application does not ask for this, Linux will just pretend it has written the data to disk, but in fact keep it only in memory while writing it out as fast as the disk allows. In the meantime, the application can continue doing whatever it was doing before.
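This distinction is visible from a normal program. In the Python sketch below, `flush()` only hands the data to the kernel (which may still hold it in memory, as described above), while `os.fsync()` blocks until the drive reports the data as actually stored:

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    with open(path, "wb") as f:
        f.write(b"important data")
        f.flush()              # push the user-space buffer to the kernel
        os.fsync(f.fileno())   # block until the data is really on disk
    # Only after fsync returns may the application assume persistence.
    with open(path, "rb") as f:
        data = f.read()
    print(data)
finally:
    os.close(fd)
    os.unlink(path)
```

Without the `fsync` call, the write would typically return immediately and the kernel would write the data back lazily, which is exactly the "pretending" described above.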

Much of what you suggest already happens in operating systems; it's just that sometimes you really cannot use 100% of the hardware, because doing so would violate expectations or guarantees (as with the disk I/O above), or there is simply not enough work to do. If you start an application that does heavy parallel computing (e.g. using OpenCL), it will certainly load every CPU it can get until the work is done.

Also, sometimes you cannot load the CPU to 100% because it has to wait for other devices: hard disks, memory, the network, and so on. The best file caching won't help if there is not enough memory to keep the data in: you'll have to swap data out to hard drives, and that is slow, wasting CPU cycles.

Note, however, that while one application is waiting for input or output on disks and other devices, other applications can usually run. Operating systems really do try to get as much out of the hardware as they can.

Jonas Schäfer

Most of the time spent by a typical process is spent blocked on I/O. Examples:

  1. Reading or writing a chunk of data from/to secondary storage (when doing fread or fwrite, for example)
  2. Waiting for packets to arrive over the network.

As you can see, the process is blocked until these operations complete. If the application has logic to do something useful in other threads while it is waiting, the kernel will happily service those threads. If not, the kernel puts the process to sleep and services threads belonging to other processes that are ready to run.
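A small Python sketch of this: one thread blocks inside a `read()` system call on a pipe (the kernel puts it to sleep), while the main thread keeps computing, just as the kernel would schedule any other runnable work during the wait.

```python
import os
import threading

r, w = os.pipe()
received = []

def reader():
    # Blocks in read(); the kernel sleeps this thread and runs
    # other runnable threads in the meantime.
    received.append(os.read(r, 5))

t = threading.Thread(target=reader)
t.start()

# Useful work proceeds while the reader thread is blocked on I/O.
total = sum(i * i for i in range(100_000))

os.write(w, b"hello")   # data arrives; the kernel wakes the reader
t.join()
os.close(r)
os.close(w)
print(total, received)
```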

Compute-intensive applications are different. They tend to run a huge number of iterations over a data set (for example, matrices) to compute something. The compiler applies its own optimizations, such as vectorization, to cut CPU cycles on modern processors. Libraries like OpenMP let you mark parallel sections of the code and specify how many threads to spawn when executing them. Again, all of these are optimizations of the same application, done in userspace.

The kernel is chiefly responsible for scheduling tasks. There are different scheduler algorithms, and if you take a look at any distro kernel, you'll find a few variants supplied. The -desktop variant is the one you'll most commonly see in use, since its scheduler is optimized for typical desktop application usage. There are other variants such as -realtime, -xen, and so on, again meant for specific workloads. You can take a look at how differently the scheduler behaves in each of these cases.

Tuxdude