types of buffers

Question

Recently an interviewer asked me about the types of buffers. What types of buffers are there ? Actually this question came up when I said I will be writing all the system calls to a log file to monitor the system. He said it will be slow to write each and every call to a file. How to prevent it. I said I will use a buffer and he asked me what type of buffer ? Can some one explain me types of buffers please.

In your application, buffering the output might make the log file less useful. If "monitor the system" includes "diagnose crashes" then delaying the output can render that more difficult. Just a thought. In an interview, asking clarifying questions about use cases may be helpful. — Heath Hunnicutt, Jul 06 '10 at 04:49
Was the question about implementing logging inside the OS (from the POV of system call implementor) or from the app? — Heath Hunnicutt, Jul 06 '10 at 05:14
@Mousey, in that case the answer you accepted, about stdio buffers, has nothing to do with the question you were asked in the interview. Stdio buffers do not exist inside the kernel. The answer you were looking for has to do with dmesg and using a circular buffer. In the kernel, you can't use a buffer which requires dynamic memory. What if you are logging that the dynamic memory infrastructure just broke? You can't allocate more memory to log _that_. — Heath Hunnicutt, Jul 07 '10 at 02:17
@heath so my answer should be LIFO , FIFO and circular buffers ? — mousey, Jul 07 '10 at 02:45
@Mousey, if it's an interview, I would say your answer would be: is this log supposed to function with or without dynamic allocation, i.e., is the VM enabled and should I use it? If the answer is no dynamic allocation allowed, then the usual answer is 'ring buffer.' This is what the kernel uses for the dmesg buffer. In the case of error log, it's almost always FIFO. I think the dmesg answer is the closest to what you were actually asked; dmesg _is_ a kernel log structure suitable for logging system calls, the question you were asked would extend that to writing an output file. — Heath Hunnicutt, Jul 07 '10 at 03:44

paxdiablo · Accepted Answer · 2010-07-07T04:37:24.533

In C under UNIX (and probably other operating systems as well), there are usually two buffers, at least in your given scenario.

The first exists in the C runtime libraries where information to be written is buffered before being delivered to the OS.

The second is in the OS itself, where information is buffered until it can be physically written to the underlying media.

As an example, we wrote a logging library many moons ago that forced information to be written to the disk so that it would be there if either the program crashed or the OS crashed.

This was achieved with the sequence:

fflush (fh); fsync (fileno (fh));

The first of these actually ensured that the information was handed from the C runtime buffers to the operating system, the second that it was written to disk. Keep in mind that this is an expensive operation and should only be done if you absolutely need the information written immediately (we only did it at the SUPER_ENORMOUS_IMPORTANT logging level).

To be honest, I'm not entirely certain why your interviewer thought it would be slow unless you're writing a lot of information. The two levels of buffering already there should perform quite adequately. If it was a problem, then you could just introduce another layer yourself which wrote the messages to an in-memory buffer and then delivered that to a single fprint-type call when it was about to overflow.

But, unless you do it without any function calls, I can't see it being much faster than what the fprint-type buffering already gives you.

Following clarification in comments that this question is actually about buffering inside a kernel:

Basically, you want this to be as fast, efficient and workable (not prone to failure or resource shortages) as possible.

Probably the best bet would be a buffer, either statically allocated or dynamically allocated once at boot time (you want to avoid the possibility that dynamic re-allocation will fail).

Others have suggested a ring (or circular) buffer but I wouldn't go that way (technically) for the following reason: the use of a classical circular buffer means that to write out the data when it has wrapped around will take two independent writes. For example, if your buffer has:

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|s|t|r|i|n|g| |t|o| |w|r|i|t|e|.| | | | | | |T|h|i|s| |i|s| |t|h|e| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                 ^           ^
                                 |           |
                   Buffer next --+           +-- Buffer start

then you'll have to write "This is the " followed by "string to write.".

Instead, maintain the next pointer and, if the bytes in the buffer plus the bytes to be added are less than the buffer size, just add them to the buffer with no physical write to the underlying media.

Only if you are going to overflow the buffer do you start doing tricky stuff.

You can take one of two approaches:

Either flush the buffer as it stands, set the next pointer back to the start for processing the new message; or
Add part of the message to fill up the buffer, then flush it and set the next pointer back to the start for processing the rest of the message.

I would probably opt for the second given that you're going to have to take into account messages that are too big for the buffer anyway.

What I'm talking about is something like this:

initBuffer:
    create buffer of size 10240 bytes.
    set bufferEnd to end of buffer + 1
    set bufferPointer to start of buffer
    return

addToBuffer (size, message):
    while size != 0:
        xfersz = minimum (size, bufferEnd - bufferPointer)
        copy xfersz bytes from message to bufferPointer
        message = message + xfersz
        bufferPointer = bufferPointer + xfersz
        size = size - xfersz
        if bufferPointer == bufferEnd:
            write buffer to underlying media
            set bufferPointer to start of buffer
        endif
    endwhile

That basically handles messages of any size efficiently by reducing the number of physical writes. There will be optimisations of course - it's possible that the message may have been copied into kernel space so it makes little sense to copy it to the buffer if you're going to write it anyway. You may as well write the information from the kernel copy directly to the underlying media and only transfer the last bit to the buffer (since you have to save it).

In addition, you'd probably want to flush an incomplete buffer to the underlying media if nothing had been written for a time. That would reduce the likelihood of missing information on the off chance that the kernel itself crashes.

Aside: Technically, I guess this is sort of a circular buffer but it has special case handling to minimise the number of writes, and no need for a tail pointer because of that optimisation.

+1 for detailed explanation. If I were asked this question in an interview, I would have just said "stdio buffers". — R.. GitHub STOP HELPING ICE, Jul 06 '10 at 05:07

score 2 · Answer 2 · answered Jul 06 '10 at 04:43

2

There are also ring buffers which have bounded space requirements and are probably best known in the Unix dmesg facility.

answered Jul 06 '10 at 04:43

msw

42,753
9
87
112

score 0 · Answer 3 · answered Jul 06 '10 at 04:43

What comes to mind for me is time-based buffers and size-based. So you could either just write whatever is in the buffer to file once every x seconds/minutes/hours or whatever. Alternatively, you could wait until there are x log entries or x bytes worth of log data and write them all at once. This is one of the ways that log4net and log4J do it.

score 0 · Answer 4 · answered Jul 06 '10 at 04:45

0

Overall, there are "First-In-First-Out" (FIFO) buffers, also known as queues; and there are "Latest*-In-First-Out" (LIFO) buffers, also known as stacks.

To implement FIFO, there are circular buffers, which are usually employed where a fixed-size byte array has been allocated. For example, a keyboard or serial I/O device driver might use this method. This is the usual type of buffer used when it is not possible to dynamically allocate memory (e.g., in a driver which is required for the operation of the Virtual Memory (VM) subsystem of the OS).

Where dynamic memory is available, FIFO can be implemented in many ways, particularly with linked-list derived data structures.

Also, binomial heaps implementing priority queues may be used for the FIFO buffer implementation.

A particular case of neither FIFO nor LIFO buffer is the TCP segment reassembly buffers. These may hold segments received out-of-order ("from the future") which are held pending the receipt of intermediate segments not-yet-arrived.

* My acronym is better, but most would call LIFO "Last In, First Out", not Latest.

answered Jul 06 '10 at 04:45

Heath Hunnicutt

18,667
3
39
62

Your acronym isn't any better. It is better to stay with the well-established meaning of technical terms. Inventing your own meaning or spelling could result in confusion, or it could even be seen as a sign of incompetence. – Secure Jul 06 '10 at 07:12
That said, there is of course nothing wrong with inventing your own concepts that happen to collide with an already existing acronym, since there is only a limited number of acronyms. Just avoid redefinitions of existing concepts. – Secure Jul 06 '10 at 07:18
Wow, you seem to feel personally insulted by my downvote and comment, reading through all my old answers and downvoting a lot of them. Sorry about this. But this sort of reaction is a sure way to receive downvotes in the future **without** actually explaining the reasons. – Secure Jul 06 '10 at 09:05
You're both correct and incorrect, but mostly the latter. I have given you reasons for my downvotes, and they are just as lame as yours. You're too self-satisfied and preachy. Your answers and comments are your opinions more than technical material. I'm glad we've gotten off to this start. I think you're lame. – Heath Hunnicutt Jul 06 '10 at 14:08
Finally, a personal enemy on SO. I'm honoured... I think. Please feel invited to comment all of my old answers and downvote them as you like, since I don't play SO for the highscore. Just don't expect any more answer from me on this topic, as I won't descend to this kindergarten level. If you have specific questions of clarification about a specific point, please ask and I will answer. Thank you. – Secure Jul 06 '10 at 15:03
Just one last question: You've downvoted this (accepted) answer without a comment: http://stackoverflow.com/questions/3110088/c-switch-statement-has-default-to-be-the-last-case/3110478#3110478 This answer was based on ANSI C standard citations. If this is only my self-satisfied opinion and not technical material, I'm a bit at loss about what exactly you expect? – Secure Jul 06 '10 at 15:05
Don't swear to a last anything, that doesn't befit your personality type. I won't bother to comment all your downvotes, but I did bother to lead you to water on 3110088, which may be accepted by the masses but is lame pedagogy, actually misleading. – Heath Hunnicutt Jul 06 '10 at 15:40
You've actually downvoted the ANSI C Standard itself, from which the code in question was cited. ;) I think I can live with this answer. Thank you and goodbye. – Secure Jul 06 '10 at 16:37
It's not that the example is wrong; it is not. The example was useless given the question. The non-accepted answer to 3110088 has six times (6x) the votes yours does. That means something and it concurs with my view that your answer is poor given the question. Thank you and good-bye. – Heath Hunnicutt Jul 07 '10 at 02:15

score 0 · Answer 5 · answered Jul 06 '10 at 05:06

Correct me if I'm wrong, but wouldn't using a mmap'd file for the log avoid both the overhead of small write syscalls and the possibility of data loss if the application (but not the OS) crashed? It seems like an ideal balance between performance and reliability to me.

types of buffers

5 Answers5