1

I want to implement a fast logger, that holds log entries, and when a certain trigger arrives, it flushes the last X messages.

So the idea is to hold all the messages in a cyclic buffer and, once the trigger arrives, push its ID to a queue that another thread monitors (one such thread for the whole system). This thread will go back X messages and flush them. I know how to deal with messages that are being written while I am trying to flush, messages that were overwritten before I flushed them, messages that are being flushed while I am trying to update them, and so on.

My problem is this: if, for example, I have 20 threads writing messages and only 10 cores, then in the time between two executions of the "writer" thread, the whole buffer can be overwritten several times.

Is there any way that "my" thread can "force" the "writer" thread to execute (or give it its time slice)? I guess not, but still... Can you advise on any other way or design to overcome this problem?

yosim
  • Depends, but having multiple threads writing to the hard drive at a time will saturate the speed of the hard drive long before it saturates the CPU processing speed. For simplicity's sake, I'd have only 1 write thread :-/. In either case, mutexes or file locks would be helpful to ensure that only 1 (FIFO) thread gets to access either the buffers or the write location. – IdeaHat Jan 06 '14 at 16:07
  • @MadScienceDreams: yes, I have one writer (updated the question accordingly), but the problem remains: the buffer is overwritten before the writer gets a chance to flush it – yosim Jan 06 '14 at 16:15
  • You can give that thread a higher priority, so as soon as it is signaled (through a condition variable, mutex, semaphore, readers-writers lock, whatever you want), the system scheduler would execute that thread. Note that your problem could be better solved with a better design. If your buffer gets overwritten outside your control, then you need to rethink your logging mechanism. – Shahbaz Jan 06 '14 at 16:53
  • @Shahbaz: Yes, I thought about that. Unfortunately, this is not an option. – yosim Jan 06 '14 at 17:01

3 Answers

1

As I understand it, you want your thread to resume as soon as new IDs are available in the queue. This is possible with locking primitives: your writer thread should sleep until it is notified by your trigger thread. How to achieve this behavior depends on the multithreading framework you are using.

For example, in C++11 you can have a look at std::condition_variable.
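A minimal sketch of that wait/notify pattern, assuming the trigger IDs are plain `int`s. The names `TriggerQueue` and `run_flusher_demo` are made up for illustration and are not part of the original design. Note that `notify_one` only makes the waiting thread runnable; when it actually runs is still up to the OS scheduler.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical queue of trigger IDs shared by the logging threads and the flusher.
class TriggerQueue {
public:
    // Called by any thread that hits a trigger.
    void push(int id) {
        {
            std::lock_guard<std::mutex> lock(m_);
            ids_.push_back(id);
        }
        cv_.notify_one();  // makes the flusher runnable; it does not run it at once
    }

    // Ask the flusher to exit once the queue has been drained.
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }

    // Blocks until an ID is available; returns false once closed and empty.
    bool pop(int& id) {
        std::unique_lock<std::mutex> lock(m_);
        // The predicate guards against spurious wake-ups and missed notifications.
        cv_.wait(lock, [this] { return !ids_.empty() || closed_; });
        if (ids_.empty()) return false;
        id = ids_.front();
        ids_.pop_front();
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<int> ids_;
    bool closed_ = false;
};

// Demo: one flusher thread collects the IDs pushed by the caller.
std::vector<int> run_flusher_demo(const std::vector<int>& triggers) {
    TriggerQueue queue;
    std::vector<int> seen;
    std::thread flusher([&] {
        int id;
        while (queue.pop(id)) seen.push_back(id);  // a real flusher would flush here
    });
    for (int id : triggers) queue.push(id);
    queue.close();
    flusher.join();
    return seen;
}
```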

Edit. As mentioned in the comments, disk IO is slow, so the writer thread needs to copy the messages into memory first and only then write them to disk. During the IO, the buffer can be overwritten by arriving messages.

Dmitry Markin
  • Does signalling a thread that is currently sleeping on a condition guarantee that it will wake up? I tried that with a mutex, and it did not work. I thought of a condition, but I was not able to find such a guarantee. – yosim Jan 06 '14 at 17:04
  • Condition variables are specifically meant to deal with such situations, so the thread will wake up! In fact, even [spurious wake-ups](http://www.cplusplus.com/reference/condition_variable/condition_variable/wait/) are possible with some implementations. It is possible to implement condition variables in terms of mutexes, but mutexes are primarily meant for "mutual exclusion". – Dmitry Markin Jan 06 '14 at 17:35
0

I've written something similar before, where calls to log methods are actually placed on a queue which another thread (T-Logger) monitors. This frees the other threads from having to call the underlying log APIs and works well in low-latency applications.

If you want to explicitly buffer and then write on a trigger, then I'd still recommend doing all the writes from one thread, such as T-Logger, and then using some sort of condition variable to signal to T-Logger that it should now go and write the items in the queue to the underlying log file.

As mentioned in the question comments, you should avoid having multiple threads trying to do IO. IO is incredibly slow, and having all your threads trying to write to a file will cause them to give up their CPU cycles waiting for the IO to complete.

Sean
  • @Shahbaz - true. I rephrased my answer slightly. – Sean Jan 06 '14 at 16:20
  • Yes, I am going to have only one thread doing the writes. The main question is: does a condition variable guarantee that the thread will wake up as soon as it is signaled? I was not able to find such a guarantee (although it is possible I was not looking deep enough...) – yosim Jan 06 '14 at 17:06
  • @yosim - the condition variable will cause any thread that is waiting on it to transition to a "runnable" state, and at some point the OS will schedule the thread on a CPU and run it. You're never going to end up with a solution that will cause a waiting thread to start running immediately, as the OS has to take the scheduling of all threads into account. – Sean Jan 06 '14 at 17:10
  • This is what I thought, and unfortunately it brings us back to square one. I implemented it using a mutex, and I still have the problem. It looks like a change in design is needed. – yosim Jan 06 '14 at 17:23
  • @yosim - why is it so important that the other thread wakes up immediately? No OS is going to give you this guarantee. – Sean Jan 07 '14 at 07:35
  • I don't care whether it is really immediate or "shortly after". The problem is that, say, I have 20 threads that write to the logger and only 1 log writer (which does the IO). Now say the first thread triggers a flush. The writer will not execute until all the remaining 19 threads have had their time share, possibly (probably) overwriting the "relevant" messages. – yosim Jan 07 '14 at 11:06
  • @yosim You can't implement it with only mutexes (without spinning at 100% CPU on a shared variable); a mutex can only be unlocked from the same thread that locked it. You need a condition variable (and that condition variable also needs a mutex), and you should use a queue for the messages you log. Here's a simple example: http://stackoverflow.com/a/2379903/126769 – nos Jan 07 '14 at 12:01
0

Sounds like a classic case to use a semaphore, initialized with the length of the circular buffer. The log call from the threads needing to log stuff has to get a unit from the semaphore before proceeding and the logging thread signals the semaphore when it extracts an entry from the queue. If the buffer runs out, any thread trying to log will then block until there is space for its log entry.

Obviously, the circular buffer/queue/whatever container for the log entries must be thread-safe.
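As a sketch of this scheme: C++11 has no standard semaphore, so the one below is built from a mutex and a condition variable. `Semaphore` and `BoundedLog` are illustrative names, and the `try_log` variant is an added assumption for callers that must not block.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

// Minimal counting semaphore built from C++11 primitives.
class Semaphore {
public:
    explicit Semaphore(std::size_t initial) : count_(initial) {}

    void acquire() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }

    bool try_acquire() {
        std::lock_guard<std::mutex> lock(m_);
        if (count_ == 0) return false;
        --count_;
        return true;
    }

    void release() {
        {
            std::lock_guard<std::mutex> lock(m_);
            ++count_;
        }
        cv_.notify_one();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::size_t count_;
};

// Hypothetical log queue whose free slots are counted by the semaphore.
class BoundedLog {
public:
    explicit BoundedLog(std::size_t capacity) : free_(capacity) {}

    // Blocks while the buffer is full.
    void log(std::string msg) {
        free_.acquire();
        push(std::move(msg));
    }

    // Non-blocking variant: returns false instead of waiting when full.
    bool try_log(std::string msg) {
        if (!free_.try_acquire()) return false;
        push(std::move(msg));
        return true;
    }

    // Called by the logging thread; returns false if nothing is queued.
    bool extract(std::string& out) {
        {
            std::lock_guard<std::mutex> lock(m_);
            if (entries_.empty()) return false;
            out = std::move(entries_.front());
            entries_.pop_front();
        }
        free_.release();  // one more slot is available to writers
        return true;
    }

private:
    void push(std::string msg) {
        std::lock_guard<std::mutex> lock(m_);
        entries_.push_back(std::move(msg));
    }

    Semaphore free_;
    std::mutex m_;
    std::deque<std::string> entries_;
};
```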

Martin James
  • You can achieve the same without a semaphore: just keep the last index you flushed, and if your next index equals it, you should block or do something else because the buffer is full. However, this has two main flaws. First, the logger must not block. More importantly, not all messages need to be flushed: it is very possible that we will have a few thousand messages without flushing even one, and then flush everything. – yosim Jan 07 '14 at 13:51