I am writing a logging mechanism that the rest of the code will use to log alphanumeric data to a file. Every other module in the system will continuously send short alphanumeric sentences (a couple of words at most) to be written to the file. The catch is that I have only been given a small amount of pre-allocated memory for my data structures and for in-memory storage of these log messages. If messages arrive faster than they can be written to disk, the excess log messages will be discarded.
I want to put in a compression mechanism between the client and in-memory storage in my log module, so that I can save as many messages as possible.
My current design so far:
CLIENT ------> LOG MODULE ----> compress and store in in-memory buffer 1
Writer thread: When it's time to write, swap buffer 1 with buffer 2 and write buffer 1 to file. The client will be writing to buffer 2 during this time.
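To make the buffer swap concrete, here is a minimal sketch of the design above, assuming a single writer thread and messages that arrive as already-compressed bytes. The class name `DoubleBufferLog` and its methods are hypothetical, chosen only for illustration.

```python
import threading

class DoubleBufferLog:
    """Two fixed-size buffers: clients fill one while the writer drains the other."""

    def __init__(self, capacity):
        self._active = bytearray(capacity)   # clients append here
        self._standby = bytearray(capacity)  # the writer flushes this one
        self._used = 0                       # bytes used in the active buffer
        self._lock = threading.Lock()
        self.dropped = 0                     # messages discarded for lack of room

    def log(self, payload: bytes) -> bool:
        """Append one message; discard it if the active buffer is full."""
        with self._lock:
            end = self._used + len(payload)
            if end > len(self._active):
                self.dropped += 1
                return False
            self._active[self._used:end] = payload
            self._used = end
            return True

    def swap_and_flush(self, out) -> int:
        """Swap buffers under the lock, then write the full one outside the lock."""
        with self._lock:
            full, n = self._active, self._used
            self._active, self._standby = self._standby, self._active
            self._used = 0
        out.write(memoryview(full)[:n])  # clients keep logging during this write
        return n
```

Only the swap itself happens under the lock, so clients are blocked for a pointer exchange rather than for the duration of the disk write.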
Script outside: Decompress and show log messages
Question: What is a good alphanumeric compression algorithm I can use or a good data structure I can use to capture as much data as possible (during the compression stage above)?
If possible, I would like an algorithm that doesn't keep its compression codes in an intermediate data structure that would be lost in a crash. That is, if the system crashes, I want to be able to decompress whatever has been written to file so far.
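To illustrate the crash-safety requirement, here is a minimal sketch of one layout that would satisfy it, assuming each message is compressed on its own against a fixed dictionary that both the logger and the outside script already know. The dictionary contents `ZDICT` and the function names are hypothetical, and the 2-byte length prefix is an assumption, not a settled format.

```python
import struct
import zlib

# Hypothetical preset dictionary of substrings expected to be common in the logs.
ZDICT = b"error warning started stopped connection timeout failed module "

def encode_record(message: str) -> bytes:
    """Compress one message independently and prefix it with a 2-byte length."""
    comp = zlib.compressobj(level=9, wbits=-15, zdict=ZDICT)  # raw deflate, no header
    body = comp.compress(message.encode("ascii")) + comp.flush()
    return struct.pack("<H", len(body)) + body

def decode_records(data: bytes):
    """Yield every complete record; stop quietly at a truncated tail."""
    pos = 0
    while pos + 2 <= len(data):
        (size,) = struct.unpack_from("<H", data, pos)
        if pos + 2 + size > len(data):
            break  # partial record, e.g. left behind by a crash
        decomp = zlib.decompressobj(wbits=-15, zdict=ZDICT)
        yield decomp.decompress(data[pos + 2 : pos + 2 + size]).decode("ascii")
        pos += 2 + size
```

Because no compressor state carries over from one record to the next, every complete record in the file can be decoded regardless of what came before or after it. The cost is that very short messages compress poorly unless the preset dictionary matches the actual log vocabulary well.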
Attempt so far: assign a code to every character we will be using. It doesn't seem very flexible.
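For reference, the per-character idea can be sketched as a fixed 6-bit code, assuming the messages only ever use a 64-symbol alphabet. The alphabet below and the names `pack6` and `unpack6` are assumptions for illustration, and the 1-byte length prefix limits a message to 255 characters.

```python
# Assumed 64-symbol alphabet: space, digits, upper case, lower case, and '.'.
ALPHABET = " 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz."
CODE = {ch: i for i, ch in enumerate(ALPHABET)}

def pack6(text: str) -> bytes:
    """Pack each character into 6 bits, preceded by a 1-byte character count."""
    bits = 0
    for ch in text:
        bits = (bits << 6) | CODE[ch]  # KeyError for characters outside the alphabet
    nbits = 6 * len(text)
    pad = -nbits % 8                   # zero bits needed to reach a byte boundary
    return bytes([len(text)]) + (bits << pad).to_bytes((nbits + pad) // 8, "big")

def unpack6(data: bytes) -> str:
    """Reverse pack6 using the character count stored in the first byte."""
    count = data[0]
    nbits = 6 * count
    pad = -nbits % 8
    bits = int.from_bytes(data[1:1 + (nbits + pad) // 8], "big") >> pad
    return "".join(ALPHABET[(bits >> (6 * (count - 1 - i))) & 63] for i in range(count))
```

This shows the inflexibility: each character costs 6 bits instead of 8, so the saving is capped at 25% before the length byte, and any character outside the chosen alphabet cannot be logged at all.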
Most of the log messages are simple text sentences.