I am using an external library that produces a large amount (>4 GB) of binary data. I want to buffer this data and feed it to a second library while it is still being produced.

My program is a 64-bit process running on Linux. I cannot make any guarantees about the RAM of the system that will run it.

For the first library, I implement a virtual function that the library calls to hand me the binary data:

virtual void put_data(uint8_t* data, size_t s_data);

I am free to do whatever I want in the implemented function with this data.

The second library similarly expects the data in chunks of 100 bytes. Its function looks like this:

void write(const uint8_t* buffer);

In theory, I could run the first library in one thread and the second library in another, and feed the data to the second library in a loop as it arrives from the first. Obviously I cannot just forward the buffers, because their sizes may differ. Is there a convenient way of doing this in C++, or do I have to write a new class for it?

Heinrich Heine
  • How does the second library expect its input? If it insists on reading a file, you have a bigger issue than if it just takes an `std::istream&`. – MSalters Sep 10 '15 at 10:05
  • How long does it take to create the 4 GB of data? Is your program a 64-bit process? Can you guarantee that the code will run on a machine with enough RAM to keep it all in one place (remember that even with a large amount of RAM, fragmentation may prevent you from allocating such a large amount in one go)? Does the process create the data in chunks, or all in one go? Questions, questions. – Robinson Sep 10 '15 at 10:16
  • It is created in small chunks. – Heinrich Heine Sep 10 '15 at 10:18
  • My program is a 64-bit process. As for the RAM, no, it is not really guaranteed. – Heinrich Heine Sep 10 '15 at 10:20
  • On which operating system and which file system? On Linux, writing to a `tmpfs` mounted FS does not do any disk IO... Are both libraries called by the same program in the same process? Please **edit your question** to improve it! – Basile Starynkevitch Sep 10 '15 at 10:22
  • BTW, if you have 5Gbytes of data on a 2GByte RAM computer, you are sure that it won't fit in memory! – Basile Starynkevitch Sep 10 '15 at 10:25
  • And you're sure the second lib needs all data in one write call, instead of being fed continuously with chunks? – deviantfan Sep 10 '15 at 10:26
  • Where is the data created by the first library (some `std::ostream`, some file, some in-memory structure)? Could you explain much more about what these libraries are? How long does a typical run last (seconds, minutes, hours, days...)? – Basile Starynkevitch Sep 10 '15 at 10:30
  • @BasileStarynkevitch I see your concern, which leads us to deviantfan's suggestion. It seems to be possible to do the reading and writing simultaneously, and obviously it has to be done this way. I edited my question. – Heinrich Heine Sep 10 '15 at 10:32
  • @HeinrichHeine: you did not improve your question enough. Please tell us much more about your actual issues. Name the libraries if possible, show the relevant APIs, show some code. You need to explain much more and **improve your question** a lot (add several additional paragraphs) – Basile Starynkevitch Sep 10 '15 at 10:33
  • @BasileStarynkevitch I implement a virtual function that feeds me small chunks of the stream as chars. – Heinrich Heine Sep 10 '15 at 10:34
  • Please edit your question instead of commenting on it. Give some actual code, and some measurements. – Basile Starynkevitch Sep 10 '15 at 10:34
  • Sorry, you are right. I will improve my question. – Heinrich Heine Sep 10 '15 at 10:34
  • @BasileStarynkevitch answers in my edited question. – Heinrich Heine Sep 10 '15 at 10:54
  • The fixed 100-byte size smells very, very bad. – Basile Starynkevitch Sep 10 '15 at 11:29

2 Answers

First, you could write the data to files in a tmpfs-mounted file system. The files then sit in virtual memory and, as long as they fit in RAM, no disk I/O happens. BTW, Linux has a good page cache, so when a process writes a file that is quickly read and consumed, little disk I/O happens (even on ordinary file systems). Read also http://linuxatemyram.com/
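
A minimal sketch of the file-based option, assuming the spool file lives on a tmpfs mount (on many Linux systems `/dev/shm` is one, but that path is an assumption to check on the target machine). `SpoolWriter` and `count_chunks` are hypothetical names, not part of either library, and the sketch reads the file back only after writing has finished:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical helper: appends the producer's buffers to one spool file.
class SpoolWriter {
public:
    explicit SpoolWriter(const std::string& path)
        : out_(path, std::ios::binary | std::ios::trunc) {}

    // Would be called from put_data(); returns false if the write failed.
    bool append(const std::uint8_t* data, std::size_t size) {
        out_.write(reinterpret_cast<const char*>(data),
                   static_cast<std::streamsize>(size));
        return out_.good();
    }

    // Flushes and closes the file so that a reader sees all bytes.
    void close() { out_.close(); }

private:
    std::ofstream out_;
};

// Reads the spool file back in pieces of chunk_size bytes and returns the
// number of complete chunks. A trailing partial chunk ends the loop.
std::size_t count_chunks(const std::string& path, std::size_t chunk_size) {
    assert(chunk_size > 0);
    std::ifstream in(path, std::ios::binary);
    std::string chunk(chunk_size, '\0');
    std::size_t chunks = 0;
    while (in.read(&chunk[0], static_cast<std::streamsize>(chunk_size))) {
        ++chunks;  // the second library's write() would be called here
    }
    return chunks;
}
```

Nothing here is specific to tmpfs: the same code works on an ordinary file system, where the page cache decides how much actually reaches the disk.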

Since you mention running two threads, you could set up a pipe(7) to communicate between them (with one thread writing to the pipe and the other reading from it). Read Advanced Linux Programming and some pthreads tutorial. Consider also learning C++11, redesigning your program entirely, and using the many features introduced in C++11 (closures, containers, smart pointers, ...; learn more about continuations and CPS as well).
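
A rough sketch of the pipe approach, assuming a producer thread that stands in for the first library and a consumer loop that stands in for the second. The names `write_all`, `read_full`, and `pipe_demo` are hypothetical, and the 37-byte pieces are only there to show that the writer's buffer sizes need not match the 100-byte chunks:

```cpp
#include <algorithm>
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <unistd.h>

constexpr std::size_t kChunk = 100;  // chunk size stated in the question

// Writes all `size` bytes to fd, retrying after short writes and EINTR.
bool write_all(int fd, const std::uint8_t* data, std::size_t size) {
    while (size > 0) {
        ssize_t n = ::write(fd, data, size);
        if (n < 0) {
            if (errno == EINTR) continue;
            return false;
        }
        data += n;
        size -= static_cast<std::size_t>(n);
    }
    return true;
}

// Reads until `size` bytes are collected, or until EOF or an error.
// Returns the number of bytes actually stored in `out`.
std::size_t read_full(int fd, std::uint8_t* out, std::size_t size) {
    assert(out != nullptr);
    std::size_t got = 0;
    while (got < size) {
        ssize_t n = ::read(fd, out + got, size - got);
        if (n < 0) {
            if (errno == EINTR) continue;
            break;
        }
        if (n == 0) break;  // the writer closed its end
        got += static_cast<std::size_t>(n);
    }
    return got;
}

// Demo: a producer thread writes `total` bytes in uneven pieces while the
// calling thread consumes complete 100-byte chunks. Returns the chunk count.
std::size_t pipe_demo(std::size_t total) {
    int fds[2];
    if (::pipe(fds) != 0) return 0;
    const int write_fd = fds[1];

    std::thread producer([write_fd, total] {
        std::uint8_t piece[37] = {};  // stand-in for put_data() buffers
        std::size_t sent = 0;
        while (sent < total) {
            const std::size_t n = std::min(sizeof piece, total - sent);
            if (!write_all(write_fd, piece, n)) break;
            sent += n;
        }
        ::close(write_fd);  // lets the reader see EOF
    });

    std::uint8_t chunk[kChunk];
    std::size_t chunks = 0;
    while (read_full(fds[0], chunk, kChunk) == kChunk) {
        ++chunks;  // the second library's write(chunk) would go here
    }
    producer.join();
    ::close(fds[0]);
    return chunks;
}
```

Because the kernel's pipe buffer is bounded, the writer blocks whenever the reader falls behind, so memory use stays small regardless of the total amount of data.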

Finally, you might keep all the data in virtual memory, perhaps using std::stringstream, open_memstream(3), or just std::string.
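
As an illustration of the in-memory option, here is a small sketch built on std::string. `MemoryBuffer` is a hypothetical class, it is not thread-safe as written (two threads would need a mutex around it), and it assumes the pending data fits in memory:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>

constexpr std::size_t kChunkSize = 100;  // chunk size stated in the question

// Hypothetical helper: stores incoming bytes in a std::string and hands
// them out again in fixed-size chunks.
class MemoryBuffer {
public:
    // Would be called from the first library's put_data() callback.
    void append(const std::uint8_t* data, std::size_t size) {
        assert(data != nullptr || size == 0);
        bytes_.append(reinterpret_cast<const char*>(data), size);
    }

    // Passes every complete chunk to `sink`, then erases the consumed bytes.
    // Bytes that do not yet fill a chunk stay buffered for the next call.
    // Returns the number of chunks delivered.
    std::size_t drain(const std::function<void(const std::uint8_t*)>& sink) {
        std::size_t chunks = 0;
        std::size_t offset = 0;
        while (bytes_.size() - offset >= kChunkSize) {
            sink(reinterpret_cast<const std::uint8_t*>(bytes_.data() + offset));
            offset += kChunkSize;
            ++chunks;
        }
        bytes_.erase(0, offset);
        return chunks;
    }

    // Number of bytes currently held.
    std::size_t pending() const { return bytes_.size(); }

private:
    std::string bytes_;
};
```

Calling `drain` regularly keeps the string short, since consumed chunks are erased. If it is called only once at the end, the string holds the entire data set, which is what "keeping all the data in virtual memory" amounts to.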

See also fifo(7), shm_overview(7), sem_overview(7). Learn more about C++11 threads and, more generally, about inter-process communication techniques and event loops (e.g. built on poll(2) ...).

Basile Starynkevitch
  • Thanks for your suggestions. It seems to me that a pipe is the most proper way to go, while something like std::stringstream is the easiest. What do you mean by "keeping all data in virtual memory"? Does stringstream do this automatically? Did you mention it as a proposal for holding all of the binary data, or would I have to implement a mechanism for "deleting" its content once it has been read by the second thread? – Heinrich Heine Sep 10 '15 at 11:26
  • Read the references. Spend days on reading them. Once you'll understand stuff, you'll want to put your code into the garbage bin and redesign it then rewrite it properly. I don't have time (several weeks are required) to teach you all – Basile Starynkevitch Sep 10 '15 at 11:28
  • Ok, but my question was very simple: did you mention stringstream as a proposal for holding all of the binary data at once, or for the asynchronous approach with the two threads? – Heinrich Heine Sep 10 '15 at 11:31
  • Follow the links. It is explained. `stringstream` will hold the data in memory. – Basile Starynkevitch Sep 10 '15 at 11:31

If you have access to Boost, you could use its shared memory library, Boost.Interprocess.

dau_sama
  • Hey, thanks for your answer. But there must be a way of doing this properly without adding extra libraries. – Heinrich Heine Sep 10 '15 at 10:11
  • @HeinrichHeine the relevant part of Boost.Interprocess lets you use `mmap()` in a portable fashion to create huge on-disk areas, and it's header-only so no runtime library dependency. – Tino Didriksen Sep 10 '15 at 10:26
  • You could use the Linux system calls directly, but I would not advise that when Boost wraps them in an easy way. Shared memory is the way to go for sharing memory across different processes. The accepted solution told you to use threads and share in-process memory. That's a different approach, and of course if you're in the same process, there is no need for shared memory! – dau_sama Sep 10 '15 at 12:08