Can I adapt a function that writes to disk to write to memory

Question

I have a third-party library with a function that performs some computation on the given data and writes the results to a file specified by name:

int manipulateAndWrite(const char *filename,
                       const FOO_DATA *data);

I cannot change this function, or reimplement the computation in my own function, because I do not have the source.

To get the results, I currently need to read them from the file. I would prefer to avoid the write to and read from the file, and obtain the results into a memory buffer instead.

Can I pass a filepath that indicates writing to memory instead of a filesystem?

yeah, use a RAM-drive path. related: https://stackoverflow.com/questions/36706602/how-to-mount-a-drive-in-ram-in-java — Jean-François Fabre, May 23 '18 at 18:46
solutions exist on both Linux & Windows to create a RAM disk, and pass the path of a file from this RAM disk. It doesn't write to disk. Note that if the file is small, writing to the temporary directory doesn't write to disk at all if re-read immediately & deleted. — Jean-François Fabre, May 23 '18 at 18:54
this third party library has a broken design. Allowing to write to a `FILE *` object would have given you more options. — Jean-François Fabre, May 23 '18 at 18:55
@Jean-FrançoisFabre true, but on Debian and derivatives you can easily read/write to /dev/shm — David Ranieri, May 23 '18 at 18:56
then it's the solution for those OSes. Amiga OS has a `RAM:` device as well (since 1985). It's just this crap windows that needs some fiddling to get a ram drive. — Jean-François Fabre, May 23 '18 at 18:58
This is for windows and linux. I don't want to resort to RAM drives. And the files are quite large. — Ben L, May 23 '18 at 19:00
`/dev/shm` probably stands for "shared memory". So it looks like it's a good option for Linux at least. if you don't want RAM drives, then you need to can this broken library, or reverse engineer/hack it to be able to pass a `FILE *` (since the first thing that it probably does is to convert the string to `FILE *`) — Jean-François Fabre, May 23 '18 at 19:01
Hopefully, the library owner will respond to my request. But in the meantime I appreciate any help. Even knowing something is not possible is useful information. — Ben L, May 23 '18 at 19:02
Have you tried to interpose the `writeFoo()` function with your own? If it is a dynamic library (`.so`), it should be simple. If it is a static library, consider writing your own writeFoo(), and replacing it in the library. You don't need sources for that, except for your own replacement, and for defining the `FOO_DATA` type. — Nominal Animal, May 23 '18 at 20:30
@NominalAnimal I am the caller of the function, not the callee. If I could write the function myself, I wouldn't be asking the question at all. — Ben L, May 25 '18 at 15:10
@BenL: Ah, now I understand: the function *manipulates*, then writes the data to the file, but you'd prefer to obtain the manipulated data without having to reread it from the file. So: Run `strace` on the binary, to see which syscalls the function actually uses to write to the file. These are almost certainly `open()`, `write()`, and `close()`. You can interpose them, replacing the write part with a copy to memory. Do you want an outline as to exactly how to do this? — Nominal Animal, May 25 '18 at 17:16
@BenL: Do feel free to roll-back or re-edit the question. However, I did find that information (that was only implied in the function name!) crucial to a proper answer to this question, so some kind of edit that emphasizes that, is needed in my opinion. — Nominal Animal, May 25 '18 at 17:24

score 1 · Answer 1 · answered May 25 '18 at 17:56

Yes, you have several options, although only the first suggestion below is supported by POSIX. The rest are OS-specific and may not be portable across all POSIX systems, but I do believe they work on all POSIXy systems.

You can use a named pipe (FIFO), and have a helper thread read from it concurrently with the writer function.

Because there is no file per se, the overhead is just the syscalls (write and read); basically just the overhead of interprocess communication, nothing to worry about. To conserve resources, do create the helper thread with a small stack (using pthread_attr_setstacksize() etc.), as the default stack size tends to be huge (on the order of several megabytes); 2*PTHREAD_STACK_MIN should be plenty for helper threads.

You should ensure the named pipe is in a safe directory, accessible only to the user running the process, for example.
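
The FIFO approach could look roughly like the following. This is a minimal sketch, not a definitive implementation: `struct capture`, `fifo_reader()`, `capture_via_fifo()` and `stand_in_writer()` are hypothetical names, and `stand_in_writer()` merely stands in for the real library function, which is assumed to open the given path, write, and close it.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical capture state, filled in by the reader thread. */
struct capture {
    const char *path;   /* FIFO path handed to the writer */
    char       *data;   /* malloc()ed result; the caller frees it */
    size_t      used, size;
    int         err;    /* errno-style code, 0 on success */
};

static void *fifo_reader(void *arg)
{
    struct capture *c = arg;
    int fd = open(c->path, O_RDONLY);   /* blocks until the writer opens the FIFO */
    if (fd == -1) { c->err = errno; return NULL; }
    for (;;) {
        if (c->used == c->size) {       /* buffer full: double it */
            size_t n = c->size ? 2 * c->size : 65536;
            char *p = realloc(c->data, n);
            if (!p) { c->err = ENOMEM; break; }
            c->data = p;
            c->size = n;
        }
        ssize_t r = read(fd, c->data + c->used, c->size - c->used);
        if (r > 0) c->used += (size_t)r;
        else if (r == 0) break;         /* writer closed its end: all data received */
        else if (errno != EINTR) { c->err = errno; break; }
    }
    close(fd);
    return NULL;
}

/* Stand-in for the third-party function (assumption: it uses stdio). */
static int stand_in_writer(const char *filename)
{
    FILE *f = fopen(filename, "w");
    if (!f) return -1;
    for (int i = 0; i < 100000; i++) fputs("0123456789", f);
    return fclose(f);
}

/* Caveat: if the writer never opens the FIFO, the reader thread stays
   blocked in open() and pthread_join() below does not return. */
static int capture_via_fifo(const char *path, int (*writer)(const char *),
                            struct capture *c)
{
    pthread_attr_t attr;
    pthread_t tid;
    memset(c, 0, sizeof *c);
    c->path = path;
    if (mkfifo(path, 0600) == -1) return -1;
    pthread_attr_init(&attr);
    pthread_attr_setstacksize(&attr, 2 * PTHREAD_STACK_MIN);  /* small helper stack */
    int rc = pthread_create(&tid, &attr, fifo_reader, c);
    pthread_attr_destroy(&attr);
    if (rc != 0) { unlink(path); return -1; }
    rc = writer(path);                  /* runs while the reader drains the FIFO */
    pthread_join(tid, NULL);
    unlink(path);
    return (rc == 0 && c->err == 0) ? 0 : -1;
}
```

On success, `c->data` holds `c->used` bytes. In real use the `writer` argument would be a thin wrapper that calls the library function with the FIFO path and the data.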

In many POSIXy systems, you can create a pipe or a socket pair, and access it via /dev/fd/N, where N is the descriptor number in decimal. (In Linux, /proc/self/fd/N also works.) This is not mandated by POSIX, so may not be available on all systems, but most do support it.

This way, there is no actual file per se, and the function writes to the pipe or socket. If the data written by the function is at most PIPE_BUF bytes, you can simply read the data from the pipe afterwards; otherwise, you do need to create a helper thread to read from the pipe or socket concurrently with the function, or the write will block.

In this case, too, the overhead is minimal.
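
For the small-output case, the /dev/fd/N variant might be sketched as below. It assumes the system provides /dev/fd, and that the total output is at most PIPE_BUF bytes so the writer cannot block; `small_capture()` and `small_stand_in_writer()` are hypothetical names, the latter standing in for the library function.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Stand-in for the third-party function (assumption: small output). */
static int small_stand_in_writer(const char *filename)
{
    FILE *f = fopen(filename, "w");
    if (!f) return -1;
    fputs("result: 42\n", f);
    return fclose(f);
}

/* Runs writer() against the write end of a pipe, then reads the output back.
   Returns the number of bytes stored in buf, or -1 on error. */
static ssize_t small_capture(int (*writer)(const char *), char *buf, size_t len)
{
    int fds[2];
    char path[32];
    if (pipe(fds) == -1) return -1;
    snprintf(path, sizeof path, "/dev/fd/%d", fds[1]);
    int rc = writer(path);      /* must write at most PIPE_BUF bytes in total */
    close(fds[1]);              /* no writers left, so read() reports EOF after the data */
    ssize_t total = 0;
    while (rc == 0 && (size_t)total < len) {
        ssize_t r = read(fds[0], buf + total, len - (size_t)total);
        if (r > 0) total += r;
        else if (r == 0) break;
        else if (errno != EINTR) rc = -1;
    }
    close(fds[0]);
    return rc == 0 ? total : -1;
}
```

For larger outputs, the same path can be combined with a reader thread that drains `fds[0]` while the function runs.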

On ELF-based POSIXy systems (basically all), you can interpose the open(), write(), and close() syscalls or C library functions.

(In Linux, there are two basic approaches, one using the linker --wrap, and one using dlsym(). Both work fine for this particular case. This ability to interpose functions is based on how ELF binaries are linked at run time, and is not directly related to POSIX.)

You first set up the interposing functions, so that open() detects if the filename matches your special "in-memory" file, and returns a dedicated descriptor number for it. (You may also need to interpose other functions, like ftruncate() or lseek(), depending on what the function actually does; in Linux, you can run a binary under strace to examine what syscalls it actually uses.)

When write() is called with the dedicated descriptor number, you simply memcpy() it to a memory buffer. You'll need to use global variables to describe the allocated size, size used, and the pointer to the memory buffer, and probably be prepared to resize/grow the buffer if necessary.

When close() is called with the dedicated descriptor number, you know the memory buffer is complete, and the contents ready for processing.
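
A rough sketch of the dlsym() flavour of this interposition follows, assuming glibc on Linux and a library that calls open(), write() and close() through the normal dynamic symbols (if it uses open64(), openat() or stdio instead, those would need the same treatment). `MEM_PATH` and the `mem_*` globals are hypothetical names. Instead of inventing a descriptor number, this version reserves a real one by opening /dev/null, so it cannot collide with other descriptors.

```c
#define _GNU_SOURCE             /* for RTLD_NEXT */
#undef _FORTIFY_SOURCE          /* fortified glibc headers may define open() inline */
#include <dlfcn.h>
#include <errno.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define MEM_PATH "/in-memory/result"    /* hypothetical magic filename */

char  *mem_data;                /* captured bytes */
size_t mem_used, mem_size;      /* bytes used / bytes allocated */
int    mem_done;                /* set once the writer has closed the "file" */
static int mem_fd = -1;         /* descriptor reserved for the magic file */

int open(const char *path, int flags, ...)
{
    static int (*real_open)(const char *, int, ...);
    mode_t mode = 0;
    if (!real_open)
        real_open = (int (*)(const char *, int, ...))dlsym(RTLD_NEXT, "open");
    if (flags & O_CREAT) {      /* the mode argument is only present with O_CREAT */
        va_list ap;
        va_start(ap, flags);
        mode = (mode_t)va_arg(ap, int);
        va_end(ap);
    }
    if (strcmp(path, MEM_PATH) != 0)
        return real_open(path, flags, mode);
    mem_used = 0;
    mem_done = 0;
    mem_fd = real_open("/dev/null", O_WRONLY);  /* reserves a unique descriptor */
    return mem_fd;
}

ssize_t write(int fd, const void *buf, size_t n)
{
    static ssize_t (*real_write)(int, const void *, size_t);
    if (!real_write)
        real_write = (ssize_t (*)(int, const void *, size_t))dlsym(RTLD_NEXT, "write");
    if (mem_fd == -1 || fd != mem_fd)
        return real_write(fd, buf, n);          /* ordinary descriptor: pass through */
    if (n == 0)
        return 0;
    if (mem_used + n > mem_size) {              /* grow the buffer as needed */
        size_t want = 2 * (mem_used + n);
        char *p = realloc(mem_data, want);
        if (!p) { errno = ENOSPC; return -1; }
        mem_data = p;
        mem_size = want;
    }
    memcpy(mem_data + mem_used, buf, n);
    mem_used += n;
    return (ssize_t)n;
}

int close(int fd)
{
    static int (*real_close)(int);
    if (!real_close)
        real_close = (int (*)(int))dlsym(RTLD_NEXT, "close");
    if (mem_fd != -1 && fd == mem_fd) {
        mem_fd = -1;
        mem_done = 1;           /* buffer is complete and ready for processing */
    }
    return real_close(fd);
}
```

After calling the library function with `MEM_PATH` as the filename, `mem_done` being set indicates that `mem_data` holds `mem_used` bytes of output. Older glibc versions need `-ldl` at link time for dlsym(). The sketch is not thread-safe, because the state lives in plain globals.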

You can use a temporary file on a RAM filesystem. While the data is technically written to a file and read back from it, the operations involve RAM only.

You should arrange for a default path to one to be set at compile time, and for individual users to be able to override that for their personal needs, for example via an environment variable (YOURAPP_TMPDIR?).

There is no need for the application to try and look for a RAM-based filesystem: choices like this are, and should be, up to the user. The application should not even care what kind of filesystem the file is on, and should just use the specified directory.
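
The directory selection described above might be sketched as follows. `YOURAPP_DEFAULT_TMPDIR`, `yourapp_tmpdir()` and `yourapp_temp_path()` are hypothetical names, and "/tmp" is only an assumed compile-time default; a user who wants RAM-backed storage could set `YOURAPP_TMPDIR=/dev/shm`, for example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Compile-time default; override with -DYOURAPP_DEFAULT_TMPDIR='"/some/dir"'. */
#ifndef YOURAPP_DEFAULT_TMPDIR
#define YOURAPP_DEFAULT_TMPDIR "/tmp"
#endif

/* The user's YOURAPP_TMPDIR wins if set and non-empty; otherwise the default. */
static const char *yourapp_tmpdir(void)
{
    const char *dir = getenv("YOURAPP_TMPDIR");
    return (dir && *dir) ? dir : YOURAPP_DEFAULT_TMPDIR;
}

/* Creates a unique, empty file in the chosen directory and stores its name
   in path. Returns 0 on success, -1 on error. */
static int yourapp_temp_path(char *path, size_t len)
{
    int n = snprintf(path, len, "%s/yourapp-XXXXXX", yourapp_tmpdir());
    if (n < 0 || (size_t)n >= len)
        return -1;                  /* name did not fit into the buffer */
    int fd = mkstemp(path);         /* replaces XXXXXX and creates the file */
    if (fd == -1)
        return -1;
    close(fd);
    return 0;
}
```

The resulting path is then passed to the library function, the results are read back, and the file is removed with unlink().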

score -3 · Answer 2 · answered May 23 '18 at 18:45

You could avoid using that library function altogether. See this question on how to write to in-memory files: "Is it possible to create a C FILE object to read/write in memory"

Gabriel Tapizquent

this link doesn't help as it's unrelated. – Jean-François Fabre May 23 '18 at 18:48
He asked for options, and the question is tagged with c and POSIX. How is the link unrelated? – Gabriel Tapizquent May 23 '18 at 18:50
because OP doesn't have a `FILE` object in the interface to begin with – Jean-François Fabre May 23 '18 at 18:50
Hence this is presented as an option. An alternative to using his library function. – Gabriel Tapizquent May 23 '18 at 18:52
It's not my library function. The issue is I need to capture the other data manipulations this function is performing without the penalty of hitting the disks. – Ben L May 23 '18 at 18:58
Interesting. As @Jean-FrançoisFabre mentioned, this third-party library has a broken design. His suggestions might be helpful though. – Gabriel Tapizquent May 23 '18 at 19:00
