3

I'm writing a straightforward C program on Linux and want to use an existing library's API that expects its data to come from a file. I must pass it a file name as a const char*. But I already have the data, exactly as it would appear in a file, sitting in a buffer allocated on the heap. There is plenty of RAM and we want high performance. I'd like to avoid writing a temporary file to disk, so what is a good way to feed the data to this API in a way that looks like a file?

Here's a cheap pretend version of my code:

marvelouslibrary.h:

int marvelousfunction(const char *filename);

normal-persons-usage.cpp, the usage for which the library was originally designed:

#include "marvelouslibrary.h"
int somefunction(char *somefilename)
{
    return marvelousfunction(somefilename);
}

myprogram.cpp:

#include "marvelouslibrary.h"
int one_of_my_routines() 
{
    unsigned char* stuff = new unsigned char[1000000];
    // fill stuff[] with...stuff!
    // stuff[] holds same bytes as might be found in a file

    /* magic goes here: make filename referring to stuff[] */

    return marvelousfunction( ??? );
}

To be clear, the marvelouslibrary does not offer any API functions that accept data by pointer; it can only read a file.

I thought of pipes and mkfifo(), but those seem meant for communicating between processes. I am no expert at these things. Does a named pipe work okay when it is read and written in the same process? Is this a wise approach?

Or skip being clever, go with plan "B" which is to shuddup and just write a temp file. However, I'd like to learn something new and find out what's possible in this situation, besides getting high performance.

DarenW

5 Answers

3

Given that you likely have a function like:

char *read_data(const char *fileName)

I think you will need to "skip being clever, go with plan "B" which is to shuddup and just write a temp file."

If you can dig around and find that the call you are making in turn calls another function that takes a FILE * or an int file descriptor, then you can do something better.
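If such a FILE *-based entry point does turn up, fmemopen() (POSIX.1-2008, available in glibc) can wrap the heap buffer directly, with no file name involved. A minimal sketch, in which marvelousfunction_fp() is a hypothetical stand-in that just counts bytes; nothing in the question confirms the library exposes such a function:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Hypothetical stand-in for a FILE*-based library entry point.
 * It only counts the bytes it can read; the real library would parse them. */
static int marvelousfunction_fp(FILE *fp)
{
    int count = 0;
    while (fgetc(fp) != EOF)
        count++;
    return count;
}

/* Wrap an in-memory buffer in a read-only stdio stream and hand it over.
 * Returns the stand-in's result, or -1 if the stream cannot be created. */
int feed_via_fmemopen(unsigned char *buf, size_t len)
{
    FILE *fp = fmemopen(buf, len, "r");
    if (fp == NULL)
        return -1;

    int result = marvelousfunction_fp(fp);
    fclose(fp);   /* does not free buf; the caller still owns it */
    return result;
}
```

This only helps when the library accepts a stream; a function that takes nothing but a path still needs a real name in the file system.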

One thought that does come to mind: can you change your code to write to a memory-mapped file instead of to the heap? That way you would already have a file on disk and would avoid the copying (though the data will still be on disk), and you can still give the function call the file name.
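A rough sketch of the memory-mapped idea, assuming the buffer can be produced directly into the mapping. The path template, the helper name fill_and_feed_mapped() and the byte-counting marvelousfunction() stand-in are all made up for illustration:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the library call: counts the bytes in the file. */
static int marvelousfunction(const char *filename)
{
    FILE *fp = fopen(filename, "rb");
    if (fp == NULL)
        return -1;
    int count = 0;
    while (fgetc(fp) != EOF)
        count++;
    fclose(fp);
    return count;
}

/* Create a file of len bytes, map it, fill the mapping instead of a heap
 * buffer, then pass the file's name to the library. len must be > 0. */
int fill_and_feed_mapped(size_t len)
{
    char path[] = "/tmp/marvelous-XXXXXX";   /* assumed location */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        unlink(path);
        return -1;
    }

    unsigned char *stuff = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                MAP_SHARED, fd, 0);
    close(fd);                    /* the mapping stays valid after close */
    if (stuff == MAP_FAILED) {
        unlink(path);
        return -1;
    }

    memset(stuff, 'x', len);      /* "fill stuff[] with...stuff!" */
    munmap(stuff, len);           /* shared-mapping writes reach the file */

    int result = marvelousfunction(path);
    unlink(path);
    return result;
}
```

The data is written once, straight into the file's pages, so there is no separate copy from heap to file.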

TofuBeer
  • so far, while it goes against my goal of learning something marvelous and new, this does let me get on with work and be productive. – DarenW Mar 01 '09 at 23:36
  • The "marvelous and new" thing you learned is that when you are making an API like that, you should provide a filename version AND a FILE * or file descriptor version too. – TofuBeer Mar 02 '09 at 02:38
2

I'm not sure what kind of input the library function wants ... does it need a path/file name, or open file pointer, or open file descriptor?

If you don't want to hack the library and the function wants a string (path to a file), try making the temporary file in /dev/shm.

Otherwise, mmap might be the best option. Be sure to research posix_madvise() when using mmap() (or its counterpart posix_fadvise() if using a temporary file).

It looks like you're talking about very little data to begin with, so I don't think you'll see a performance impact whichever route you take.

Edit

Sorry, I just re-read your question; perhaps I read too fast the first time. There is no way you are going to feed a function like:

char * foo(const char *filepath)

... with mmap().

If you cannot modify the library to accept a file descriptor instead (or as an alternative to the path), just use /dev/shm and a temporary file; it will be quite cheap.
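A minimal sketch of the /dev/shm route, assuming /dev/shm is mounted (it normally is on Linux). The helper name feed_via_shm(), the file name template and the byte-counting marvelousfunction() stand-in are invented for illustration:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the library call: counts the bytes in the file. */
static int marvelousfunction(const char *filename)
{
    FILE *fp = fopen(filename, "rb");
    if (fp == NULL)
        return -1;
    int count = 0;
    while (fgetc(fp) != EOF)
        count++;
    fclose(fp);
    return count;
}

/* Write buf to a uniquely named file in /dev/shm, pass the name to the
 * library, then remove the file. Returns the library's result or -1. */
int feed_via_shm(const unsigned char *buf, size_t len)
{
    char path[] = "/dev/shm/marvelous-XXXXXX";
    int fd = mkstemp(path);       /* creates and opens a unique file */
    if (fd < 0)
        return -1;

    size_t done = 0;
    while (done < len) {          /* write() may write less than asked */
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0) {
            close(fd);
            unlink(path);
            return -1;
        }
        done += (size_t)n;
    }
    close(fd);

    int result = marvelousfunction(path);
    unlink(path);
    return result;
}
```

Because /dev/shm is normally a tmpfs, the temporary file lives in RAM (or swap) and the cost is essentially one extra memory copy.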

Tim Post
  • the data is megabytes in size. – DarenW Mar 01 '09 at 07:03
  • ok, reading about mmap() - how to use it in this case? relevant web pages or book references? – DarenW Mar 01 '09 at 07:59
  • Can you update your question to include a prototype of the library function? Is it asking for a pathname, file pointer or file descriptor? – Tim Post Mar 01 '09 at 10:16
  • I, myself, do not understand how to use mmap() in such a case. mmap() works in the opposite direction, from a file descriptor to an address. I do not see how to use it to go from an address to a file name. – bortzmeyer Mar 01 '09 at 18:13
  • That's why I'm asking the OP what his function wants .. if he can edit it and if not suggested /dev/shm. – Tim Post Mar 01 '09 at 18:34
  • I'm not quite clear if the function in question (in the library) wants a file name, pointer or descriptor. – Tim Post Mar 01 '09 at 18:35
  • code put in. yes, mmap does work backward to what i think i want. what i would like is to have no file involved, but the library can only get its data from a file, or some file-like entity such as a fifo, device, etc. – DarenW Mar 01 '09 at 23:32
0

Edit: Sorry. Just read the question. With my advice below, you fork a spare process, so the question of "does it work in a single process" does not come up. I also see no reason you couldn't spawn a separate thread to do the push...


Not in the least elegant, but you could:

  1. open a named pipe.
  2. fork a streamer that does nothing but try to write to the pipe
  3. pass the name of the pipe

which should be pretty robust...
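The three steps above can be sketched roughly as follows. The helper name feed_via_fifo(), the /tmp directory template and the byte-counting marvelousfunction() stand-in are assumptions made for illustration only:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the library call: counts the bytes it reads. */
static int marvelousfunction(const char *filename)
{
    FILE *fp = fopen(filename, "rb");
    if (fp == NULL)
        return -1;
    int count = 0;
    while (fgetc(fp) != EOF)
        count++;
    fclose(fp);
    return count;
}

int feed_via_fifo(const unsigned char *buf, size_t len)
{
    /* 1. create a named pipe inside a private temporary directory */
    char dir[] = "/tmp/marvelous-XXXXXX";
    if (mkdtemp(dir) == NULL)
        return -1;
    char path[64];
    snprintf(path, sizeof path, "%s/fifo", dir);
    if (mkfifo(path, 0600) != 0) {
        rmdir(dir);
        return -1;
    }

    /* 2. fork a streamer that does nothing but write to the pipe */
    pid_t pid = fork();
    if (pid < 0) {
        unlink(path);
        rmdir(dir);
        return -1;
    }
    if (pid == 0) {
        int fd = open(path, O_WRONLY);   /* blocks until a reader opens */
        if (fd < 0)
            _exit(1);
        size_t done = 0;
        while (done < len) {
            ssize_t n = write(fd, buf + done, len - done);
            if (n <= 0)
                _exit(1);
            done += (size_t)n;
        }
        close(fd);                       /* reader now sees end-of-file */
        _exit(0);
    }

    /* 3. pass the name of the pipe to the library */
    int result = marvelousfunction(path);

    int status;
    waitpid(pid, &status, 0);
    unlink(path);
    rmdir(dir);
    return result;
}
```

Two caveats: a FIFO cannot be seeked, so this only works if the library reads the file strictly sequentially, and if the library never opens the path, the child stays blocked in open() and waitpid() will not return.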

dmckee --- ex-moderator kitten
0

You're on Linux; can't you just grab the source of the library and hack in the function you need? If it's useful to others, you could even send a patch to the original author, so it will be in future versions for everyone.

Ana Betts
  • yes, it would be nice to hack the library, but i'll have to put that on my to-do list for next week. not sure it's open source, though. even if _this_ time i can hack onward, next time i may be dealing with some proprietary junk for some client. – DarenW Mar 01 '09 at 23:36
-1

mmap(), perhaps?

smcameron