Multiple processes reading different parts of a big binary file simultaneously

Question

I have a large binary file stored on an NFS share. In the cluster, I want multiple processes to read this file simultaneously. Each process is given a file pointer (an offset) and a size; it opens the big file, starts reading at the supplied pointer, and reads that many bytes.

How should I design this project? As far as I can tell, it is similar to a concurrent database. Are there any lightweight libraries or open-source projects related to this? I am using C++.

score 1 · Accepted Answer · answered Dec 26 '15 at 02:44

I'm not sure there is any point in using a library.

The basic facilities are enough. Open the file, seek to the position you want, and then perform the read:

http://www.cplusplus.com/reference/fstream/ifstream/open/ http://www.cplusplus.com/reference/istream/istream/seekg/

or

http://www.cplusplus.com/reference/cstdio/fopen/ http://www.cplusplus.com/reference/cstdio/fseek/
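As a minimal sketch of the `ifstream` route, assuming each process is handed a byte offset and a byte count: the helper name `read_range` is hypothetical and not part of the linked references. It opens the file in binary mode, seeks, reads, and trims the buffer to the number of bytes actually read.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: reads up to `count` bytes starting at byte `offset`.
// Each process calls this with its own offset. The stream and its read
// position are private to the calling process.
std::vector<char> read_range(const std::string& path,
                             std::uint64_t offset,
                             std::size_t count) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    in.seekg(static_cast<std::streamoff>(offset), std::ios::beg);
    if (!in) {
        throw std::runtime_error("cannot seek in " + path);
    }
    std::vector<char> buffer(count);
    in.read(buffer.data(), static_cast<std::streamsize>(count));
    // gcount() reports how many bytes were actually read; it is smaller
    // than `count` when the range runs past the end of the file.
    buffer.resize(static_cast<std::size_t>(in.gcount()));
    return buffer;
}
```

A caller would write something like `auto data = read_range(path, offset, size);` and check `data.size()` to detect a short read.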

Nicolae Natea


Thank you! I asked because I'm not sure whether reading from multiple processes involves file locking, etc. I think of the system as similar to HTTP requests hitting a web server, so I think it should be designed like a web server that can handle highly concurrent access. – mining Dec 26 '15 at 03:00

score 1 · Answer 2 · answered Dec 26 '15 at 02:53

nicolae: I agree :-)

mining: so far you haven't said anything about a need for interaction between your readers.

Consider a simple scenario. Let's say you have your C++ program called "dostuff" which takes the following arguments:

--name     something to label your output.
--offset   offset point, seek to here (default to zero).
--bytes    number of bytes to process.
inputfile  the file you want to read

The following would run your two processes in the background.

$ dostuff --name "proc1" --offset=0      --bytes=100 \\myserver\myshare\bigfile.dat &
$ dostuff --name "proc2" --offset=100    --bytes=100 \\myserver\myshare\bigfile.dat &
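The offsets and byte counts in those two commands come from cutting the file into consecutive ranges. A small sketch of that arithmetic follows, with the names `Range` and `split_ranges` being hypothetical and a 200-byte file assumed only for illustration: splitting 200 bytes two ways gives offsets 0 and 100 with 100 bytes each.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical description of one reader's slice of the file.
struct Range {
    std::uint64_t offset;  // first byte of the slice
    std::uint64_t bytes;   // length of the slice
};

// Splits [0, file_size) into `parts` consecutive ranges. The first
// `file_size % parts` ranges are one byte longer, so the sizes differ
// by at most one byte and every byte belongs to exactly one range.
std::vector<Range> split_ranges(std::uint64_t file_size, std::uint64_t parts) {
    std::vector<Range> ranges;
    if (parts == 0) {
        return ranges;
    }
    const std::uint64_t base = file_size / parts;
    const std::uint64_t extra = file_size % parts;
    std::uint64_t offset = 0;
    for (std::uint64_t i = 0; i < parts; ++i) {
        const std::uint64_t len = base + (i < extra ? 1 : 0);
        ranges.push_back({offset, len});
        offset += len;
    }
    return ranges;
}
```

Each resulting pair would become the `--offset` and `--bytes` arguments of one process.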

You can open a file handle within each process. As long as the data access is read-only, why make it more complex?

important: I'm not saying it shouldn't be more complex, I'm suggesting you haven't yet shown a need for additional complexity. And that complexity is going to come from a need for your readers to collaborate. If they don't need to collaborate then you're pretty much done with your architecture - use the links Nicolae provided and good luck to you.
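For illustration, here is one way the argument handling of "dostuff" might look. This is only a sketch under assumptions: the `Options` struct and `parse_options` function are hypothetical names, and the parser accepts both `--key value` and `--key=value` because the example commands use both spellings.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container for the four arguments listed above.
struct Options {
    std::string name;
    std::uint64_t offset = 0;  // defaults to zero, as in the option list
    std::uint64_t bytes = 0;
    std::string input_file;
};

// Parses the arguments that follow the program name.
Options parse_options(const std::vector<std::string>& args) {
    Options opts;
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& arg = args[i];
        if (arg.rfind("--", 0) != 0) {  // no "--" prefix: the input file
            opts.input_file = arg;
            continue;
        }
        std::string key = arg.substr(2);
        std::string value;
        const std::size_t eq = key.find('=');
        if (eq != std::string::npos) {  // "--key=value" form
            value = key.substr(eq + 1);
            key.erase(eq);
        } else if (i + 1 < args.size()) {  // "--key value" form
            value = args[++i];
        } else {
            throw std::invalid_argument("missing value for --" + key);
        }
        if (key == "name") {
            opts.name = value;
        } else if (key == "offset") {
            opts.offset = std::stoull(value);
        } else if (key == "bytes") {
            opts.bytes = std::stoull(value);
        } else {
            throw std::invalid_argument("unknown option --" + key);
        }
    }
    if (opts.input_file.empty()) {
        throw std::invalid_argument("missing input file");
    }
    return opts;
}
```

A `main` would build the vector with `std::vector<std::string>(argv + 1, argv + argc)`, then open `input_file`, seek to `offset`, and read `bytes` bytes using the calls Nicolae linked.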

Thank you! There is no interaction between the readers. Each process reads its own part according to the given pointer and the size to be read. I asked because I'm not sure whether reading from multiple processes involves file locking, etc. – mining Dec 26 '15 at 02:57
I think of the system as similar to HTTP requests hitting a web server, so I think it should resemble a web server that can handle highly concurrent access. – mining Dec 26 '15 at 03:01
