
I have a large binary file stored on an NFS share. In the cluster, I want multiple processes to read this file simultaneously. Each process is given a file pointer (an offset), opens the file, and reads a given number of bytes starting from that offset.

How should I design this project? As far as I can tell, it is similar to a concurrent database. Are there any lightweight libraries or open-source projects related to this? I am using C++.

marc_s
mining

2 Answers


I'm not sure there is any point in using a library.

The basic standard facilities are enough. Open the file, seek to the position you want, and then perform the read:

http://www.cplusplus.com/reference/fstream/ifstream/open/
http://www.cplusplus.com/reference/istream/istream/seekg/

or

http://www.cplusplus.com/reference/cstdio/fopen/
http://www.cplusplus.com/reference/cstdio/fseek/
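As a minimal sketch of the open, seek, and read sequence with `std::ifstream`: the helper name `read_range` and its parameters are made up for illustration, and the code assumes the file is only ever read while the processes are running. Each call opens its own stream, so every process has a private file position.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not from any library): read up to `count` bytes
// starting at byte `offset`. Returns fewer bytes if the file ends early.
std::vector<char> read_range(const std::string& path,
                             std::uint64_t offset,
                             std::size_t count) {
    // Each call opens its own read-only stream in binary mode.
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }

    // Move the read position to the requested offset from the start.
    in.seekg(static_cast<std::streamoff>(offset), std::ios::beg);
    if (!in) {
        throw std::runtime_error("cannot seek in " + path);
    }

    std::vector<char> buffer(count);
    in.read(buffer.data(), static_cast<std::streamsize>(count));

    // gcount() is the number of bytes actually read; shrink on a short read.
    buffer.resize(static_cast<std::size_t>(in.gcount()));
    return buffer;
}
```

A process would call something like `read_range(path, offset, bytes)` with its own offset and size, then work on the returned buffer.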

Nicolae Natea
  • Thank you! I asked because I'm not sure whether reading from multiple processes involves file locking, etc. I think of the system as similar to HTTP requests hitting a web server, so I think we should design it like a web server that can handle highly concurrent access. – mining Dec 26 '15 at 03:00

nicolae: I agree :-)

mining: so far you haven't said anything about a need for interaction between your readers.

Consider a simple scenario. Let's say you have a C++ program called "dostuff" that takes the following arguments:

--name     something to label your output.
--offset   offset point, seek to here (default to zero).
--bytes    number of bytes to process.
inputfile  the file you want to read
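One way the argument list above could be parsed, as a rough sketch only: the `Options` struct and `parse_options` function are hypothetical names, and the parser assumes both the `--key value` and `--key=value` spellings should be accepted.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical option set mirroring the argument list above.
struct Options {
    std::string name;
    std::uint64_t offset = 0;   // defaults to zero, as described above
    std::uint64_t bytes = 0;
    std::string input_file;
};

// Accepts both "--key value" and "--key=value". An argument that does not
// start with "--" is taken as the input file.
Options parse_options(const std::vector<std::string>& args) {
    Options opts;
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& arg = args[i];
        if (arg.rfind("--", 0) != 0) {          // not an option
            opts.input_file = arg;
            continue;
        }

        std::string key = arg;
        std::string value;
        const std::size_t eq = arg.find('=');
        if (eq != std::string::npos) {          // "--key=value"
            key = arg.substr(0, eq);
            value = arg.substr(eq + 1);
        } else if (i + 1 < args.size()) {       // "--key value"
            value = args[++i];
        } else {
            throw std::invalid_argument("missing value for " + key);
        }

        if (key == "--name") {
            opts.name = value;
        } else if (key == "--offset") {
            opts.offset = std::stoull(value);
        } else if (key == "--bytes") {
            opts.bytes = std::stoull(value);
        } else {
            throw std::invalid_argument("unknown option " + key);
        }
    }
    if (opts.input_file.empty()) {
        throw std::invalid_argument("missing input file");
    }
    return opts;
}
```

In `main`, the arguments could be passed as `std::vector<std::string>(argv + 1, argv + argc)`.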

The following would run your two processes in the background.

$ dostuff --name "proc1" --offset=0      --bytes=100 \\myserver\myshare\bigfile.dat &
$ dostuff --name "proc2" --offset=100    --bytes=100 \\myserver\myshare\bigfile.dat &
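The offsets and sizes in the two commands above were picked by hand. For more readers, a launcher could compute them; the following is a sketch under the assumption that the file is divided into contiguous, nearly equal parts. `Range` and `split_ranges` are illustrative names, not part of any existing tool.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical description of the slice handed to one reader process.
struct Range {
    std::uint64_t offset;  // where the reader starts
    std::uint64_t bytes;   // how many bytes it should process
};

// Divide `total` bytes into `parts` contiguous ranges. When the size does
// not divide evenly, the first `total % parts` ranges get one extra byte.
std::vector<Range> split_ranges(std::uint64_t total, std::uint64_t parts) {
    std::vector<Range> ranges;
    if (parts == 0) {
        return ranges;  // nothing to hand out
    }
    const std::uint64_t base = total / parts;
    const std::uint64_t extra = total % parts;
    std::uint64_t offset = 0;
    for (std::uint64_t i = 0; i < parts; ++i) {
        const std::uint64_t bytes = base + (i < extra ? 1 : 0);
        ranges.push_back({offset, bytes});
        offset += bytes;
    }
    return ranges;
}
```

For a 200-byte file and two readers this yields the same `--offset`/`--bytes` pairs as the commands above: (0, 100) and (100, 100).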

You can open a file handle within each process. So long as the data access is read-only, why do you want to make it more complex?

important: I'm not saying it shouldn't be more complex, I'm suggesting you haven't yet shown a need for additional complexity. And that complexity is going to come from a need for your readers to collaborate. If they don't need to collaborate then you're pretty much done with your architecture - use the links Nicolae provided and good luck to you.

jgreve
  • Thank you! There is no interaction between the readers. Each process reads its own part according to the given pointer and the size to be read. I asked because I'm not sure whether reading from multiple processes involves file locking, etc. – mining Dec 26 '15 at 02:57
  • I think of the system as similar to HTTP requests hitting a web server, so I think it should be designed like a web server that can handle highly concurrent access. – mining Dec 26 '15 at 03:01