I'm using C++ to read large files with over 30000 lines and 3000 columns (a 30000 x 3000 matrix). I'm using a 2D vector to store the data as it is read. But I need to repeat this process a couple of times. Is there any way to optimize the reading process?
"process a couple of times" means reading the same file couple of times? is the file chosen at run time? one way could be parallel read. – Koushik Shetty May 22 '13 at 04:25
I will be using the data for classification purposes, and that's why I might need to go through the full data again and again. The file is chosen at runtime. – Abhishek Thakur May 22 '13 at 07:18
2 Answers
Memory mapping is a good fit here, since only read operations are involved.
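For example, here is a minimal sketch of mapping the file read-only with Boost.Interprocess; the file name "data.txt" and the newline-separated layout are assumptions, not taken from the question:

```cpp
#include <boost/interprocess/file_mapping.hpp>
#include <boost/interprocess/mapped_region.hpp>
#include <algorithm>
#include <cstddef>
#include <iostream>

int main() {
    namespace bip = boost::interprocess;

    // Map the whole file read-only; the OS pages the data in on demand and
    // keeps it cached, so repeated passes avoid re-reading through iostreams.
    bip::file_mapping  file("data.txt", bip::read_only);   // assumed file name
    bip::mapped_region region(file, bip::read_only);

    const char* begin = static_cast<const char*>(region.get_address());
    const char* end   = begin + region.get_size();

    // One example pass over the mapped bytes: count the rows. Parsing the
    // numbers works the same way, by walking the buffer with pointers instead
    // of extracting values through an ifstream.
    auto rows = std::count(begin, end, '\n');
    std::cout << "rows: " << rows << '\n';
}
```

Because the kernel caches the mapped pages, the second and later passes over the same file are typically served from memory rather than from disk.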

mumu
[Click here for details](http://www.boost.org/doc/libs/1_53_0/doc/html/interprocess/sharedmemorybetweenprocesses.html#interprocess.sharedmemorybetweenprocesses.mapped_file). – mumu May 22 '13 at 04:48
I will give you some ideas, but not an exact solution, because I do not know the full details of your system.
- If the file is this large and only some of the data changes between readings, consider using a database instead.
- For performance, you can read the file concurrently (read the same file part by part using multiple threads).
- If you need to process the data as well, use a separate thread (or threads) for processing, linked to the reader by a queue or parallel queues (see the reader/processor sketch after this list).
- If your data length is fixed (such as fixed-length numbers) and you know which locations changed, read only the changed data instead of reading and processing the whole file again and again (a seek-based sketch also follows below).
- If none of the above helps, use memory mapping. If you are looking for portability, Boost's memory-mapped files will reduce your work.
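As a sketch of the second and third points (a reader thread feeding a processing thread through a queue), here is one way it could look; the file name and the line-oriented format are assumptions:

```cpp
#include <condition_variable>
#include <fstream>
#include <iostream>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

int main() {
    std::queue<std::string> lines;
    std::mutex m;
    std::condition_variable cv;
    bool done = false;

    // Producer: reads lines from disk and pushes them onto the queue.
    std::thread reader([&] {
        std::ifstream in("data.txt");              // assumed file name
        std::string line;
        while (std::getline(in, line)) {
            { std::lock_guard<std::mutex> lk(m); lines.push(std::move(line)); }
            cv.notify_one();
        }
        { std::lock_guard<std::mutex> lk(m); done = true; }
        cv.notify_one();
    });

    // Consumer: parses/processes lines while the reader keeps the queue full.
    std::size_t processed = 0;
    for (;;) {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [&] { return !lines.empty() || done; });
        if (lines.empty() && done) break;
        std::string line = std::move(lines.front());
        lines.pop();
        lk.unlock();
        // ... parse 'line' into a row of 3000 values here ...
        ++processed;
    }

    reader.join();
    std::cout << "processed " << processed << " lines\n";
}
```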
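And a sketch of the fixed-length idea: if every value occupies a known number of bytes, a changed cell can be re-read directly with seekg instead of re-parsing the whole file. The 10-character field width, the cell position, and the file name are assumptions:

```cpp
#include <cstddef>
#include <fstream>
#include <iostream>
#include <string>

int main() {
    // Assumed layout: every value is printed in a fixed 10-character field,
    // 3000 values per row plus a newline, so any cell's offset is computable.
    const std::size_t field    = 10;
    const std::size_t cols     = 3000;
    const std::size_t rowBytes = cols * field + 1;   // +1 for '\n'

    std::ifstream in("data.txt", std::ios::binary);  // assumed file name
    std::size_t row = 12345, col = 678;              // the cell that changed

    in.seekg(row * rowBytes + col * field);          // jump straight to the cell
    std::string cell(field, ' ');
    in.read(&cell[0], field);

    std::cout << "value at (" << row << ", " << col << "): "
              << std::stod(cell) << '\n';
}
```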

Nayana Adassuriya