How to read ints from a text file with MPI and C++

Question

I'm having a bit of a problem with parallel reading.

I have a text file which looks like this:

1 4 30 46 
0 2 3 29 
1 11 12 -1 
1 4 5 -1 
0 3 13 14 
3 6 7 8 
5 10 -1 -1 
13 10 -1 -1 
5 9 27 -1

and I'm trying to read these ints 4 at a time in each process. The number of lines in the file equals the number of processes, and every line contains 4 ints.

int bufsize, count; 
int *buf;
MPI::Status status; 

MPI::File top = MPI::File::Open(MPI::COMM_WORLD, "top.txt", MPI::MODE_RDONLY, MPI::INFO_NULL); 

MPI::Offset filesize = top.Get_size(); 
filesize = filesize / sizeof(int);
bufsize = filesize / wasteland_size + 1;

buf = new int[bufsize * sizeof(int)]; 

top.Set_view(my_rank * bufsize * sizeof(int), MPI_INT, MPI_INT, "native", MPI::INFO_NULL); 
top.Read(buf, bufsize, MPI_INT, status); 
count = status.Get_count(MPI_INT); 

top.Close();

This is the code I'm using.

It compiles without errors or warnings, but it outputs something like:

540287025 874524723 805969974 857748000

for every process.

You're going to have problems here because your lines aren't of all the same length; there are 8,9,10,11, and 12-character lines. So simply dividing the file size by the number of processors (or something) and reading it in is unlikely to work. You could use the same approach [as in this answer](http://stackoverflow.com/a/12942718/463827) to divide up the file, and as a post-processing step do any loadbalancing as necessary. But in general text files aren't great for parallel I/O. — Jonathan Dursi, Jan 21 '13 at 20:39
I'm creating that file from another one because I need it formatted like this (I'm using it to read the neighbors of a node in a graph), so I can modify it. But I don't know what kind of file to use so that I can read it properly. Should I make it binary? What is the best file type to use in this situation? — cpp_ninja, Jan 21 '13 at 21:03
I would either: (a) create it as a binary file; (b) preprocess it to split it up into the right number of sub files before running (eg, `split --lines=N top.txt` where N is the number of lines per processor) & have each processor read its own file; or, if the file isn't huge, (c) read it in with one processor and then distribute the data using `MPI_Scatter()` or `MPI_Scatterv()`. We could probably cobble something together that would use MPI-IO using the approach in the linked answer but unless there's some other compelling reason I'd tend to think it would be more trouble than it's worth. — Jonathan Dursi, Jan 21 '13 at 22:38
Thanks for your answers, Jonathan. I actually thought about splitting the file into sub files and about MPI_Scatter, but since I'm only learning MPI I wanted to see how MPI-IO works. I will try making the file binary. Once again, thank you for taking the time to answer my n00b question :) — cpp_ninja, Jan 21 '13 at 22:42

score 2 · Accepted Answer · answered Jan 21 '13 at 22:53

The problem is that your file is a text file, but you are not reading it as one. With the `MPI_INT` datatype, the read copies the raw bytes of the file into your buffer as binary integers; it does not parse the decimal text.

If you convert any of the numbers you get to hex, you will see that they consist of bytes that represent digits or spaces in ASCII.
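To check this yourself, here is a small sketch that splits an integer back into its four bytes and shows them as characters. The helper name `bytes_as_text` is made up for illustration, and it assumes the values were read on a little-endian machine, so the lowest-order byte is the one that came first in the file.

```cpp
#include <cstdint>
#include <string>

// Return the four bytes of v as characters, lowest-order byte first.
// Assumption: little-endian host, so this matches the byte order in the file.
std::string bytes_as_text(std::uint32_t v) {
    std::string out;
    for (int i = 0; i < 4; ++i) {
        // Shift the i-th byte down to the low 8 bits and keep only those bits.
        out.push_back(static_cast<char>((v >> (8 * i)) & 0xFF));
    }
    return out;
}
```

Applied to the four values from the question, 540287025 gives "1 4 ", 874524723 gives "30 4", 805969974 gives "6 \n0", and 857748000 gives " 2 3". Put together, that is the first 16 characters of the text file.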

I would suggest changing the file format so that each number is represented as 4 bytes. Every line then occupies the same number of bytes, which allows you to split the file by offset the way you have done.
