2

referring to my last question (Multiple child process), i am now trying to make an external sorting implementation using multiple child process.

...
fp = fopen(pathname, "r"); // open inputfile in r mode
fgets(trash, 10, fp); // ignore first line

for (i=0; i<numberOfProcess; ++i) {
    #ifdef DBG
        fprintf(stderr, "\nDBG: Calling fork()\n"); 
    #endif

    if ((pids[i] = fork()) < 0) {
        perror("fork error");
        exit(EXIT_FAILURE);

    } else if (pids[i] == 0) { // Child Code

        if (numbersToSort % numberOfProcess == 0) { // 16 % 4 = 0
            partialDataSize = numbersToSort / numberOfProcess;          

            for (j=0; j<partialDataSize; j++) { 
                fscanf(fp, "%d", &arrayPartialData[j]);
                qsort(arrayPartialData, partialDataSize, sizeof(int), (void *)comp_num);

                //printf("%d\n", arrayPartialData[j]);
                // TODO: qsort data until partialDataSize
            }

        } 
        printf("pid: %d child process %d outputs: ", getpid(), pids[i]);
        printArray(arrayPartialData, partialDataSize);
        //break;
        exit(0);
    }  
}   

/* Wait for children to exit. */

while (numberOfProcess > 0) {
    pid = wait(&status);
    --numberOfProcess;
}

fclose(fp);

but of course this code outputs the same sequence of sorted integers from inputfile because of fscanf.. for example if the beginning of input file includes 5 1 4, then it outputs:

(1st child) 1 4 5
(2nd child) 1 4 5

(with two child process).. because fscanf starts to read integers from the beginning of input stream.

my problem now is how can i continue to read the numbers starting from the point where the previous child process left? for example, if input file includes 5 1 4 8 5 10, then it can output:

(1st child) 1 4 5

(2nd child) 5 8 10

thanks in advance;)

Community
  • 1
  • 1
israkir
  • 2,111
  • 7
  • 30
  • 39
  • How will they execute in parallel if they have to read the file sequentially? That doesn't seem like it would make your program faster. – Dusty Campbell May 18 '09 at 16:12
  • my aim is not to execute processes in parallel.. i am just trying to one part of sorting with one child process and another part off sorting with another child process and so on.. at the end, the parent its going to merge the results.. – israkir May 19 '09 at 03:49
  • 1
    Try reading the file into memory before you fork off the children, then make each child smart enough to know which part of the larger array it owns. – Steve Madsen Jun 01 '09 at 18:27

4 Answers4

1

I'd use the lower level open() and read() rather than the streams equivalent as otherwise you'll have to worry about synchronizing the stdio buffers with the underlying file descriptor. Note you'll still have issues reading complete numbers, so you'll probably need some sync between the processes.

As an alternative I would suggest a single process to read the file and write a subset of the lines to subprocesses that do the sorting (using pipe()), which they then write to another process doing the merge.

pixelbeat
  • 30,615
  • 9
  • 51
  • 60
0

You were working with linked channels.

from glibc 13.5.1 (emphasis is mine)

Channels that come from a single opening share the same file position; we call them linked channels. Linked channels result when you make a stream from a descriptor using fdopen, when you get a descriptor from a stream with fileno, when you copy a descriptor with dup or dup2, and when descriptors are inherited during fork.

Apparently, you can not do I/O from both the streams simultaneously.

If you have been using a stream for I/O (or have just opened the stream), and you want to do I/O using another channel (either a stream or a descriptor) that is linked to it, you must first clean up the stream that you have been using.

Raman
  • 2,735
  • 1
  • 26
  • 46
0

If you're using fscanf, the only thing you can do is have each process read and discard numbers until it gets to those that it should work on. In your case discard i*partialdatasize numbers.

So e.g. 5 7 3 1 4 8 5 10 2 you might have 5 7 3

1 4 8

5 10 2

which would sort to give

3 5 7

1 4 8

2 5 10.

Then you have to work out how to merge the sorted results.

pjc50
  • 1,856
  • 16
  • 18
0

If you can store your integers as binary. You can have the first thread read it's block

fread(&arrayPartialData[j], sizeof(int), partialDataSize, fp);

Than the 2nd thread can skip the block which has already been read (Because you know the size of each block). Then you can begin reading from there, without needing to discard any data.

fseek(partialDataSize * threadNumber);

I also recommend you use threads, as forking is very expensive. threads tutorial