
I'm writing a small program that requests a chunk of a file, and another program that returns that specific chunk of the file. I can get this to work with files up to about 555,000 bytes, but with anything larger than that, I get unusual behavior.

In my loop, I check a progress buffer, which is an array of integers, to see whether I already have a specific chunk of the file. If I do, I don't send a request for that chunk; if I don't, I request the chunk from a peer. I walk a linked list of peers, each with its own sockfd, so in a sense I "round robin" the requests out. Once I have sent them out, I wait for messages to come in. I don't know if this is the best approach, but it seemed to be the most natural one.

However, for files that are larger, it hangs at the select call. I'm not entirely sure why. If anyone can shed some light on this, or suggest a better approach, I'd love to hear it.
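The round-robin dispatch described above can be sketched in isolation. This is a minimal sketch, not the asker's code: `struct peer`, `next_peer()` and `assign_requests()` are hypothetical names, and instead of calling `request_blocks()` it only records which socket each missing chunk would be requested from, so the logic can be checked without a network. One deliberate difference: the question's loop advances to the next peer on every chunk, including chunks it already has, whereas this sketch advances only when a request is actually made, which spreads the requests more evenly.

```c
#include <stddef.h>

/* Hypothetical peer node; the question's real struct has more fields. */
struct peer {
    int sockfd;
    struct peer *next;
};

/* Advance to the next peer, wrapping back to the head of the list. */
static struct peer *next_peer(struct peer *cur, struct peer *head)
{
    return (cur != NULL && cur->next != NULL) ? cur->next : head;
}

/*
 * For every chunk z with progress[z] == 0, store in assigned[z] the socket
 * of the peer that would be asked for it; chunks already held get -1.
 * Peers are used in round-robin order, advancing only when a request is
 * actually made. Returns the number of requests assigned.
 */
static size_t assign_requests(const int *progress, size_t total,
                              struct peer *head, int *assigned)
{
    struct peer *cur = head;
    size_t requests = 0;

    for (size_t z = 0; z < total; z++) {
        if (progress[z] != 0 || cur == NULL) {
            assigned[z] = -1;          /* already have it, or no peers */
            continue;
        }
        assigned[z] = cur->sockfd;     /* real code: request_blocks(cur, ...) */
        requests++;
        cur = next_peer(cur, head);
    }
    return requests;
}
```

With three peers and chunk 1 already held, chunks 0, 2, 3 and 4 are assigned to the first, second and third peers and then to the first peer again.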

Here is my code. I didn't include the code for the functions that I call, but I don't think it's necessary in this instance (they are modified versions of Beej's sendall and recvall functions). It is also worth mentioning that this is in a multithreaded application (using pthreads), but I'm not using any shared variables. Thanks for taking the time to read this!
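Since the helper functions are not shown, it may help to recall what a recvall-style loop has to do: keep calling `recv()` until the whole chunk has arrived, because a single `recv()` on a stream socket may return fewer bytes than requested, and short reads tend to show up more often with larger transfers. The following `recv_all()` is a hypothetical sketch under that assumption; its name, signature and return convention are illustrative and are not the asker's function.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical recv_all(): keep calling recv() until exactly len bytes
 * have arrived, since one recv() may deliver only part of a chunk.
 * Returns  1 when all len bytes were read,
 *          0 if the peer closed the connection first,
 *         -1 on error (errno is left as set by recv()).
 * *got reports how many bytes were actually stored in buf.
 */
static int recv_all(int fd, void *buf, size_t len, size_t *got)
{
    char *p = buf;
    size_t total = 0;

    while (total < len) {
        ssize_t n = recv(fd, p + total, len - total, 0);
        if (n > 0) {
            total += (size_t)n;
        } else if (n == 0) {
            break;                     /* orderly shutdown by the peer */
        } else if (errno != EINTR) {
            *got = total;
            return -1;                 /* real error */
        }
        /* n == -1 with EINTR: interrupted by a signal, just retry */
    }
    *got = total;
    return total == len ? 1 : 0;
}
```

A matching send-side loop follows the same idea: keep calling `send()` until every byte has been accepted.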

```c
/* Setup (not shown):
 *   total        = number of chunks in the file
 *   my_peers     = linked list of peer structs containing sockfds, etc.
 *   progress_buf = array of ints, one per chunk; 0 means we don't have the chunk
 */
int i;

while (1) {

    /* Send out the chunk requests */
    for (z = 0; z < total; z++) {

        /* Circles through the peer list */
        if (temp->next != NULL) {
            temp_next = temp->next;
        } else {
            temp_next = my_peers;
        }

        if (progress_buf[z] == 0) {
            /* request_blocks performs a send */

            /* Need to deal with the "hanging bytes" chunk */
            if (((z + 1) == total) && (remainder_chunk == 1)) {
                check = request_blocks(temp, remainder, remainder_chunk, z);
            } else {
                check = request_blocks(temp, remainder, 0, z);
            }

            /* Bad send, remove peer from file descriptors and list */
            if (check != 0) {
                FD_CLR(check, &masterr);
                remove_peer(&my_peers, temp->socket);
            }
        }
        temp = temp_next;
    }

    read_fdss = masterr;

    /* HANGS RIGHT HERE */
    if (select(fdmax + 1, &read_fdss, NULL, NULL, NULL) < 0) {
        perror("select");
    }

    read_fdss = masterr;

    int got_block;
    /* Means we've received a block */
    for (i = 4; i <= fdmax; i++) {
        got_block = -1;
        if (FD_ISSET(i, &read_fdss)) {

            /* Performs a recv */
            got_block = receive_block(i, filename_copy, progress_buf);

            /* Update the progress buffer */
            if (got_block > -1) {
                remaining_blocks++;
                if (remaining_blocks == total) goto finished;
            } else if (got_block == -2) {
                /* Failure: remove the peer */
                close(i);
                FD_CLR(i, &masterr);
                remove_peer(&my_peers, i);
            }
        }
    }
}
}
```
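Two details of the loop above are worth isolating. `select()` overwrites the set it is passed, so the read set has to be refreshed from the master set before each call (copying it again right after `select()` throws the result away, as the first comment below points out). And with a `NULL` timeout, a request that is lost or never answered blocks the loop forever. The sketch below is one hedged way to address both; `wait_readable()` is a hypothetical helper, and the timeout value is an assumption left to the caller.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Hypothetical helper: wait until at least one descriptor in *master is
 * readable, or until timeout_sec seconds pass.
 * select() overwrites the set it is given, so a fresh copy of the master
 * set is made on every attempt and the result is left in *ready.
 * Returns the number of ready descriptors, 0 on timeout, -1 on error.
 */
static int wait_readable(const fd_set *master, int fdmax,
                         long timeout_sec, fd_set *ready)
{
    for (;;) {
        /* Re-initialised each attempt: select() may modify the timeval. */
        struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };
        *ready = *master;  /* copy BEFORE calling select() */

        int n = select(fdmax + 1, ready, NULL, NULL, &tv);
        if (n >= 0)
            return n;
        if (errno != EINTR) {
            perror("select");
            return -1;
        }
        /* Interrupted by a signal: retry with a fresh copy. */
    }
}
```

In a loop like the one in the question, a return value of 0 could be treated as a stall (for example, by re-sending requests for chunks whose `progress_buf` entry is still 0) instead of blocking indefinitely, and only descriptors for which `FD_ISSET(fd, &ready)` is true would be handed to the receive code.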
Asked by the_man_slim, edited by rtruszk.
  • Hmm, are you sure the `read_fdss = masterr` assignment *after* the **select()** call is what you want? Then of course you will run **receive_block()** for each fd you are interested in, not only those with data ready. Other than that, I recommend that you `strace` your program (use `strace -vf` to trace a multi-threaded application) and check exactly which **recv()**s and **select()**s you are calling, with which arguments. – Petr Baudis Sep 30 '12 at 12:13
  • Hmmm... I don't think the assignment after the select call is what I want. I removed it, but nothing changed. I ran dtruss (using OSX here) and I was unable to see anything wrong. I'm stumped. – the_man_slim Sep 30 '12 at 18:24
  • In your "we've received a block" code, you're only reading one block from the FD -- can the peer have sent multiple blocks, and they're buffered somewhere? If the buffering code has already read multiple blocks out of the socket, `select()` won't report anything new to read. – Barmar Oct 04 '12 at 00:56
  • If select() isn't returning, the most obvious explanation would be that no data has been received on any of the sockets specified by read_fdss. Are you sure that your program actually is receiving more data at that point? (A second reason would be that read_fdss isn't specifying the correct socket set to watch, are you sure read_fdss and max_fd are set to the correct values?) – Jeremy Friesner Jan 02 '15 at 01:14
  • @Barmar select is level-triggered, not edge-triggered, so it will always return whenever there is at least one byte available to read on any socket specified (even if the program had a chance to read that byte in a previous iteration of the loop and chose not to) – Jeremy Friesner Jan 02 '15 at 01:16
  • @JeremyFriesner I know that. But if everything has been read and buffered, there won't be anything available to read, so select will block. – Barmar Jan 02 '15 at 01:21
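The last three comments can be reconciled with a small local experiment. `select()` is level-triggered, so a socket with unread bytes in the kernel is reported readable every time it is asked; but once a greedy `recv()` has pulled a second block into a user-space buffer, the kernel has nothing left to report, and `select()` will block even though a block is waiting to be processed. The sketch below uses a `socketpair()` in place of real peers; `readable_now()` and `buffered_block_demo()` are hypothetical names, and whether the asker's `receive_block()` actually over-reads like this is not known from the question.

```c
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Zero-timeout select(): 1 if fd has unread bytes in the kernel,
   0 if not, -1 on error. */
static int readable_now(int fd)
{
    fd_set set;
    struct timeval tv = { .tv_sec = 0, .tv_usec = 0 };  /* poll, don't block */
    FD_ZERO(&set);
    FD_SET(fd, &set);
    int n = select(fd + 1, &set, NULL, NULL, &tv);
    if (n < 0)
        return -1;
    return n > 0 && FD_ISSET(fd, &set);
}

/*
 * Fills out[] with readable_now() results at three points:
 *  out[0]: after reading only block A of "AAAABBBB"   -> 1 (B still in kernel)
 *  out[1]: asking again without reading anything      -> 1 (level-triggered)
 *  out[2]: after one large recv() grabbed C and D      -> 0, even though
 *          block D now sits unprocessed in the user-space buffer
 * Returns 0 on success, -1 if any socket call failed.
 */
static int buffered_block_demo(int out[3])
{
    int sv[2];
    char buf[64];
    int rc = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    if (write(sv[1], "AAAABBBB", 8) != 8) goto done;   /* two 4-byte blocks */
    if (recv(sv[0], buf, 4, 0) != 4) goto done;        /* consume A only */
    out[0] = readable_now(sv[0]);
    out[1] = readable_now(sv[0]);
    if (recv(sv[0], buf, 4, 0) != 4) goto done;        /* consume B */

    if (write(sv[1], "CCCCDDDD", 8) != 8) goto done;
    if (recv(sv[0], buf, sizeof buf, 0) != 8) goto done; /* greedy: C and D */
    out[2] = readable_now(sv[0]);
    rc = 0;
done:
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

If a receive helper can read past the end of one block, one common remedy is to process every complete block already sitting in the user-space buffer before going back to `select()`.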

0 Answers