5

I have a C++ assignment where I have to devise a naive solution to the Knuth problem with the container that I have chosen and study the performance data that is generated. See problem below:

Three million men with distinct names were laid end-to-end, reaching from New York to California. Each participant was given a slip of paper on which he wrote down his own name and the name of the person immediately west of him in the line. The man at the extreme western end of the line didn’t understand what to do, so he threw his paper away; the remaining 2,999,999 slips of paper were put in a huge basket and taken to the National Archives in Washington, D.C. Here the contents of the basket were shuffled completely and transferred to magnetic tapes.

At this point an information scientist observed that there was enough information on the tapes to reconstruct the list of people in their original order. And a computer scientist discovered a way to to do the reconstruction with fewer than 1000 passes through the data tapes, using only sequential accessing of tapes and a small amount of random-access memory. How was this possible?

[In other words, given the pairs (xi, xi+1) for 1 ≤ i < N, in random order, where the xi are distinct, how can the sequence x1 x2….xN be obtained, restricting all operations to serial techniques, suitable for use on magnetic tapes. This is the problem of sorting into order when there is no easy way to to tell which of two given keys precedes the other;

From my research, I have decided to use an unordered_map, as opposed to a list or normal map. What I don't understand is the naive solution that has been provided to us to implement as code:

Consider the papers to be a collection of (Name, Name) tuples, both the successor (westerly neighbour) and the predecessor (easterly neighbour) can be established from these tuples.

 - identify an individual xc
   
 - append xc to empty list
   
 - while xc has westerly neighbour

 - xc < westerly neighbour of xc

 - append xc to list

 - xc < head of list
   
 - while xc has easterly neighbour

 - xc < easterly neighbour of xc

 - prepend xc to list

My first question - is xc just a random element, as the order cannot be determined because of the nature of the container?

My second question - the names that we have been given are in a file like so:

Hazbgaei,Ckwkkkxa
Hrunmkoc,Usjgmunt
Cmkcwncb,Ycrnwzjl
Oygvmrhf,Hylmukiw
Jursaual,Gzrddsbg

So is the naive solution saying that I should take the first name and put it in one list, then the last name and put that into a different list?

Apologies if I'm completely off but I have really tried to understand this!

Community
  • 1
  • 1
wilcode
  • 633
  • 2
  • 10
  • 25
  • IMHO in the paragraph starting with 'Consider the papers...' there are some commas or dots missing which are needed to really understand the solution. Can you please check? – Andreas Florath Jan 27 '13 at 19:26
  • @AndreasFlorath They were in a list but the question was edited.. – wilcode Jan 27 '13 at 19:34
  • @AndreasFlorath I've edited it back into a list now :) – wilcode Jan 27 '13 at 19:38
  • Looks better: nevertheless it seams that there are some information missing, like where does the block of statements after the while loop end or what does `xc < head of list` mean? – Andreas Florath Jan 27 '13 at 19:46
  • One first statement about a possible solution: of course it is possible to use a map here. Read the file (tape) into memory and use it for accessing all the data. IMHO the interesting aspect is, that everything should be solved using `serial techniques`, which are - as far as I understand - lists. – Andreas Florath Jan 27 '13 at 19:49
  • The c should be in a subscript format but I can't do that on here I don't think.. but I think "xc" stands for the random element and that statement possibly means move that random element to the top of the list.. – wilcode Jan 27 '13 at 19:49
  • As Andreas noted, it's still not completely clear what all the notation means. A tentative answer to the second question is that there is only one list and that the algorithm either appends or prepends elements to that list. – dhavenith Jan 27 '13 at 19:51
  • I agree it isn't clear.. that's why I'm not understanding it. – wilcode Jan 27 '13 at 19:53
  • @lauw0203: Can you please give me the exact source of the problem? Is it from one of the 'Art of Computer Programing' books? – Andreas Florath Jan 27 '13 at 21:29

2 Answers2

0

I believe the solution is saying to use one list. Pick the first tuple, and put its first name as the head of the list, and the westerly neighbor as the next item. Then, find the tuple that has that westerly neighbor as its first name (that, of course, is the hard part), and grab the westerly neighbor from that tuple and append it to the list. Repeat until you have can't find a tuple with the name of the last added person. Then you know you have reached the West Coast.

So, the first xc is essentially random. After that, xc is deterministic. The line that says

- while xc has westerly neighbour

is essentially saying "find that tuple that has the current westerly neighbor as the self name".

The latter part, of course, just applies the same logic in reverse, to fill up the chain going east.

BobS
  • 2,588
  • 1
  • 15
  • 15
0

First Try

I'm doing this as an answer, because it's impossible to ask this as a comment. Also clarifying this is IMHO at least half of the answer

Is this meant in the following way?

identify an individual xc
append xc to empty list
while( xc has westerly neighbour )
{
    if( xc < westerly neighbour of xc )
    {
        append xc to list
    }
    // Do not understand this:
    // - xc < head of list
}
while( while xc has easterly neighbour )
{
    if( xc < easterly neighbour of xc )
    {
        prepend xc to list
    }
}

(This is just a try to get closer to a solution - I even did not think about the algorithm yet...)

Second Try

identify an individual xc
append xc to empty list

while( xc has westerly neighbour )
{
    xc := westerly neighbour of xc
    append xc to list
    xc := head of list

    while( xc has easterly neighbour )
    {
        xc := easterly neighbour of xc
        prepend xc to list
    }
}

It seams that this makes some sense: So you run through the list, collect all the westerly neighbors as far as possible and do this with all the easterly after this. So for each run trough the original list, the sorted list gets longer and longer.

Third Try

IMHO something is missing: if I interpret all the information available I come to the following solution. But this needs a lot more scans of the original tape: instead of the given 10.000 times it scans it about 750.000 times. Any ideas?

#include <vector>
#include <algorithm>
#include <iostream>
#include <list>

size_t constexpr max_men { 3000000 };

typedef std::tuple< int, int > neighbors_t;
typedef std::vector< neighbors_t > pool_t;

int main() {

  // Need a vector here for random_shuffle - but using it further down
  // just with sequencial methods.
  pool_t pool_list;
  for( size_t i { 0 }; i < max_men - 1; ++i ) {
    pool_list.push_back( std::make_tuple( i, i + 1) );
  }
  std::random_shuffle( pool_list.begin(), pool_list.end() );


  // Use the algorithm to get it sorted again

  // identify an individual xc:
  // Pick first from first tuple
  // append xc to empty list
  std::list<int> sorted_list;
  sorted_list.push_back( std::get<0>( pool_list.front() ) );

  // Count the number of tape scans
  size_t tape_scans { 0 };

  do {
    // Scan through the pool_list
    for( neighbors_t n : pool_list ) {

#if 0
      std::cout << "n [" << std::get<0>( n ) 
        << "] sorted_list [";
      for( int i : sorted_list ) {
        std::cout << i << " ";
      }
      std::cout << "]" << std::endl;
#endif

      // while( xc has westerly neighbour )
      // Found westerly neighbour
      if( std::get< 1 >( n ) == sorted_list.front() ) {
        // append xc to list
        sorted_list.push_front( std::get< 0 >( n ) );
      }
      if( std::get< 0 >( n ) == sorted_list.back() ) {
        // append xc to list
        sorted_list.push_back( std::get< 1 >( n ) );
      }
    } 
    ++tape_scans;
    std::cout << "Tape Scans needed [" << tape_scans 
          << "] sorted list size [" << sorted_list.size() << "]"
          << std::endl;
  } while( sorted_list.size() < max_men );

  std::cout << "Tape Scans needed [" << tape_scans << "]" << std::endl;

  return 0;
}
Andreas Florath
  • 4,418
  • 22
  • 32
  • The "<" are meant to be arrows but I could not include the formatting, so I don't think they would appear in an if statement but the while loop's make sense.. – wilcode Jan 27 '13 at 19:59
  • I suppose the "<" are assignment. – dhavenith Jan 27 '13 at 20:00
  • furthermore, I think the second while should be inside the first, because otherwise the algorithm will terminate before dealing with all of the input. – dhavenith Jan 27 '13 at 20:01
  • @andreasflorath I've had a discussion with the tutor about my chosen container being an unordered_map because the largest file contains 5 million entries and he completely agreed that this should be used. Do you think list is used figuratively? – wilcode Jan 27 '13 at 20:20
  • @lauw0203: IMHO using a map is not what is described in the question. Of course you can do it. It's mostly straight-forward. – Andreas Florath Jan 27 '13 at 21:02
  • @AndreasFlorath Just seen your "Third Try". I've spoken to a fellow student and how BobS has interpreted the question is correct, but when you have the last tuple, the container needs to be searched again and prepended to the list. He has used an unordered map to load the names and search then a seperate list to insert the names in order. – wilcode Jan 27 '13 at 21:10
  • @lauw0203: This is your decision. It's just that the question tells another story: `And a computer scientist discovered a way to to do the reconstruction with fewer than 1000 passes through the data tapes, using only sequential accessing of tapes and a small amount of random-access memory.` and `restricting all operations to serial techniques, suitable for use on magnetic tapes.` `map` is no serial technique and this `map` uses more than a `small amount` of memory. – Andreas Florath Jan 27 '13 at 21:24
  • We're meant to implement the naive solution first, so it works but isn't great and then we're meant to implement Knuth's solution which uses less memory. – wilcode Jan 27 '13 at 21:58