
I'm a beginner in C++ and I'm trying to understand a Soundex implementation I found online. I understand most of it, but the code was posted without any explanation, so there are a few lines that I don't quite get.

The Soundex algorithm implemented in the code below is described here: http://www.blackwasp.co.uk/soundex.aspx

And here is the code:

#include <algorithm>
#include <functional>
#include <string>
#include <cctype>

using namespace std;

//----------------------------------------------------------------------------
char f_transform( char c )
  {
  string consonants[ 6 ] = { "BFPV", "CGJKQSXZ", "DT", "L", "MN", "R" };
  for (int i = 0; i < 6; i++)
    if (consonants[ i ].find( c ) != string::npos)
      return (i +1) +'0';
  return c;
  }

//----------------------------------------------------------------------------
string soundex( const string& s )
  {
  string result;

  // Validate s
  if (std::find_if(
        s.begin(),
        s.end(),
        std::not1(std::ptr_fun<int,int>(std::isalpha))
        )
      != s.end())
    return result;

  // result <-- uppercase( s )
  result.resize( s.length() );
  std::transform(
    s.begin(),
    s.end(),
    result.begin(),
    std::ptr_fun<int,int>(std::toupper)
    );

  // Convert Soundex letters to codes
  std::transform(
    result.begin() +1,
    result.end(),
    result.begin() +1,
    f_transform
    );

  // Collapse adjacent identical digits
  result.erase(
    std::unique(
      result.begin() +1,
      result.end()
      ),
    result.end()
    );

  // Remove all non-digits following the first letter
  result.erase(
    std::remove_if(
      result.begin() +1,
      result.end(),
      std::not1(std::ptr_fun<int,int>(std::isdigit))
      ),
      result.end()
    );

  result += "000";
  result.resize( 4 );

  return result;
  }

// end soundex.cpp 

I get most of it except for two things:

  1. The part where the string s is validated:

if (std::find_if(
        s.begin(),
        s.end(),
        std::not1(std::ptr_fun<int,int>(std::isalpha))
        )
      != s.end())
    return result;

I do not understand 'ptr_fun' very well. From what I have read, it takes a pointer to a function and returns a function object. My guess is that it is needed here because 's.begin()' returns an iterator, which is like a pointer to an element of the string at a certain index, so we cannot pass 'isalpha' directly and need to convert it somehow. However, this is not entirely clear to me, so please 'dumb it down' if you can so I can understand it better :).
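For reference, here is a minimal sketch of what that validation step appears to do, written with a C++11 lambda instead of 'ptr_fun' and 'not1'. As far as I can tell, 'not1' only works on function objects that carry an 'argument_type' typedef, which a plain function pointer does not have, and 'ptr_fun' adds that wrapper; the '<int,int>' part picks the 'int isalpha(int)' overload from <cctype>. The helper name 'is_all_alpha' is hypothetical and not part of the original code.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (not in the original code): returns true when every
// character of s is a letter, which is the condition the validation checks.
bool is_all_alpha(const std::string& s)
{
    // find_if returns an iterator to the first character for which the
    // predicate is true, or s.end() if no such character exists.
    std::string::const_iterator first_bad = std::find_if(
        s.begin(), s.end(),
        // The lambda plays the role of not1(ptr_fun(isalpha)): "is NOT a letter".
        [](unsigned char c) { return !std::isalpha(c); });

    return first_bad == s.end();
}
```

With this reading, 'soundex' returns the empty 'result' as soon as 'find_if' finds any non-letter, that is, whenever 'is_all_alpha(s)' would be false.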

  2. Another thing I don't understand is these lines of code:

    // Collapse adjacent identical digits
    result.erase(
      std::unique(
        result.begin() +1,
        result.end()
        ),
      result.end()
      );

    // Remove all non-digits following the first letter
    result.erase(
      std::remove_if(
        result.begin() +1,
        result.end(),
        std::not1(std::ptr_fun<int,int>(std::isdigit))
        ),
      result.end()
      );

Let's take the first part as an example, since explaining one will answer the other as well. From some googling I found that 'unique' is supposed to 'remove all but the first element from every consecutive group of equivalent elements', and that 'erase' erases the elements in the range [first, last), which means the element at 'last' itself is not deleted. So if the author of the code wanted to remove all adjacent identical digits, why didn't he just use something like this:

std::unique(result.begin() + 1, result.end());

Why also use the 'erase' function? This is how I interpret that code:

  • If I pass 'abaace' in there, 'unique' would produce 'abace', and then we would effectively have erase('abace', 'e'), and I do not know what should happen at that point. Please explain if you can. Thank you for reading.
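To make the question concrete, here is a small sketch of the two variants on a plain string. It assumes the documented behaviour that 'std::unique' only shifts the kept elements to the front and returns an iterator to the new logical end, without changing the size of the container. The helper names 'unique_keep_count' and 'collapse_adjacent' are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: runs std::unique alone and reports how many characters
// it kept. s is modified in place, but its size() stays the same.
std::size_t unique_keep_count(std::string& s)
{
    std::string::iterator new_end = std::unique(s.begin(), s.end());
    return static_cast<std::size_t>(new_end - s.begin());
}

// Hypothetical helper: the unique + erase combination from the Soundex code,
// applied to the whole string. erase() drops the leftover tail
// [new_end, s.end()), so the string actually shrinks.
std::string collapse_adjacent(std::string s)
{
    s.erase(std::unique(s.begin(), s.end()), s.end());
    return s;
}
```

For 'abaace', 'unique_keep_count' should report 5 while the string still holds 6 characters: the first five are 'abace' and the last one is a leftover whose value is unspecified. 'collapse_adjacent' gives back exactly 'abace'.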
Dinal Koyani
Cantaff0rd