If you are looking for words that have only a one-character difference, there are a couple of tricks you can use. First, to compare two words and count the number of characters different, you use this:
( $word1 ^ $word2 ) =~ tr/\0//c
This does a stringwise exclusive or on the two words; wherever the characters are the same, a "\0" will result; where they are not the same, a non-"\0" will result. tr, in its complement counting mode, counts the differences.
Second, noting that either the first half or the last half of the word must match exactly, partition the words into a hash by their first and last halves, reducing the number of other words you need to check a given word against.
This approach should only two or three times the memory of all the strings (plus a little overhead); it could be reduced to one to two times the memory by pushing \$word
and using $$_
in the grep and sort map $$_, @match in the output at the cost of some speed.
If the words are all the same length, the top level of the hash can be removed and two different hashes used for the words' beginnings and endings.
use strict;
use warnings;
use autodie;
my %strings;
my $filename = shift or die "no filename provided\n";
open my $fh, '<', $filename;
while (my $word = readline $fh) {
chomp $word;
push @{ $strings{ 'b' . length $word }{ substr($word, 0, length($word)/2)} }, $word;
push @{ $strings{ 'e' . length $word }{ substr($word, length($word)/2)} }, $word;
}
seek $fh, 0, 0;
while (my $word = readline $fh) {
chomp $word;
my @match = grep 1 == ($word ^ $_) =~ tr/\0//c, @{ $strings{ 'b' . length $word }{ substr($word, 0, length($word)/2) } }, @{ $strings{ 'e' . length $word }{ substr($word, length($word)/2) } };
if (@match) {
print "$word - " . join( ' ', sort @match ) . "\n";
}
else {
print "$word\n";
}
}
Note that this only looks for substitutions, not insertions, deletions, or transpositions.