To simplify my problem, let's say I have a table with a lot of books and their respective content, and a separate table of keywords. I would like to find every pair where a keyword occurs in a book's content. The simple Perl script below illustrates the problem.
use strict;
use warnings;

# title => content
my %books = (
    "Foodworld"        => "Cheeseburgers and Hamburgers are the best you can ...",
    "Marvelous Salad"  => "Russian dressing is superb when ...",
    "Delicious Steaks" => "Only BBQ RipEye",
);

# id => keyword
my %keywords = (
    "1234" => "Cheeseburgers",
    "2345" => "dressing",
    "9789" => "Hamburgers",
);
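# Test every keyword against every book (case-sensitive substring match).
while ( my ($title, $content) = each %books ) {
    while ( my ($keywordID, $keyword) = each %keywords ) {
        # \Q...\E quotes regex metacharacters so the keyword is matched literally
        if ( $content =~ /\Q$keyword\E/ ) {
            print "$title \t $keywordID \n";
        }
    }
}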
The output will be (the line order may vary, since Perl hashes are unordered):
Marvelous Salad 2345
Foodworld 1234
Foodworld 9789
My problem is that the collection of books contains ~70,000 titles and the list of keywords ~30,000 words, so the nested loop above would run roughly 2.1 billion regex matches. Both are in separate tables on a MySQL server. Any suggestions? How would you solve this task? Could you point me in a good direction?
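One direction that might be worth exploring, shown here only as a minimal sketch on the sample data: join all keywords into a single alternation regex, so each book's content is scanned once instead of once per keyword. The names `%id_for` and `match_pairs` are made up for this illustration, and the sketch assumes keyword texts are unique. It is not an exact replacement for the nested loop: an alternation only reports non-overlapping matches, so a keyword that is contained in (or overlaps) another keyword at the same spot can be missed.

```perl
use strict;
use warnings;

# Same sample data as in the question (title => content, id => keyword).
my %books = (
    "Foodworld"        => "Cheeseburgers and Hamburgers are the best you can ...",
    "Marvelous Salad"  => "Russian dressing is superb when ...",
    "Delicious Steaks" => "Only BBQ RipEye",
);
my %keywords = (
    "1234" => "Cheeseburgers",
    "2345" => "dressing",
    "9789" => "Hamburgers",
);

# Hypothetical reverse lookup: keyword text => id (assumes unique keywords).
my %id_for = reverse %keywords;

# Build one alternation of all keywords, longest first, each quoted literally.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) }
    keys %id_for;
my $re = qr/($alternation)/;

# Hypothetical helper: returns sorted "title<TAB>id" strings.
# Each content string is scanned once; %seen suppresses repeated hits.
sub match_pairs {
    my @pairs;
    for my $title ( keys %books ) {
        my %seen;
        while ( $books{$title} =~ /$re/g ) {
            my $id = $id_for{$1};
            push @pairs, "$title\t$id" unless $seen{$id}++;
        }
    }
    return sort @pairs;
}

print "$_\n" for match_pairs();
```

On the sample data this prints the same three pairs as the nested loop, sorted by title. Whether it scales to ~30,000 keywords and ~70,000 books depends on how well the regex engine handles such a large alternation, so it would need to be measured on the real tables.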