
In Ruby 1.9.3, I'm trying to write a program that finds all words of n characters that can be built from an arbitrary set of characters. For instance, if I'm given the characters [ b, a, h, s, v, i, e, y, k, s, a ] and n = 5, I need to find all 5-letter words that can be made using only those characters. Using the 2of4brif.txt word list from http://wordlist.sourceforge.net/ (to include British words and spellings, too), I have attempted the following code:

a = %w[b a h s v i e y k s a]
a.permutation(5).map(&:join).each do |x|
  File.open('2of4brif.txt').each_line do |line|
    puts line if line.match(/^[#{x}]+$/)
  end
end

This does nothing (no error message, no output, as if frozen). I have also attempted variations based on the following threads:

What's the best way to search for a string in a file?

Ruby find string in file and print result

How to search for exact matching string in a text file using Ruby?

Finding lines in a text file matching a regular expression

Match a content with regexp in a file?

How to open a file and search for a word?

Every variation I have tried has resulted in either:

1) Freezing;

2) Printing all words from the list that contain the 5-character permutations (I assume that's what it's doing; I didn't go through and check all of the thousands of printed words); or

3) Printing all 5-character permutations found within words in the list (again, I assume that's what it's doing).

Again, I'm not looking for words that contain the 5-character permutations, I'm looking for 5-character permutations that are complete words in and of themselves, so a line in the text file should only be printed if it is a perfect match with a permutation.

What am I doing wrong? Thanks in advance!

grandinero
  • And that's a well constructed question. – MurifoX Feb 20 '13 at 14:30
  • I am guessing it freezes because for each 5 letter combination you are reading each line of a huge file. At least I assume a word list of the English language is pretty big. It's unclear to me which file you're actually using. Anyway, what you're trying to do just takes a lot of resources (time and memory) and that's why the program freezes. – Mischa Feb 20 '13 at 14:39
  • `a.permutation(5)` results in 55440 possible five letter words. I don't know how many words there are in your dictionary, but let's say 100000 (that's a very low estimate). This will result in an iteration that runs over 5 billion(!) times. – Mischa Feb 20 '13 at 14:48
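
The arithmetic in the last comment can be checked with a short sketch. The 100,000-word dictionary size is the commenter's guess rather than a measured value, and the variable names are made up for illustration.

```ruby
# Letters from the question, with n = 5.
letters = %w[b a h s v i e y k s a]

# Array#permutation treats the two s's and the two a's as distinct elements,
# so it yields 11 * 10 * 9 * 8 * 7 arrays.
permutation_count = letters.permutation(5).count
puts permutation_count                               # 55440

# Assumed dictionary size (the commenter's guess, not a measured value).
assumed_dictionary_words = 100_000
total_comparisons = permutation_count * assumed_dictionary_words
puts total_comparisons                               # 5544000000

# Because of the repeated letters, some of the joined strings are duplicates.
unique_strings = letters.permutation(5).map(&:join).uniq.size
puts unique_strings < permutation_count              # true
```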

4 Answers


You’re not really using regular expressions here. Your program is very inefficient, not only because, as has been pointed out, you re-read the file for every single permutation (and there are 55k of them!), but above all because all you need to do is test the single pattern

/^[bahsvieyksa]{5}$/

against each line of the file.

I would thus suggest:

File.open('2of4brif.txt').each_line do |line|
  puts line if line.match(/^[bahsvieyksa]{5}$/)
end

as a much more efficient alternative.

Arthur Reutenauer
  • In your second code block, you omitted {5} from the regexp. And when I try this code, it doesn't freeze but prints nothing. – grandinero Feb 20 '13 at 14:58
  • Okay, I figured out why it wasn't coming up with any output: it needs `line.chomp!`. But even then, it's not doing the right thing. For example, it's coming up with "savvy" even though there's only one v in the character set. I don't know a lot about regular expressions, so maybe you can show me a way to fix that in your code. – grandinero Feb 20 '13 at 15:02
  • Yes. I first wrote a long comment addressing that because your own comment wasn’t showing up on my screen. To sum up, I’ll just say my code was doing what you said you wanted, not what you actually wanted :-) Dave’s answer above is actually what you want. – Arthur Reutenauer Feb 20 '13 at 15:14
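
The two observations in the comments above can be reproduced with a small sketch. That the word list uses CRLF line endings is only an assumption about why `chomp!` was needed; the variable names are illustrative.

```ruby
pattern = /^[bahsvieyksa]{5}$/

# A character class only restricts which characters may appear at each
# position; it does not track how often each one has already been used.
savvy_matches = !!("savvy" =~ pattern)
puts savvy_matches      # true, although the set contains only one "v"

# Assumption: the lines end in "\r\n". The "\r" is not in the class, so the
# raw line fails to match until it is chomped.
raw_line = "savvy\r\n"
raw_matches     = !!(raw_line =~ pattern)
chomped_matches = !!(raw_line.chomp =~ pattern)
puts raw_matches        # false
puts chomped_matches    # true
```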

This works for me using the english.0 file on that page (sorry, I couldn't find the specific file you mentioned):

a = %w[b a h s v i e y k s a l d n]
dict = {}
a.permutation(5).each do |p|
  dict[p.join('')] = true
end

File.open('english.0').each_line do |line|
  # chomp! returns nil when there is nothing to remove, so don't chain bang methods
  line = line.chomp.downcase
  puts line if dict[line]
end

The structure should be pretty clear: I build the dictionary of permutations up front in one giant hash (you may need to revisit this depending on input sizes, but memory is cheap these days), and then use the fact that the input is "one word per line" to simply key into that hash.

Also note, in my version, I read through the file only once. In yours you scan the file once per permutation, and there are thousands of permutations.
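
As a rough sketch of the same idea in reusable form, the lookup can be wrapped in a method and tried on an in-memory list. The method name `permutation_words` and the sample words are hypothetical, and a `Set` is swapped in for the hash so that duplicate permutations (from the repeated letters) collapse automatically.

```ruby
require 'set'

# Hypothetical helper: returns the words that are exact n-letter
# permutations of the given letters.
def permutation_words(letters, n, words)
  # Build every candidate string once, up front.
  candidates = letters.permutation(n).map(&:join).to_set
  # Normalize each word, then keep those present in the candidate set.
  words.map { |w| w.chomp.downcase }.select { |w| candidates.include?(w) }
end

letters = %w[b a h s v i e y k s a]
sample  = ["bikes\n", "savvy\n", "Shake\n", "abashes\n"]
found   = permutation_words(letters, 5, sample)
p found   # ["bikes", "shake"]
```

With a real word list, the enumerator returned by `File.foreach` could be passed as `words`, so the file is still read only once.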

Dave S.

A simpler approach is to count the occurrences of each character and compare:

a = %w[b a h s v i e y k s a l d n]
File.read('2of4brif.txt').split("\n").each do |line|
  puts line if line.size == 5 && line.chars.all?{|x| line.count(x) <= a.count(x)}
end
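
A variant of this counting idea, sketched under the assumption that the letter counts are tallied once per check rather than recounted for every character. The helper name `spellable?` and the sample words are hypothetical.

```ruby
# Hypothetical helper: true if `word` can be spelled from `letters`,
# using each letter no more often than it appears in the set.
def spellable?(word, letters)
  available = Hash.new(0)
  letters.each { |c| available[c] += 1 }
  word.each_char do |c|
    # A missing or used-up letter means the word cannot be built.
    return false if available[c] == 0
    available[c] -= 1
  end
  true
end

letters = %w[b a h s v i e y k s a l d n]
sample  = %w[savvy sandy salsa island]
hits    = sample.select { |w| w.size == 5 && spellable?(w, letters) }
p hits   # ["sandy", "salsa"]
```
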
pguardiario

The following worked for me:

File.open('file.txt').each_line do |line|
  puts line if line[/<regexp>/]
end