-3

I have a large file, and I want to be able to check if a word is present twice.

puts "Enter a word: "
$word = gets.chomp

if File.read('worldcountry.txt') # do something if the word entered is present twice...

How can I check whether the file worldcountry.txt contains the $word I entered twice?

carl Bloid
  • Of course you're right, I have to use `File.read`, but what should I do after that? Use `include?` or `any?`? How do I detect the word twice? – carl Bloid Oct 16 '22 at 14:12
  • But that is not what you asked. The only question you ask is "do I have to use File.read?" – matt Oct 16 '22 at 14:16
  • As for how to do it, that's kind of broad. Have you looked at Ruby's ways of searching a string? This is one of Ruby's strongest features. Show us your existing attempt and explain the difficulty for you. – matt Oct 16 '22 at 14:18
  • I edited my question to make it clearer. Yes, I looked at some ways to do that: I know how to find a string with `include?`, `any?` or `grep`, but I don't know how to check whether a word is present twice. – carl Bloid Oct 16 '22 at 14:26
  • @carlBloid _"do I have to use `File.read?`"_ – no you don't. For (very) large files or if your memory is limited, it might be better to use `IO.foreach` or `IO#each_line` which reads the file line by line. In addition, you could stop reading once you found the 2nd occurrence of your word (if you mean _at least_ twice) or the 3rd (if you mean _exactly_ twice) by `break`-ing out of the `foreach` / `each_line` loop. – Stefan Oct 16 '22 at 17:34
  • At least twice or exactly twice? How large is the file? – Cary Swoveland Oct 17 '22 at 00:20
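
Stefan's suggestion of reading line by line and stopping early could be sketched roughly as follows. The helper names (`count_word_up_to`, `at_least_twice?`, `exactly_twice?`) are made up for illustration, and the sketch assumes the word consists of word characters so that `\b` behaves as expected.

```ruby
# Hypothetical helper: counts whole-word occurrences line by line and
# stops reading as soon as `limit` occurrences have been seen.
def count_word_up_to(path, word, limit)
  pattern = /\b#{Regexp.escape(word)}\b/
  count = 0
  File.foreach(path) do |line|
    count += line.scan(pattern).size
    break if count >= limit          # no need to read the rest of the file
  end
  [count, limit].min                 # cap in case one line overshoots the limit
end

# "At least twice": stop at the 2nd occurrence.
def at_least_twice?(path, word)
  count_word_up_to(path, word, 2) >= 2
end

# "Exactly twice": keep going until a 3rd occurrence would disprove it.
def exactly_twice?(path, word)
  count_word_up_to(path, word, 3) == 2
end
```

With the question's file this would be called as `at_least_twice?('worldcountry.txt', $word)`.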

2 Answers

1

I found what I needed in this question: count-the-frequency-of-a-given-word-in-text-file-in-ruby

It is in Gerry's answer there, which uses this code:

word_count = 0
my_word = "input"

File.open("texte.txt", "r") do |f|
  f.each_line do |line|
    line.split(' ').each do |word|
      word_count += 1 if word == my_word
    end
  end
end

puts "\n" + word_count.to_s

Thanks, I will pay more attention next time.

carl Bloid
  • Assuming you have an array of words in the file, you might just check `words.tally[my_word] == 2`. – Chris Oct 16 '22 at 15:33
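
A small sketch of the `tally` idea from the comment above, assuming Ruby 2.7 or later (where `Array#tally` is available). The `word_counts` helper and the sample text are made up for illustration. `fetch` with a default of 0 is used because `tally[word]` is `nil` for a word that never occurs, and `nil >= 2` would raise.

```ruby
# Hypothetical helper: split the text on whitespace and tally the words.
def word_counts(text)
  text.split.tally
end

text = "Chad Peru Chad Fiji"          # stand-in for File.read('texte.txt')
counts = word_counts(text)

puts counts.fetch('Chad', 0) == 2     # is 'Chad' present exactly twice?
puts counts.fetch('Peru', 0) >= 2     # is 'Peru' present at least twice?
```

Like the code in the answer, this compares whole whitespace-separated tokens, so `dog` and `dog,` count as different words.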
0

If the file is not overly large, it can be gulped into a string. Suppose:

str = File.read('cat')
  #=> "There was a dog 'Henry' who\nwas pals with a dog 'Buck' and\na dog 'Sal'."
puts str
There was a dog 'Henry' who
was pals with a dog 'Buck' and
a dog 'Sal'.

Suppose the given word is 'dog'.

Confirm the file contains at least two instances of the given word

One can attempt to match the regular expression

r1 = /\bdog\b.*\bdog\b/m
str.match?(r1)
  #=> true
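
Since the question reads the word from user input, the same pattern could be built at run time. This is only a sketch: `at_least_two?` is a made-up helper name, and `Regexp.escape` is used so that characters such as `.` in the entered word are matched literally.

```ruby
# Hypothetical helper: true if `word` occurs at least twice as a whole word.
def at_least_two?(str, word)
  w = Regexp.escape(word)              # neutralise regex metacharacters
  str.match?(/\b#{w}\b.*\b#{w}\b/m)    # /m lets . match newlines
end

str = "There was a dog 'Henry' who\nwas pals with a dog 'Buck' and\na dog 'Sal'."
at_least_two?(str, 'dog')
  #=> true
at_least_two?(str, 'Henry')
  #=> false
```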


Confirm the file contains exactly two instances of the given word

Using a regular expression to determine whether the file contains exactly two instances of the given word is somewhat more complex. Let

r2 = /\A(?:(?:(?!\bdog\b).)*\bdog\b){2}(?!.*\bdog\b)/m
str.match?(r2)
  #=> false
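
If a single regular expression is not a requirement, counting the matches with `String#scan` may be easier to read. A minimal sketch, with `count_word` as a made-up helper name:

```ruby
# Hypothetical helper: number of whole-word occurrences of `word` in `str`.
def count_word(str, word)
  str.scan(/\b#{Regexp.escape(word)}\b/).size
end

str = "There was a dog 'Henry' who\nwas pals with a dog 'Buck' and\na dog 'Sal'."
count_word(str, 'dog')
  #=> 3
count_word(str, 'dog') == 2
  #=> false
count_word(str, 'was') == 2
  #=> true
```

Unlike the early-exit approaches, this always scans the whole string, which is fine when the file fits comfortably in memory.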



The two regular expressions can be written in free-spacing mode to make them self-documenting.

r1 = /
     \bdog\b       # match 'dog' surrounded by word breaks  
     .*            # match zero or more characters
     \bdog\b       # match 'dog' surrounded by word breaks
     /xm           # invoke free-spacing mode and cause . to match newlines
r2 = /
     \A            # match beginning of string
     (?:           # begin non-capture group
       (?:         # begin non-capture group
         (?!       # begin negative lookahead
           \bdog\b # match 'dog' surrounded by word breaks
         )         # end negative lookahead
         .         # match one character
       )           # end non-capture group
       *           # execute preceding non-capture group zero or more times
       \bdog\b     # match 'dog' surrounded by word breaks
     )             # end non-capture group
     {2}           # execute preceding non-capture group twice
     (?!           # begin negative lookahead
       .*          # match zero or more characters
       \bdog\b     # match 'dog' surrounded by word breaks
     )             # end negative lookahead
     /xm           # invoke free-spacing mode and cause . to match newlines
Cary Swoveland