Create a test file
Let's first create a file to work with.
text =<<-BITTER_END
It was the best of times, it was the worst of times, it was the age of wisdom,
it was the age of foolishness, it was the epoch of belief, it was the epoch of
incredulity, it was the season of Light, it was the season of Darkness, it was
the spring of hope, it was the winter of despair, we had everything before us,
we had nothing before us...
BITTER_END
FName = 'texte.txt'
File.write(FName, text)
#=> 344
Specify the word to be counted
target = 'the'
Create a regular expression
r = /\b#{target}\b/i
#=> /\bthe\b/i
The word breaks \b are used to ensure that, for example, 'anthem' is not counted as 'the'.
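As a rough illustration of what the word breaks buy you, the sketch below runs the pattern with and without \b over a made-up sample string. The sample text and the variable names are assumptions for the example. Regexp.escape is also an addition not used above: it keeps a target containing regex metacharacters literal. The i modifier makes the match case-insensitive, which is why 'The' is counted.

```ruby
target = 'the'
sample = 'The anthem is then the other theme.'  # made-up sample text

# Regexp.escape (an addition here) keeps targets such as 'a.b' literal.
with_breaks    = /\b#{Regexp.escape(target)}\b/i
without_breaks = /#{Regexp.escape(target)}/i

with_count = sample.scan(with_breaks).count
#=> 2 ("The" and "the")
without_count = sample.scan(without_breaks).count
#=> 6 (also matches inside "anthem", "then", "other" and "theme")
```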
Gulp small files
If, as here, the file is not humongous, you can gulp it:
File.read("texte.txt").scan(r).count
#=> 10
Read large files line-by-line
If the file is so large that we'd want to read it line-by-line, do the following.
File.foreach(FName).reduce(0) { |cnt, line| cnt + line.scan(r).count }
#=> 10
or
File.foreach(FName).sum { |line| line.scan(r).count }
#=> 10
mindful that Enumerable#sum made its debut in Ruby v2.4.
See IO::read and IO::foreach. (IO.methodx... is commonly written File.methodx.... This is permitted because File is a subclass of IO; i.e., File < IO #=> true.)
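If the line-by-line count is needed in more than one place, it could be wrapped in a small helper along these lines. This is only a sketch: count_matches is a hypothetical name, and the temporary file and its two lines of text are made up so that the example runs on its own.

```ruby
require 'tempfile'

# Hypothetical helper: counts matches of `regex` in the file at `path`,
# holding only one line in memory at a time.
def count_matches(path, regex)
  File.foreach(path).reduce(0) { |cnt, line| cnt + line.scan(regex).count }
end

total = Tempfile.create('sample') do |f|
  f.write("The cat saw the dog.\nThen the anthem played.\n")
  f.flush  # make sure the text is on disk before reading it back
  count_matches(f.path, /\bthe\b/i)
end
total
#=> 3
```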
Use gsub to avoid the creation of a temporary array
The first method (gulping the file) creates a temporary array:
["the", "the", "the", "the", "the", "the", "the", "the", "the", "the"]
to which count (aka size) is applied. One way to avoid the creation of this array is to use String#gsub rather than String#scan, as the former, when used without a replacement argument or a block, returns an enumerator:
File.read("texte.txt").gsub(r).count
#=> 10
This could be used for each line of the file as well.
This is an unconventional, but sometimes helpful, use of gsub.
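A minimal sketch of the per-line variant mentioned above, assuming Ruby 2.4 or later for Enumerable#sum. The temporary file and its contents are made up so that the example is self-contained.

```ruby
require 'tempfile'

r = /\bthe\b/i

# Without a block or replacement, gsub returns an enumerator over the matches.
enum_class = 'the one and the other'.gsub(r).class
#=> Enumerator

per_line_total = Tempfile.create('sample') do |f|
  f.write("The cat saw the dog.\nThen the anthem played.\n")
  f.flush  # make sure the text is on disk before reading it back
  File.foreach(f.path).sum { |line| line.gsub(r).count }
end
#=> 3
```

Each line still yields its own enumerator, but no array of matched strings is built for it.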