
I would like to read a text file word by word, on demand, like ifstream in C++. That is, I want to open the file, read the next word from it whenever I need one, and then close it. How do I do that?

WhoCares

1 Answer


You can write a generator function that will:

  • Read the contents of the file as lines.
  • Find and save all the words in an iterator.
  • Yield words from the iterator one by one.

Consider this file foo.txt:

This is an example of speech synthesis in English.
This is an example of speech synthesis in Bangla.

The following code yields the words one by one. However, it still reads the entire file into memory at once rather than word by word. Reading truly word by word would mean tracking the cursor position line by line and then word by word, which can be even more expensive than reading the entire file at once or reading it chunk by chunk.

# On Python < 3.9, import Generator from the 'typing' module instead.
from collections.abc import Generator


def word_reader(file_path: str) -> Generator[str, None, None]:
    """Read a file from the file path and return a
    generator that returns the contents of the file
    as words.

    Parameters
    ----------
    file_path : str
        Path of the file.

    Yields
    ------
    str
        The next word in the file.

    """
    with open(file_path, "r") as f:
        # Read the entire file into memory. readlines() returns a list of lines.
        r = f.readlines()

        # Lazily collect the words from every line in a generator expression.
        words = (word for sentence in r for word in sentence.split(" ") if word)

        # Equivalent to: 'for word in words: yield word'.
        yield from words


if __name__ == "__main__":
    wr = word_reader("./foo.txt")
    for word in wr:
        # Strip the trailing period and newline from the last word on a line.
        if word.endswith(".\n"):
            word = word.replace(".\n", "")
        print(word)

This prints:

This
is
an
example
of
speech
synthesis
in
English
...
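Since the question asks for something like ifstream, where the next word is read only when it is needed, here is a minimal sketch of that usage. It is an assumption layered on the answer above, not part of it: iter_words is a hypothetical name, it iterates over the file object so that only one line is held in memory at a time, and it treats any run of whitespace as a separator, so punctuation such as a trailing period stays attached to the word. The demo writes its own sample file to a temporary directory so that it runs as is.

```python
import os
import tempfile
from collections.abc import Iterator


def iter_words(file_path: str) -> Iterator[str]:
    """Yield whitespace-separated words, reading one line at a time."""
    with open(file_path, "r") as f:
        # Iterating over the file object fetches lines lazily,
        # so only the current line is held in memory.
        for line in f:
            # split() with no argument drops spaces, tabs, and newlines.
            yield from line.split()


# Demo: create a small sample file, then pull words on demand.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "foo.txt")
    with open(sample_path, "w") as sample:
        sample.write("This is an example of speech synthesis in English.\n")

    words = iter_words(sample_path)
    first = next(words)   # read a word only when it is needed
    second = next(words)  # the file stays open between calls
    words.close()         # closing the generator also closes the file

print(first, second)
```

Calling next(words, None) instead of next(words) returns None once the file is exhausted rather than raising StopIteration, which is roughly the counterpart of checking the stream state in C++.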

To avoid loading the whole file into memory, you can read it chunk by chunk and apply the same word-splitting logic to each chunk, yielding the words one by one.
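A rough sketch of the chunked approach follows, under a few assumptions that are not in the answer itself: iter_words_chunked and chunk_size are hypothetical names, words are separated by whitespace, and the file is opened in text mode. The one subtlety is that a chunk boundary can fall in the middle of a word, so the last fragment of each chunk is held back until the next chunk shows where the word ends.

```python
import os
import tempfile
from collections.abc import Iterator


def iter_words_chunked(file_path: str, chunk_size: int = 4096) -> Iterator[str]:
    """Yield whitespace-separated words while reading fixed-size chunks."""
    leftover = ""  # partial word carried over from the previous chunk
    with open(file_path, "r") as f:
        while True:
            chunk = f.read(chunk_size)  # at most chunk_size characters
            if not chunk:
                break  # end of file
            buffer = leftover + chunk
            parts = buffer.split()
            if parts and not buffer[-1].isspace():
                # The chunk ended mid-word; hold the last piece back.
                leftover = parts.pop()
            else:
                leftover = ""
            yield from parts
    if leftover:
        # The file did not end with whitespace; emit the final word.
        yield leftover


# Demo: a deliberately tiny chunk size forces words to straddle chunks.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "foo.txt")
    with open(sample_path, "w") as sample:
        sample.write("This is an example of speech synthesis in English.\n")
    chunked_words = list(iter_words_chunked(sample_path, chunk_size=8))

print(chunked_words)
```

Memory use is bounded by chunk_size plus the length of the longest word, so a larger chunk size trades memory for fewer read calls.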

Redowan Delowar