I'm pretty new to Python, and I am trying to use this code. It works when I convert a file with only a few lines, but it crashes on files of 500 MB.

import re
import num2words

with open('num.txt') as f_input:
     text = f_input.read()
text = re.sub(r"(\d+)", lambda x : num2words.num2words(int(x.group(0))), text)
with open('word.txt','w') as f_output:
     f_output.write(text)

What can I do to make it handle files this large? Is it a memory problem, or something to do with how the lines are read?

mozway
Piero U.
  • Please [edit] your question to translate its title to English. – Mat Nov 09 '21 at 16:33
  • In your own words, what do you think `f_input.read()` does? Can you think of other ways to read and handle the file? "is it a memory problem" Did you try checking how much memory is used by Python when you run the code? – Karl Knechtel Nov 09 '21 at 16:41
  • Does https://stackoverflow.com/questions/28936140/use-readline-to-read-txt-file-python3 help? – Karl Knechtel Nov 09 '21 at 16:42

1 Answer

You're currently reading the whole input and then processing it all at once. This requires loading the entire file into memory.

Instead, process line by line and save to the output file as you go:

import re
from num2words import num2words

with open('num.txt') as f_input, open('word.txt', 'w') as f_output:
    for line in f_input:
        line = re.sub(r"(\d+)", lambda x : num2words(int(x.group(0))), line)
        f_output.write(line)
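
To try the same idea without a large file or the num2words package at hand, here is a minimal sketch that wraps the loop in a function. The names `convert_lines`, `to_words` and `spell_digits` are made up for illustration and are not part of num2words; `spell_digits` is only a crude stand-in for the real converter.

```python
import io
import re

DIGITS = re.compile(r"\d+")


def convert_lines(src, dst, to_words):
    """Copy src to dst one line at a time, replacing each run of digits.

    src and dst are any text file-like objects; to_words is a callable
    taking an int and returning a string (an assumption of this sketch).
    """
    for line in src:  # iterating a file object yields one line at a time
        dst.write(DIGITS.sub(lambda m: to_words(int(m.group(0))), line))


def spell_digits(n):
    # Hypothetical stand-in for num2words.num2words: spells each digit.
    names = "zero one two three four five six seven eight nine".split()
    return "-".join(names[int(d)] for d in str(n))


# In-memory streams stand in for num.txt and word.txt here.
src = io.StringIO("I have 2 cats\nand 10 dogs\n")
dst = io.StringIO()
convert_lines(src, dst, spell_digits)
print(dst.getvalue())
```

With real files, you would pass the two objects returned by open() and use num2words as to_words. Note that this only keeps memory low when the file actually contains line breaks: if the 500 MB are a single line, the loop still reads all of it in one go.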
mozway