I am converting pdfs to text and got this code off a previous post:

Extracting text from a PDF file using PDFMiner in python?

When I print(text) it does exactly what I want, but when I try to save the result to a text file I get the error shown below.

The code follows the first answer on the linked question exactly. Then I run:

text = convert_pdf_to_txt("GMCA ECON.pdf")

file = open('GMCAECON.txt', 'w', 'utf-8')
file.write(text)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-ebc6b7708d93> in <module>
----> 1 file = open('GMCAECON.txt', 'w', 'utf-8')
  2 file.write(text)

TypeError: an integer is required (got type str)

I'm afraid it's probably something really simple but I can't figure it out. I want it to write the text to a text file with the same name, which I can then do further analysis on. Thanks.

Rachel9866

2 Answers

The problem is your third argument. The third positional parameter of open() is buffering, not encoding, and buffering must be an integer, which is why the TypeError complains that it got a str.

Call open like this:

open('GMCAECON.txt', 'w', encoding='utf-8')

and your problem should go away.
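
To see why the message mentions an integer, here is a minimal sketch that reproduces the error against a throwaway file and then applies the keyword fix. The temporary directory and the sample string are assumptions for illustration; they stand in for the real output path and the text returned by convert_pdf_to_txt().

```python
import os
import tempfile

# Hypothetical stand-in for the text returned by convert_pdf_to_txt().
text = "caf\u00e9 GMCA"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "GMCAECON.txt")

    # Passing 'utf-8' positionally puts it in the buffering slot,
    # which must be an integer, so open() raises TypeError.
    try:
        open(path, "w", "utf-8")
        raised = False
    except TypeError:
        raised = True

    # Passing it by keyword reaches the encoding parameter instead.
    f = open(path, "w", encoding="utf-8")
    f.write(text)
    f.close()

    # Read the file back to confirm the text survived the round trip.
    with open(path, encoding="utf-8") as f:
        round_trip = f.read()
```

The exact wording of the TypeError differs between Python versions, but the cause is the same in each case.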

matevzpoljanc

When you call file = open('GMCAECON.txt', 'w', 'utf-8') you pass all three arguments to open() positionally. You intend the third one as the encoding, but the third positional parameter open() expects is buffering. You need to pass the encoding as a keyword argument, e.g. file = open('GMCAECON.txt', 'w', encoding='utf-8')

Note that it is much better to use a with context manager, which closes the file for you even if the write fails:

with open('GMCAECON.txt', 'w', encoding='utf-8') as f:
    f.write(text)
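
Since the question asks for a text file with the same name as the PDF, the same pattern can be wrapped in a small helper. This is only a sketch: save_text and its out_dir parameter are hypothetical names, and the temporary directory is used here just so the example does not touch real files.

```python
import tempfile
from pathlib import Path


def save_text(pdf_path, text, out_dir="."):
    """Hypothetical helper: write text to out_dir under the PDF's name with a .txt suffix."""
    out_path = Path(out_dir) / Path(pdf_path).with_suffix(".txt").name
    # The context manager closes the file even if write() raises.
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(text)
    return out_path


with tempfile.TemporaryDirectory() as tmp:
    saved = save_text("GMCA ECON.pdf", "sample text", out_dir=tmp)
    saved_name = saved.name
    saved_content = saved.read_text(encoding="utf-8")
```

Note that with_suffix keeps the space in the name, so this produces "GMCA ECON.txt" rather than the "GMCAECON.txt" used in the question.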
buran