1

I want to extract text from a PDF, but PyPDF2 doesn't extract all the information, and textract failed to install on Python 3.7 with the following error:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 1671: character maps to <undefined>
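
For context, this is the message Python raises when bytes are decoded with the Windows cp1252 ("charmap") codec, in which byte 0x8d has no mapping. Whether that is exactly what happens inside the textract install is an assumption. The sketch below only reproduces the message with a hypothetical UTF-8 file (the file name and sample text are made up) and shows that an explicit encoding avoids it.

```python
import os
import tempfile

# Hypothetical sample text: "č" (U+010D) is encoded in UTF-8 as the bytes C4 8D.
text = "ko\u010dka"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "README.txt")  # hypothetical file name

    # Write the file as UTF-8.
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)

    # Reading it back as cp1252 fails, because 0x8d is undefined in that codec.
    try:
        with open(path, encoding="cp1252") as fh:
            fh.read()
        message = None
    except UnicodeDecodeError as exc:
        message = str(exc)

    # Reading it with the encoding it was written in works.
    with open(path, encoding="utf-8") as fh:
        recovered = fh.read()

print(message)
print(recovered == text)
```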

martineau

2 Answers

1
  1. Download the textract source archive from: https://pypi.python.org/pypi/textract

  2. Install pdfminer3k: pip3 install pdfminer3k

  3. Extract (untar) the downloaded archive

  4. cd into the extracted directory

  5. Run: python3 setup.py install

Hope it works for you :)
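
Since a build from source can fail when an external tool is missing (a comment on this answer reports a swig failure), it may help to check what is on PATH before running setup.py. This is only a rough sketch: `find_missing` is a hypothetical helper, and the tool names are taken from this thread (swig, plus pdftotext, which ships with poppler), not from an authoritative list of textract's requirements.

```python
import shutil


def find_missing(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


# Tool names are assumptions based on this thread, not a complete list.
missing = find_missing(["swig", "pdftotext"])
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All listed tools were found.")
```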

martineau
Jaidude
  • I get "error: Setup script exited with error: command 'swig' failed with exit status 1". I cannot install swig – aless80 Aug 02 '19 at 17:00
0

I installed textract on Windows 10 with the following steps:

  1. pip install textract
  2. install poppler:
  3. Installation Complete
  4. Test the installation by running: import textract
  5. Extract text with: textract.process('path_to_file_with_extension')
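
Steps 4 and 5 might look roughly like this in a script. `textract.process` returns bytes, so the result is decoded before use. `extract_text` and `decode_extracted` are hypothetical helper names, and the optional `method` argument (for example "pdfminer") is passed through on the assumption that your textract version accepts it.

```python
def decode_extracted(raw, encoding="utf-8"):
    """Turn the bytes returned by textract into a str, replacing bad bytes."""
    return raw.decode(encoding, errors="replace")


def extract_text(path, method=None):
    """Extract text from `path` with textract (hypothetical wrapper)."""
    import textract  # imported here so the helper above works without it

    # Only pass `method` when the caller asked for a specific backend.
    kwargs = {"method": method} if method else {}
    return decode_extracted(textract.process(path, **kwargs))


# Usage, with the placeholder path from the steps above:
# print(extract_text('path_to_file_with_extension'))
```

If the default extraction misses text, trying `extract_text(path, method="pdfminer")` is one thing to experiment with.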

For further reference, you can click here

Hope it will be helpful to you!

MK Singh