How to install textract in Python 3?

Question

I want to extract text from a PDF, but PyPDF2 doesn't extract all the information, and textract fails to install on Python 3.7 with the following error:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 1671: character maps to <undefined>

see here: https://stackoverflow.com/questions/50743723/cant-install-textract-on-windows — Matthijs990, Mar 24 '19 at 08:00

score 1 · Answer 1 · edited Mar 24 '19 at 10:01


1. Download the textract source archive from: https://pypi.python.org/pypi/textract
2. Install pdfminer3k: pip3 install pdfminer3k
3. Untar the downloaded file.
4. cd into the extracted directory.
5. Run: python3 setup.py install
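The untar, cd and install steps can also be scripted. This is a minimal sketch using only the standard library, assuming the source archive has already been downloaded as a `.tar.gz` file. The helper names `extract_sdist`, `install_command` and `install_from_sdist` are hypothetical, not part of textract.

```python
import subprocess
import sys
import tarfile
from pathlib import Path
from typing import List


def extract_sdist(archive: Path, dest: Path) -> Path:
    """Unpack a source archive into dest and return the folder that holds setup.py."""
    with tarfile.open(archive, "r:*") as tar:
        # Use the safer "data" extraction filter when this Python version has it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    # A source distribution normally unpacks into one top-level folder (e.g. textract-<version>/).
    for setup_py in dest.glob("*/setup.py"):
        return setup_py.parent
    raise FileNotFoundError(f"no setup.py found in {archive}")


def install_command() -> List[str]:
    """Same as 'python3 setup.py install', but with the interpreter running this script."""
    return [sys.executable, "setup.py", "install"]


def install_from_sdist(archive: Path, dest: Path) -> None:
    """Extract the archive, then run the install command inside the extracted folder."""
    src_dir = extract_sdist(archive, dest)
    # cwd=src_dir plays the role of "cd into the directory".
    subprocess.run(install_command(), cwd=src_dir, check=True)
```

Usage would look like `install_from_sdist(Path("textract-<version>.tar.gz"), Path("build_src"))`, where the file name is a placeholder for whatever version you downloaded. Running it with `sys.executable` ensures the package is installed into the same Python 3.7 environment that runs the script.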

Hope it works for you :)

answered Mar 24 '19 at 07:53 by Jaidude, edited by martineau

I get "error: Setup script exited with error: command 'swig' failed with exit status 1". I cannot install swig – aless80 Aug 02 '19 at 17:00

MK Singh · Answer 2 · answered Nov 18 '19 at 10:38

I installed textract on Windows 10 with the following steps:

1. pip install textract
2. Install Poppler:
   - Download the archive: http://blog.alivate.com.au/wp-content/uploads/2018/10/poppler-0.68.0_x86.7z
   - Extract it.
   - Paste the complete folder into C:\Program Files.
   - Add C:\Program Files\poppler-0.68.0\bin to the PATH environment variable.

Installation is complete. Test it with:

    import textract
    text = textract.process('path_to_file_with_extension')  # returns bytes
    print(text.decode('utf-8'))
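By default, textract's PDF parser calls Poppler's `pdftotext` command-line tool, which is why the Poppler `bin` folder has to be on PATH. This is a small sketch for checking that from Python before calling `textract.process`, using only the standard library. The helper names `add_to_path` and `check_poppler` are hypothetical, and the folder you pass in is an assumption that depends on where you extracted Poppler.

```python
import os
import shutil
from typing import Optional


def add_to_path(directory: str) -> None:
    """Prepend a directory to PATH for the current Python process only (not system-wide)."""
    os.environ["PATH"] = directory + os.pathsep + os.environ.get("PATH", "")


def check_poppler(bin_dir: Optional[str] = None) -> str:
    """Return the full path of pdftotext, optionally adding bin_dir to PATH first."""
    if bin_dir is not None:
        add_to_path(bin_dir)
    exe = shutil.which("pdftotext")
    if exe is None:
        raise RuntimeError("pdftotext not found; add the Poppler bin folder to PATH")
    return exe
```

For example, `check_poppler(r"C:\Program Files\poppler-0.68.0\bin")` should return the path to `pdftotext.exe` if the folder from the steps above is correct. A PATH change made this way lasts only for the running script, so the permanent PATH edit is still the cleaner fix.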


Hope it will be helpful to you!
