
This is my first question here, so although I'll try my best to ask it correctly, please be patient with me. I'm trying to run OCR with Tesseract from a Django app hosted on PythonAnywhere (in case that matters), but I keep getting this error:

pytesseract.pytesseract.TesseractError: (1, 'Tesseract Open Source OCR Engine v3.04.01 with Leptonica
 Error opening data file /usr/share/tesseract-ocr/tessdata/heb.traineddata Please make sure the 
TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory. Failed 
loading language \'heb\' Tesseract couldn\'t load any languages! Could not initialize tesseract.')

At first I thought I could just move the correct "tessdata" file (which exists on my server) into /usr/share/bin..., but I couldn't do that without root access. No matter what I tried in the Bash shell, I couldn't get access to the root user (although I was never asked to set one up). I can't use the "sudo" command that I see so often; I guess that's because it isn't a valid command in the Bash shell (or Unix, I'm not sure how to refer to it). I guess I have a root user named "Orikle", but I couldn't find a working password: I tried my PythonAnywhere account password and the Django superuser password (yeah, I know, wishful thinking).

After giving up on that approach, I noticed the error mentions that the TESSDATA_PREFIX environment variable can be set. So I searched the web, found out how to create shell and environment variables, and created them, but to no avail. When I open the console and type printenv, I can see TESSDATA_PREFIX=/home/Orikle/.virtualenvs/myenv/bin/Tesseract-OCR, which led me to believe I had made it work, but alas, I keep getting the same error as before.

Just to be clear, I tried the parent directory, the exact directory, and just about every other directory out there. Any help would be appreciated. Thanks.

Orikle
  • Hi Orikle, welcome to Stack Overflow! The root user (called "root") can do things that could damage the system. Based on your question, I recommend that you avoid doing anything that needs root permissions, even if the server is yours and you are entitled to them. It's too easy to shoot yourself in the foot, and anyway that's the wrong solution here. Use environment variables, but keep in mind that (simplifying a bit) the environment can only be seen by processes started from a shell where the variable is set. I don't know your tool stack, so I'll let somebody else answer with the specifics. – alexis Mar 04 '20 at 11:34
  • Thank you very much for the quick answer, alexis, I'll keep that in mind. Unfortunately, I'm not sure which processes start from the shell, or how I can change the process so that it, well, starts from there. – Orikle Mar 04 '20 at 12:21
  • How do you actually start/run Django? The thing is, the environment variable must be set for the process running Django. If you're on PythonAnywhere, [here](https://help.pythonanywhere.com/pages/environment-variables-for-web-apps) is an example of how to set an environment variable correctly. – dirkgroten Mar 04 '20 at 13:32
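
The point made in the comments above, that a variable is only visible to a process if it was present in the environment that process was started with, can be illustrated with a small sketch. This is only a demonstration under assumptions: the helper `read_in_child` is a hypothetical name, and the child process here merely stands in for "the process running Django". The path is the value shown by printenv in the question.

```python
import os
import subprocess
import sys


def read_in_child(name, env=None):
    """Start a fresh Python process and return what it sees for `name`.

    `read_in_child` is a hypothetical helper for this illustration.
    With env=None the child inherits the current process's environment.
    """
    code = "import os, sys; sys.stdout.write(os.environ.get(sys.argv[1], '<unset>'))"
    result = subprocess.run(
        [sys.executable, "-c", code, name],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# A child started from an environment that lacks the variable cannot see it,
# no matter what some other console session has exported.
clean_env = {k: v for k, v in os.environ.items() if k != "TESSDATA_PREFIX"}
print(read_in_child("TESSDATA_PREFIX", env=clean_env))

# Once the variable is set in the parent process, children inherit it.
os.environ["TESSDATA_PREFIX"] = "/home/Orikle/.virtualenvs/myenv/bin/Tesseract-OCR"
print(read_in_child("TESSDATA_PREFIX"))
```

This is why seeing the variable in printenv in a console does not by itself show that the web app process has it.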

1 Answer


Thanks, everyone. It's been a while, but I ended up replacing heb.traineddata altogether and then adding the TESSDATA_PREFIX variable. I wish I were more certain about what the problem was, but at least I got it to work.
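
For anyone landing here, one way those two steps might look from inside the Python process itself is sketched below. This is not necessarily what was done here. It assumes the layout described in the error message for Tesseract 3.x, where TESSDATA_PREFIX is the parent directory of "tessdata". The function name `configure_tessdata` and the example path in the comments are hypothetical.

```python
import os
from pathlib import Path


def configure_tessdata(prefix, lang="heb"):
    """Set TESSDATA_PREFIX after checking the language file is really there.

    Assumes the Tesseract 3.x convention from the error message:
    `prefix` is the PARENT directory of "tessdata".
    `configure_tessdata` is a hypothetical helper, not part of pytesseract.
    """
    prefix = Path(prefix)
    data_file = prefix / "tessdata" / f"{lang}.traineddata"
    if not data_file.is_file():
        # Fail early with the exact path that was checked.
        raise FileNotFoundError(f"missing language data: {data_file}")
    # Set the variable in this process, so the tesseract subprocess
    # started later by pytesseract inherits it.
    os.environ["TESSDATA_PREFIX"] = str(prefix)
    return data_file


# Usage (hypothetical path), before the first OCR call in the Django code:
#   configure_tessdata("/home/Orikle/tesseract")
#   import pytesseract
#   text = pytesseract.image_to_string(image, lang="heb")
```

Checking for the file first separates the two possible causes: a missing or unreadable heb.traineddata versus a variable that never reached the web app process.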

Orikle