I am trying to convert PDF files to images and then run OCR on them with pytesseract. This works for files on the local Linux filesystem, but it fails when I pass an HDFS path:
>>> from wand.image import Image as wi
>>> wi(filename = 'hdfs://boboda02.boobo.com:8020/bda/clamsops/raw/personal_brella_test/09_29_2015_090902.pdf',resolution = 300)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/sam/my_env_1/lib/python2.7/site-packages/Wand-0.4.2-py2.7.egg/wand/image.py", line 2534, in __init__
  File "/home/sam/my_env_1/lib/python2.7/site-packages/Wand-0.4.2-py2.7.egg/wand/image.py", line 2601, in read
  File "/home/sam/my_env_1/lib/python2.7/site-packages/Wand-0.4.2-py2.7.egg/wand/resource.py", line 222, in raise_exception
wand.exceptions.MissingDelegateError: no decode delegate for this image format `//boboDA02.boobo.COM' @ error/constitute.c/ReadImage/501
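
The error suggests that ImageMagick is not treating the `hdfs://` URL as a file path: the text before the first colon appears to be read as a format prefix, which would explain why the message names only the `//boboDA02.boobo.COM` part. One possible workaround, sketched here under assumptions, is to read the file's bytes out of HDFS first and hand them to Wand as a blob instead of a filename. The helper names (`hdfs_cat_command`, `read_hdfs_bytes`, `pdf_bytes_to_image`) are hypothetical, and the sketch assumes the `hdfs` command-line client is on the PATH and that ImageMagick has a working PDF delegate (Ghostscript), as it does for the local files.

```python
import subprocess


def hdfs_cat_command(hdfs_path):
    """Build the argv that streams one HDFS file to stdout (hypothetical helper)."""
    return ["hdfs", "dfs", "-cat", hdfs_path]


def read_hdfs_bytes(hdfs_path):
    """Return the raw bytes of an HDFS file.

    Assumes the `hdfs` CLI is installed and configured for the cluster.
    """
    return subprocess.check_output(hdfs_cat_command(hdfs_path))


def pdf_bytes_to_image(pdf_bytes, resolution=300):
    """Open already-loaded PDF bytes with Wand instead of passing a filename."""
    # Imported lazily so the helpers above can be used without Wand installed.
    from wand.image import Image as wi

    # format="pdf" is a hint in case the format is not detected from the bytes.
    return wi(blob=pdf_bytes, format="pdf", resolution=resolution)
```

Usage would look like `img = pdf_bytes_to_image(read_hdfs_bytes('hdfs://.../09_29_2015_090902.pdf'))`, after which the pages can be passed to pytesseract the same way as for a local file. This loads the whole PDF into memory, so it may not suit very large files.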