I assumed in-memory file-like objects would behave like regular files, but I cannot get Textract to "read" a
<StringIO.StringIO instance at 0x05039EB8>
The program runs fine if I save the JPEG to disk first and pass that file to Textract in the usual way.
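For reference, the on-disk route that does work can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the helper name `process_bytes_via_tempfile` and its parameters are made up for illustration, and it assumes that `textract.process` takes a filesystem path and chooses its parser from the file extension.

```python
import os
import tempfile


def process_bytes_via_tempfile(data, process, suffix=".jpg"):
    """Write `data` to a temporary file and return `process(path)`.

    `process` is any callable that accepts a filesystem path, for example
    textract.process. The helper name and signature are hypothetical.
    """
    # delete=False so the file can be reopened by name (needed on Windows).
    tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    try:
        tmp.write(data)
        tmp.close()  # flush to disk before another reader opens the path
        return process(tmp.name)
    finally:
        tmp.close()  # harmless if the file is already closed
        os.remove(tmp.name)  # clean up the temporary file


# Intended use (assumes textract is installed and `jpg` holds the JPEG bytes):
#   import textract
#   text = process_bytes_via_tempfile(jpg, textract.process)
```

Passing the callable in as an argument keeps the helper independent of Textract itself, so it can be exercised with any function that reads a path.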
The JPEG is extracted from a PDF, following Ned Batchelder's excellent blog post "Extracting JPGs from PDFs". Relevant code below (Python 2.7):
import StringIO
import textract
from PIL import Image

# jpg holds the raw JPEG bytes extracted from the PDF
type(jpg)  # --> str (on 2.7)
buff = StringIO.StringIO()
buff.write(jpg)
buff.seek(0)
type(buff)  # --> instance
print buff  # --> <StringIO.StringIO instance at 0x05039EB8>
dt = Image.open(buff)
print dt  # --> <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=2630x597 at 0x58C2A90>
text = textract.process(dt)
That last line fails: Textract cannot read the JpegImageFile object.
If I pass the raw bytes instead:
text = textract.process(buff.getvalue())
I get the error: must be encoded string without NULL bytes, not str
How can I get Textract to read from an in-memory file or stream?