3

I have been given receipts from Subway detailing sales, workers, etc. throughout the day and need to extract the data for a management class.

I took pictures of the receipts and processed them with pytesseract into a single string with lines separated by \n, but I don't know how to use pd.read_csv and StringIO to turn it into a dataframe. I don't know if this is the best way to go about it. I may also need to edit the image with cv2 so that it processes better.

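from io import StringIO  # Python 3: StringIO lives in io, not in a StringIO module

import pandas as pd
import pytesseract
from PIL import Image

path = 'C:\\attachments\\'

# OCR the photographed receipt into one newline-separated string
monday = pytesseract.image_to_string(Image.open(path + 'file1-1.jpeg'), lang='eng')

# Split columns on runs of whitespace; read_csv already breaks lines on \n by default.
# Noisy OCR rows with extra tokens can still make this raise a parser error.
mon = pd.read_csv(StringIO(monday), sep=r'\s+')
print(mon)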

This is some of the variable monday currently.

"\nTIME HOURS :\nPERIOD SALES UNITS WORKED PROD SPLH\nZhan emmoo «Ct (iti ;:t‘«é‘«‘i CSD\n3A-4A $0.00 0 0 0 $0.00\n44-54 =: $0.00 SssOO 0 0 $0.00\n5A-6A $0.00 0 0 0 $0.00\nbA-7A $0.00 0 0 0 $0.00\n7A-BA =s«$0.00-Sss«OOs«*O0.80 0 $0.00\nBA-9A 60,00 . Qge2.00 0 $0.00\nQA-10A $33.68 6 2,00 3.00 $16.84\n104-114 $61.07 9 2.13 4.23 $28.67\n11A-12P$238.82 33 5,00 6.60 $47.76"

It should look like this as a dataframe:

Period Sales Units Worked Prod SPLH
3A-4A  $0.00  0      0     0   $0.00
bA-7A  $0.00  0      0     0   $0.00
N.Fisher
  • "I have been given receipts from Subway" <-- Please don't get this wrong, but couldn't you just ask them for a .xls or .csv file then, instead of something printed? – Asmus Apr 20 '19 at 15:05
  • I wanted to ask for that but didn't haha. They pulled it directly from the cash register and I doubt the UI there would have a send to csv and then email function – N.Fisher Apr 20 '19 at 18:01

1 Answer

5

You may get the results from tesseract directly into a Pandas dataframe:

monday = pytesseract.image_to_data(Image.open(path+'file1-1.jpeg'),lang='eng', output_type='data.frame')

Now monday is a dataframe, but it needs more processing from you: it has one row per detected element at every level of the layout hierarchy (page, block, paragraph, line, word), not one row per receipt line. Check the output and see how you wish to organize it.
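One possible way to organize it, as a rough sketch rather than a definitive recipe: join the word rows back into text lines using the page_num, block_num, par_num, line_num and word_num columns that image_to_data returns, then keep only the lines that look like a table row. The helper names words_to_lines and lines_to_table, the ROW pattern and the column names are my own assumptions, modelled on the sample in the question, and would need adjusting for other receipts.

```python
import re

import pandas as pd

# Assumed column names, taken from the desired output in the question.
COLUMNS = ["Period", "Sales", "Units", "Worked", "Prod", "SPLH"]

# Assumed row shape: period, sales, units, worked, prod, SPLH.
# \s* after the period tolerates OCR gluing it to the sales figure.
ROW = re.compile(
    r"(\w+-\w+)\s*(\$?[\d.,]+)\s+(\d+)\s+([\d.,]+)\s+([\d.,]+)\s+(\$[\d.,]+)"
)


def words_to_lines(ocr: pd.DataFrame) -> pd.Series:
    """Join word-level OCR rows into one string per detected text line."""
    # Structural rows (page, block, paragraph, line) have no text; drop them.
    words = ocr.dropna(subset=["text"]).copy()
    words["text"] = words["text"].astype(str).str.strip()
    words = words[words["text"] != ""]
    keys = ["page_num", "block_num", "par_num", "line_num"]
    return (
        words.sort_values(keys + ["word_num"])
        .groupby(keys, sort=True)["text"]
        .apply(" ".join)
        .reset_index(drop=True)
    )


def lines_to_table(lines) -> pd.DataFrame:
    """Keep only lines matching ROW and split them into columns (as strings)."""
    matches = (ROW.fullmatch(line.strip()) for line in lines)
    rows = [m.groups() for m in matches if m]
    return pd.DataFrame(rows, columns=COLUMNS)
```

With the dataframe from above this would be called as lines_to_table(words_to_lines(monday)). Lines that tesseract misreads badly simply fail the pattern and are dropped, so it is worth comparing the row count against the receipt; the values are left as strings because OCR sometimes reads a decimal point as a comma.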

kr1zz