
I'm using tesseract.js to get text from a W2 form, but I'm having trouble figuring out how to match the values on the form to their labels. For example, how can I match the label 'employee social security number' with the actual social security number?

jackjoesmith
  • If you know the W2 image will always have the same layout, you can split the image up and run Tesseract on each piece. That's what I did in one of my side projects: https://github.com/SidneyNemzer/siege-stats/blob/master/src/components/App.js#L82-L151. You'll have to determine the coordinates of each field manually, of course; I just did that by trial and error. – Sidney Sep 13 '17 at 23:01
  • @Sidney Thanks for the response. Can you explain what `subCanvas` and `subContext` are for? – jackjoesmith Sep 14 '17 at 20:44
  • Are you familiar with [canvas](https://developer.mozilla.org/en-US/docs/Web/API/Canvas_API)? In my app, canvas is used to grab pieces of an image and then pass them to Tesseract. There's a large canvas (`scoreboardCanvas`) that holds the main image. The `subCanvas` holds the sub-section of the image that's currently being worked on, and it's reused for every piece (there are 10 'subsections' in my app). A `context`, with regard to canvas, is the object that can manipulate the canvas's pixels. – Sidney Sep 14 '17 at 21:02
  • @Sidney Thanks! I took your advice and cut up the image with canvas, then scanned each piece individually. – jackjoesmith Sep 18 '17 at 20:11
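A minimal sketch of the fixed-layout approach described in the comments, in JavaScript to match tesseract.js: each label is tied to a hand-measured box, so whatever text comes out of that box is that label's value. The field keys, label strings, and coordinates in `W2_FIELDS` are made up, and `recognizeRegion` is an assumed callback you would write yourself (crop the image to the box, run Tesseract on it, resolve to the text). It deliberately does not call tesseract.js directly, since that API's call shape differs between versions.

```javascript
// Hypothetical layout: keys, label strings, and pixel coordinates are
// placeholders. Measure the real boxes on your own W2 scans.
const W2_FIELDS = {
  employeeSsn: { label: "Employee's social security number", x: 40, y: 20, width: 220, height: 30 },
  employerEin: { label: "Employer identification number (EIN)", x: 40, y: 60, width: 220, height: 30 },
  wages: { label: "Wages, tips, other compensation", x: 300, y: 60, width: 150, height: 30 },
};

// Runs OCR on each field's box and pairs the resulting text with that field's label.
// `recognizeRegion(rect)` is an assumed callback: it receives { x, y, width, height }
// and should resolve to the recognized text (e.g. by wrapping tesseract.js).
// Fields are read one after another, not in parallel.
async function readFields(layout, recognizeRegion) {
  const result = {};
  for (const [key, { label, ...rect }] of Object.entries(layout)) {
    const text = await recognizeRegion(rect);
    // OCR output often carries stray whitespace and newlines, so trim it.
    result[key] = { label, value: text.trim() };
  }
  return result;
}
```

Because every value comes from a box that already has a label, there is no need to parse the full-page OCR output to guess which number belongs to which label. With your own recognizer, `readFields(W2_FIELDS, recognizeRegion).then((fields) => console.log(fields.employeeSsn.value))` would log whatever Tesseract read inside the SSN box.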
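The canvas cropping step discussed in the comments can be sketched as a small helper. The name `cropToCanvas` and the `rect` shape are hypothetical, not taken from the linked project; the helper relies only on the standard `CanvasRenderingContext2D.drawImage` overload that takes a source rectangle and a destination rectangle.

```javascript
// Copies one rectangle of `sourceCanvas` (the large canvas holding the whole
// page) into `subCanvas`, resizing `subCanvas` to fit. `subCanvas` is meant to
// be reused for every field, like the `subCanvas` described in the comments.
function cropToCanvas(sourceCanvas, subCanvas, rect) {
  const { x, y, width, height } = rect;
  // Setting a canvas's width/height also clears it, so nothing from the
  // previous piece is left behind.
  subCanvas.width = width;
  subCanvas.height = height;
  const subContext = subCanvas.getContext("2d");
  // drawImage(image, sx, sy, sWidth, sHeight, dx, dy, dWidth, dHeight):
  // take the source rectangle and draw it at (0, 0) at the same size.
  subContext.drawImage(sourceCanvas, x, y, width, height, 0, 0, width, height);
  return subCanvas;
}
```

In the browser you would create `subCanvas` once (for example with `document.createElement("canvas")`) and call `cropToCanvas` for each field. tesseract.js accepts a canvas element as image input, so `subCanvas` can be passed to it directly; wait for each recognition to finish before cropping the next field, because the next call overwrites the same canvas.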

0 Answers