1

Hi, I have found some videos and articles on how to do this, but they don't help with this task. I know how to get a single value, but not how to extract a whole table.

I want to export this into a database if possible, or into Excel, but I can't figure it out. I have even tried changing the "Change reading option" setting.

I tried "data scraping", but the program just says "This controler does not support data extraction". And it can't get more table-like than this:

enter image description here

I have heard that this can happen because the structure of the PDF is bad. Still, aren't there other ways of doing this?

Jonas
  • Did you already read this article? https://www.edureka.co/blog/uipath-pdf-data-extraction/ – kwoxer Jan 21 '21 at 19:20
  • @kwoxer I haven't seen that, but this is what I was talking about: data scraping doesn't work. And I'm not looking for a single target. I need it to understand that I'm after the whole table =) – Jonas Jan 21 '21 at 21:06
  • I think the program can't understand the PDF structure, so I need an alternative. – Jonas Jan 21 '21 at 21:39
  • Search for "Camelot" here on Stack Overflow. It might be worth a try: https://github.com/atlanhq/camelot – pyano Jan 22 '21 at 09:27
  • But I think he needs it for UiPath. – kwoxer Jan 22 '21 at 10:10

1 Answer

0

Unfortunately, as of today there is no activity in UiPath that reads tables directly from PDFs. That is the bad news. The good news is that you can still get at the contents of the PDF: either you read the data as flat text directly with UiPath.PDF.Activities.ReadPDFText, or you have to use OCR. @kwoxer provided a wonderful link explaining this topic. I have extracted data from tables in a PDF document this way before. In that case I was lucky: ReadPDFText extracted everything, the table cells were separated by tabs ("\t"), and the table header contained a word that did not appear anywhere else in the document.

Just as an idea, I proceeded like this:

  1. Extract text from the PDF document with UiPath.PDF.Activities.ReadPDFText.
  2. Create an array, where the elements are the lines in the document. (Split using Environment.NewLine and option StringSplitOptions.RemoveEmptyEntries)
  3. Go through lines in a loop (ForEach) until the table header is found. (StartsWith or Contains etc.)
  4. The next row belongs to the table as long as it contains a tab. (Otherwise the table is over.)
  5. Split current row by tab and store it in an array: The elements of the array are the individual cells of the row.
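The steps above can be sketched outside UiPath to make the logic concrete. This is only a rough illustration in Python, not a UiPath workflow: the function name `extract_table`, the `header_marker` parameter, and the sample text are all made up for the example, and it assumes the text has already been extracted (step 1) and that the cells really are tab-separated, as they happened to be in my case.

```python
def extract_table(text, header_marker):
    """Return the table as a list of rows, each row a list of cell strings.

    Assumes cells are separated by tabs and that header_marker is a word
    occurring only in the table header line (both are assumptions).
    """
    # Step 2: split the text into lines and drop empty entries.
    lines = [line for line in text.splitlines() if line]

    rows = []
    in_table = False
    for line in lines:
        if not in_table:
            # Step 3: skip lines until the header is found.
            if header_marker in line:
                in_table = True
                # Keep the header as the first row.
                rows.append([cell.strip() for cell in line.split("\t")])
            continue
        # Step 4: the table ends at the first line without a tab.
        if "\t" not in line:
            break
        # Step 5: split the row into its individual cells.
        rows.append([cell.strip() for cell in line.split("\t")])
    return rows


# Hypothetical text, standing in for what ReadPDFText might return.
sample = (
    "Invoice 42\n"
    "Item\tQty\tPrice\n"
    "Apple\t2\t1.50\n"
    "Pear\t1\t0.80\n"
    "Total due: 3.80\n"
)
rows = extract_table(sample, "Item")
```

In UiPath the same logic would map onto a Split on Environment.NewLine, a ForEach over the lines, and a Split on the tab character for each row. If the extracted text does not contain tabs, this approach will not work as-is.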

I hope this idea helps.

primehunter