I am trying to extract information from a portion of a table in R. Example table below...

[Image: example table]

This is a simplified example of what I am really dealing with: a very large table with an irregular structure that changes on every page. When I read the whole table with the `extract_tables` function, the result comes back badly structured, with multiple table elements pushed into the same row or column. So I am attempting to read only a portion of the table. I want to locate the table's position from the text in its first cell, "Here", so that I can pass it to the `area` parameter of `extract_tables`. I cannot use the `extract_areas` function because I do not want to select the tables manually.

Can anyone help me with this?

AyeTown
  • I'm assuming you're talking about the `tabulizer` library? You might find the `locate_areas` function helpful for defining a custom page area to extract data from. In my experience with `tabulizer`, a typical workflow is to first extract the raw data from your PDFs, and then apply some functions to clean and transform the data into a usable data set. This library does a good job of capturing general table features, but you often still need to separate and exclude things after you run `extract_tables` – Mako212 Mar 09 '20 at 16:24
  • The `locate_areas` function is useful, but you have to drag the rectangle over the area you want manually. I want to find the start position (top-left corner) based on a word ("Here" in this example) within the table. I can't define it manually because the location and size of the table change with each page. – AyeTown Mar 09 '20 at 16:54
  • Why not use Regex to search for the relevant text after extracting the PDFs, then remove the unnecessary rows? – Mako212 Mar 09 '20 at 17:06
  • I need the data in a tabular format. See my other question, which explains my use case further (if you're bothered): https://stackoverflow.com/questions/60571187/extracting-text-from-a-table-in-r – AyeTown Mar 09 '20 at 17:10
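
The idea in the question (find the word "Here", then feed its position to `area`) could be sketched roughly as follows. This is a minimal, untested-on-real-data sketch, not a confirmed solution: it assumes the PDF has a text layer, that `pdftools::pdf_data()` word coordinates and the `area` argument of `tabulizer::extract_tables()` both use points measured from the top-left of the page, and that the table runs from the anchor word to the bottom-right of the page. `area_from_word` and `extract_from_word` are hypothetical helper names, and `pad` is an arbitrary margin.

```r
# Hypothetical helper: turn the position of an anchor word into an
# unnamed c(top, left, bottom, right) vector in points.
# `words` is one page's data frame as returned by pdftools::pdf_data(),
# with columns x, y (top-left corner of each word) and text.
area_from_word <- function(words, word, page_width, page_height, pad = 2) {
  hit <- words[words$text == word, , drop = FALSE]
  if (nrow(hit) == 0) {
    return(NULL)                      # anchor word not on this page
  }
  hit <- hit[1, ]                     # first match in the order given
  c(max(hit$y - pad, 0),              # top: just above the anchor word
    max(hit$x - pad, 0),              # left: just left of the anchor word
    page_height,                      # bottom: assumed to be the page edge
    page_width)                       # right: assumed to be the page edge
}

# Hypothetical wrapper: locate the anchor on one page and extract from there.
extract_from_word <- function(file, page, word = "Here") {
  words <- pdftools::pdf_data(file)[[page]]
  size  <- pdftools::pdf_pagesize(file)[page, ]
  area  <- area_from_word(words, word, size$width, size$height)
  if (is.null(area)) {
    return(NULL)
  }
  # guess = FALSE so that the supplied area is used rather than detected
  tabulizer::extract_tables(file, pages = page, area = list(area),
                            guess = FALSE)
}
```

Calling something like `extract_from_word("report.pdf", page = 3)` (the file name is a placeholder) once per page would recompute the area each time, which is the point when the table moves between pages. If the table does not extend to the page edges, the bottom and right values would need a second anchor word or some other rule.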

0 Answers