I am trying to extract one table from each of 31 PDFs. The table titles all start the same way, but the ending varies by region.
For one document the title is "Table 13.1: Total Number of Households Engaged in Agriculture by District, Rural and Urban Residence During 2011/12 Agriculture Year; Arusha Region, 2012 Census". For another it is "Table 13.1: Total Number of Households Engaged in Agriculture by District, Rural and Urban Residence During 2011/12 Agriculture Year; Dodoma Region, 2012 Census".
I used tabulizer to scrape the first table manually, based on the specific text lines I need, but given the similar naming conventions I was hoping to automate this process.
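```
library(pdftools)  # pdf_text()
library(stringr)   # str_squish(), str_replace_all(), and the %>% pipe

PATH2 <- "Regions/02. Arusha Regional Profile.pdf"

# Read every page, then split the text into individual lines
txt2 <- pdf_text(PATH2) %>%
  readr::read_lines()

# Lines holding the table (found by hand), cleaned and split into fields
specific_lines2 <- txt2[4621:4639] %>%
  str_squish() %>%
  str_replace_all(",", "") %>%
  strsplit(split = " ")
```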
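One way the automation could look, as a minimal sketch rather than a tested solution: instead of hard-coding `4621:4639`, search each document's lines for the shared start of the title and keep everything up to the next line that begins a new table. The helper names (`extract_table_lines`, `split_rows`, `extract_from_pdfs`) are hypothetical, and the sketch rests on assumptions about the PDFs: the title appears at the start of a line, the last match is the real table (earlier matches are assumed to be the list of tables), and the table ends where the next "Table N" line starts. The demo lines are made up for illustration.

```r
# Hypothetical helper: lines between the LAST line starting with
# `title_prefix` and the next line that starts a new table.
extract_table_lines <- function(lines, title_prefix) {
  hits <- which(startsWith(trimws(lines), title_prefix))
  if (length(hits) == 0) return(character(0))
  start <- hits[length(hits)]  # assumption: earlier hits are the list of tables
  # Assumption: the table ends where the next "Table <number>" line begins
  later <- which(grepl("^[[:space:]]*Table [0-9]", lines))
  end <- later[later > start][1]
  if (is.na(end)) end <- length(lines) + 1
  if (end - start <= 1) return(character(0))
  lines[(start + 1):(end - 1)]
}

# Same cleaning as the manual version: squish whitespace, drop thousands
# separators, split each line into fields.
split_rows <- function(lines) {
  cleaned <- gsub("[[:space:]]+", " ", trimws(lines))
  cleaned <- gsub(",", "", cleaned, fixed = TRUE)
  strsplit(cleaned, " ", fixed = TRUE)
}

# Apply to every PDF in a folder (needs the pdftools package when called).
extract_from_pdfs <- function(dir, title_prefix) {
  pdf_files <- list.files(dir, pattern = "\\.pdf$", full.names = TRUE)
  tables <- lapply(pdf_files, function(path) {
    lines <- unlist(strsplit(pdftools::pdf_text(path), "\n", fixed = TRUE))
    split_rows(extract_table_lines(lines, title_prefix))
  })
  names(tables) <- basename(pdf_files)
  tables
}

# Made-up lines standing in for one document's text
demo_lines <- c(
  "Table 13.1: Total Number of Households Engaged in Agriculture by District",
  "some other text",
  "Table 13.1: Total Number of Households Engaged in Agriculture by District,",
  "  DistrictA   1,200   300   1,500",
  "  DistrictB     800   200   1,000",
  "Table 13.2: Another table"
)
prefix <- "Table 13.1: Total Number of Households Engaged in Agriculture"
rows <- split_rows(extract_table_lines(demo_lines, prefix))
```

For the real files this would be called as `extract_from_pdfs("Regions", prefix)`. Because the full title is long, it probably wraps onto more than one line in the PDF text, so the first few returned lines (title continuation and column headers) would still need to be dropped before building a data frame.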