Scrape data from PDF and save it to a MySQL database

Question

Can anybody suggest a way to scrape data from a PDF file and save it to a MySQL database using PHP or any other tool?

I am currently writing a script that reads the plain-text content (the PDF is converted to plain text with the Apache Tika tool) and saves it to the database. But this is a very lengthy process, and it is not accurate.

So please suggest any other approach to complete this task.
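
For the storage step described above, a minimal sketch follows. It uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs without a server. The `profiles` table, its columns, the `save_profile` helper, and the file name are all hypothetical and not taken from the question.

```python
import sqlite3


def save_profile(conn, source_file, fields):
    """Insert one extracted record using a parameterized query."""
    conn.execute(
        "INSERT INTO profiles (source_file, introduction, job_title) "
        "VALUES (?, ?, ?)",
        (source_file, fields.get("introduction"), fields.get("job_title")),
    )
    conn.commit()


# In-memory database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE profiles ("
    "id INTEGER PRIMARY KEY, source_file TEXT, "
    "introduction TEXT, job_title TEXT)"
)

# Hypothetical record, as if it had been extracted from one PDF.
save_profile(
    conn,
    "resume_001.pdf",
    {"introduction": "Backend developer.", "job_title": "Engineer"},
)
rows = conn.execute("SELECT source_file, job_title FROM profiles").fetchall()
```

With a real MySQL driver the structure would likely stay the same, but the placeholder style differs: Python MySQL drivers commonly use `%s` rather than `?`.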

Can you please show some code? What do you mean by `not accurate`? http://stackoverflow.com/help/how-to-ask – Pogrindis, Jun 14 '16 at 11:07
For example, suppose we want to scrape the "Introduction" and "Job title" of a person from the PDF content. We would search for these headings in the content, but the same words may also appear inside the body of the "Introduction" or "Job description" sections. That is why I said it would not be accurate. – Ajai, Jun 14 '16 at 11:15
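
One way to reduce the false matches described in this comment is to accept a heading only when it occupies a whole line by itself. The sketch below assumes the extracted text keeps headings on their own lines, which depends on the PDF and the extractor. The heading list, the `split_sections` helper, and the sample text are hypothetical.

```python
import re

# Hypothetical heading names; adjust to the real documents.
HEADINGS = ["Introduction", "Job title", "Job description"]


def split_sections(text, headings=HEADINGS):
    """Return {heading: body}; a heading counts only if it fills a whole line."""
    alternatives = "|".join(re.escape(h) for h in headings)
    pattern = re.compile(
        r"^[ \t]*(" + alternatives + r")[ \t]*:?[ \t]*$",
        re.IGNORECASE | re.MULTILINE,
    )
    matches = list(pattern.finditer(text))
    sections = {}
    for i, match in enumerate(matches):
        # The body runs until the next heading line, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        key = match.group(1).lower()
        # Keep the first occurrence if a heading line repeats.
        sections.setdefault(key, text[match.end():end].strip())
    return sections


sample = """Introduction
I held the job title of lead engineer for five years.

Job title
Lead Engineer
"""
sections = split_sections(sample)
```

Here the words "job title" inside the introduction body are not treated as a heading, because that line contains other text. This is only a heuristic: it will still fail if a body line happens to consist of a heading word alone.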

score 2 · Answer 1 · edited Jun 09 '21 at 16:48


One thing you can do: if you only need to scrape one or two PDFs, you can convert the PDF to HTML using any online tool and then scrape the data with the simplehtmlDom library. You can also use PDF Text Extractor to extract the text from the PDF.

I hope it will be helpful to you.
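
The answer names simplehtmlDom, which is a PHP library. As a rough illustration of the same idea, the sketch below uses Python's standard `html.parser` instead, so it is a swapped-in technique and not the library's API. It assumes the converter emits section headings as `<h1>` to `<h3>` tags, which many PDF-to-HTML converters do not guarantee. The class name and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class HeadingTextParser(HTMLParser):
    """Collect the text that follows each <h1>-<h3> heading."""

    HEADING_TAGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.sections = {}
        self._in_heading = False
        self._current = None  # heading whose body is being collected

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADING_TAGS:
            self._in_heading = True
            self._current = ""

    def handle_endtag(self, tag):
        if tag in self.HEADING_TAGS and self._in_heading:
            self._in_heading = False
            self._current = self._current.strip()
            self.sections[self._current] = ""

    def handle_data(self, data):
        if self._in_heading:
            self._current += data  # still reading the heading text
        elif self._current is not None:
            self.sections[self._current] += data  # body text under the heading


# Hypothetical converter output.
html = (
    "<h2>Introduction</h2><p>Backend developer.</p>"
    "<h2>Job title</h2><p>Engineer</p>"
)
parser = HeadingTextParser()
parser.feed(html)
parser.close()
sections = {name: body.strip() for name, body in parser.sections.items()}
```

Text that appears before the first heading is ignored in this sketch, and a repeated heading would overwrite the earlier body.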

edited Jun 09 '21 at 16:48 by DisappointedByUnaccountableMod
answered Jun 14 '16 at 12:36 by aniket ashtekar

No dude, I have millions of PDF files. – Ajai Jun 14 '16 at 12:38
You can try the PDF Text Extractor class. – aniket ashtekar Jun 14 '16 at 13:11
