-3

Can anybody suggest an approach for scraping data from a PDF file and saving it to a MySQL database, using PHP or any other tool?

Currently I am writing a script that reads the plain-text content (the PDF is converted to plain text with the Apache Tika tool) and saves it to the database. But this is a very lengthy process and it is not accurate.

So please suggest any other approach to complete this task.

Ajai
  • http://www.pdfparser.org/ – Sibiraj PR Jun 14 '16 at 11:06
  • Can you please show some code? What do you mean by `not accurate`? http://stackoverflow.com/help/how-to-ask – Pogrindis Jun 14 '16 at 11:07
  • For example, suppose we want to scrape the "Introduction" and "Job title" of a person from the PDF content. We search the content for these headings, but the same words may also appear inside the body text of the "Introduction" or "Job description" sections. That is why I said this will not be accurate. – Ajai Jun 14 '16 at 11:15
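
One way to reduce the false matches described in the last comment is to accept a heading only when it stands alone on its own line, rather than wherever the words occur. The following is a minimal Python sketch of that idea, not a definitive implementation. The heading names, the `split_sections` helper and the sample text are all hypothetical, and it assumes the text extractor keeps each heading on a separate line.

```python
import re

# Hypothetical heading names; replace with the headings that occur in your PDFs.
HEADINGS = ["Introduction", "Job title", "Job description"]


def split_sections(text, headings=HEADINGS):
    """Return {heading: body}, treating a heading as a match only when it
    is alone on its line (optionally followed by a colon)."""
    alternation = "|".join(re.escape(h) for h in headings)
    pattern = re.compile(
        rf"^[ \t]*({alternation})[ \t]*:?[ \t]*$",
        re.IGNORECASE | re.MULTILINE,
    )
    canonical = {h.lower(): h for h in headings}
    matches = list(pattern.finditer(text))
    sections = {}
    for i, match in enumerate(matches):
        # A section body runs until the next heading line (or end of text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        name = canonical[match.group(1).lower()]
        # Keep the first occurrence if a heading line appears more than once.
        sections.setdefault(name, text[match.end():end].strip())
    return sections


sample = """Introduction
I wrote the introduction of our handbook and I like my job title.

Job title
Senior Developer
"""
result = split_sections(sample)
```

Here the words "introduction" and "job title" inside the first paragraph are ignored because they are not alone on a line. If the extracted text does not preserve line breaks around headings, this heuristic will not help and a layout-aware extraction would be needed instead.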

1 Answer

2

One option: if you only need to scrape one or two PDFs, you can convert the PDF to HTML using any online tool and then scrape the data with the simplehtmldom library. You can also use PDF Text Extractor to extract the text from the PDF.
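
As a rough illustration of the HTML route, here is a sketch in Python using only the standard library's `html.parser` in place of simplehtmldom. It assumes the PDF-to-HTML converter emits real heading tags (`h1` to `h3`); many converters produce only positioned `div` or `span` elements, in which case the selection logic would have to change. The class name, the `extract_sections` helper and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class HeadingSectionParser(HTMLParser):
    """Collect the text that follows each heading element, keyed by heading text."""

    # Assumption: the converter marks section titles with real heading tags.
    HEADING_TAGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.sections = {}
        self._in_heading = False
        self._heading_parts = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADING_TAGS:
            self._in_heading = True
            self._heading_parts = []

    def handle_endtag(self, tag):
        if tag in self.HEADING_TAGS and self._in_heading:
            self._in_heading = False
            # Normalise whitespace in the heading text before using it as a key.
            self._current = " ".join("".join(self._heading_parts).split())
            self.sections.setdefault(self._current, [])

    def handle_data(self, data):
        if self._in_heading:
            self._heading_parts.append(data)
        elif self._current is not None and data.strip():
            self.sections[self._current].append(data.strip())


def extract_sections(html):
    parser = HeadingSectionParser()
    parser.feed(html)
    parser.close()
    return {name: " ".join(parts) for name, parts in parser.sections.items()}


html_sample = """
<h2>Introduction</h2><p>Worked on the <b>introduction</b> chapter.</p>
<h2>Job title</h2><p>Senior Developer</p>
"""
sections = extract_sections(html_sample)
```

Because headings are identified by their tag rather than by their wording, the word "introduction" inside the paragraph is treated as body text. The resulting dictionary can then be written to the database with whatever client library you use, preferably through parameterised queries.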

I hope this is helpful to you.

aniket ashtekar