4

I would like to know how to extract data from a PDF file using Scrapy. Which module should I use, and what is the most effective way to do it? Could you please point me to some sample tutorials on this?

Thanks!!

Dev Pandu

1 Answer

4

I suggest you get the PDF with Scrapy and use PyPDF2 to get the content inside the PDF.

For a complete, though somewhat dated, example (it uses pyPDF), take a look at this site.

GHajba
  • Thank you for the answer. I tried the sample site you linked, but I am still getting errors such as *** PdfReadError: EOF marker not found – Dev Pandu Jul 08 '15 at 09:39
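
One possible cause of "EOF marker not found" is that the bytes being parsed are not a complete PDF, for example an HTML login or error page, or a download that was cut short. That is only a guess without seeing the file. A small standard-library check like the sketch below (the function name `diagnose_pdf_bytes` is hypothetical) can narrow it down before the data reaches the PDF library:

```python
def diagnose_pdf_bytes(data: bytes) -> str:
    """Return a short description of why `data` may not parse as a PDF."""
    if not data:
        return "empty body"
    # A PDF file normally begins with a "%PDF-" header.
    if not data.lstrip().startswith(b"%PDF-"):
        return "missing %PDF- header (possibly an HTML page)"
    # The end-of-file marker is expected near the end; check the last 1 KiB.
    if b"%%EOF" not in data[-1024:]:
        return "missing %%EOF marker (possibly a truncated download)"
    return "ok"
```

Inside a Scrapy callback you could log `diagnose_pdf_bytes(response.body)` together with `response.url` to see which downloads are affected.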