4

I'm building a database to store my eBook collection.
Most of the books have their ISBN somewhere in the text of the book itself.
How can I access this content?
Is there any source code or a DLL for doing that?

InfoStatus

2 Answers

4

I did this for an eBook library app. First, you need to extract the text from the CHM or PDF file. There are many utilities and libraries for that. There is an article on CodeProject on how to extract content from CHM files. For PDF files I used the pdftotext utility. Once you have the plain text of the eBook, parse it with a regular expression to find the ISBN-10/13 code.
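As a rough illustration of the two steps above (not the code from the app itself), here is a minimal Python sketch. It assumes the pdftotext command-line tool is installed and on the PATH. The function names `pdf_to_text` and `find_isbns` and the exact regular expression are my own choices for the example. Because a bare digit pattern also matches phone numbers and similar strings, the sketch keeps only candidates whose ISBN check digit is correct.

```python
import re
import subprocess

# Candidate pattern (an assumption, tune it for your books): an optional
# 978/979 prefix followed by 10 characters, with optional hyphens or
# spaces between them. The last character may be X for ISBN-10.
ISBN_RE = re.compile(r"\b(?:97[89][- ]?)?\d(?:[- ]?\d){8}[- ]?[\dXx]\b")


def pdf_to_text(path):
    """Run pdftotext and return the extracted text ("-" means stdout)."""
    result = subprocess.run(
        ["pdftotext", path, "-"],
        capture_output=True, encoding="utf-8", errors="replace", check=True,
    )
    return result.stdout


def is_valid_isbn10(code):
    """ISBN-10 check: weighted sum (weights 10..1) divisible by 11."""
    total = 0
    for i, ch in enumerate(code):
        value = 10 if ch == "X" else int(ch)
        total += (10 - i) * value
    return total % 11 == 0


def is_valid_isbn13(code):
    """ISBN-13 check: alternating weights 1 and 3, sum divisible by 10."""
    if not code.isdigit():
        return False
    total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(code))
    return total % 10 == 0


def find_isbns(text):
    """Return valid ISBNs found in text, normalized, in order of appearance."""
    found = []
    for match in ISBN_RE.finditer(text):
        code = re.sub(r"[- ]", "", match.group(0)).upper()
        valid = is_valid_isbn10(code) if len(code) == 10 else is_valid_isbn13(code)
        if valid and code not in found:
            found.append(code)
    return found
```

Typical use would be `find_isbns(pdf_to_text("book.pdf"))`, where `book.pdf` is a placeholder path. The ISBN usually sits on the copyright page, so searching only the first few pages of text is often enough.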

aku
3

Extracting the text from the CHM and PDF files is the first step. After that, you can find the ISBN with a regular expression.

Darin Dimitrov