I would like to create a research database where I can store and retrieve articles (PDF files). Any suggestions?

I have looked at several relational database tutorials and none of them reference the storage and retrieval of documents, only raw data.

Kim
  • How many PDF documents do you plan to store? How much total data do you expect? You might want to use a NoSQL database like Mongo or MarkLogic. – Tim Biegeleisen Sep 15 '15 at 01:33
  • How's NoSQL going to help store binary PDF documents? – Zepplock Sep 15 '15 at 01:42
  • Kim, is there any metadata associated with those PDF files? Is it a local or remote database? – Zepplock Sep 15 '15 at 01:43
  • For most SQL engines, the document would be a blob (binary large object), just a container of untyped data, like a file. – JVene Sep 15 '15 at 01:44
  • @Zepplock For each article I need to include: APA reference information and an annotated bibliography. I would like a search feature so some sort of metadata will also have to be attached to the file. It also needs to be able to accommodate between 100-150 articles. There is no database yet. I'm trying to figure out what's the best way to build it. – Kim Sep 15 '15 at 01:49

2 Answers

I would consider using a search engine such as Elasticsearch, Solr, or Lucene instead of a traditional database. These tools let you index the documents, search them, and access their metadata.

Elasticsearch handles this through its attachment plugin: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-attachment-type.html and here is a how-to example:
http://www.hashcode.eti.br/?p=420

Solr:
https://gist.github.com/nichtich/429904

and Lucene:
https://wiki.apache.org/lucene-java/LuceneFAQ#How_can_I_index_PDF_documents.3F
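
To give a rough idea of what "index and search" means in these tools, here is a toy inverted index in plain Python. It is only a sketch of the concept, not how Elasticsearch, Solr, or Lucene are actually implemented or called. It assumes the text of each PDF has already been extracted by some other tool, and the names `tokenize`, `build_index`, and `search_index` are made up for this illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to the set of file names whose text contains it.

    `docs` is a dict of {file name: already-extracted text}.
    """
    index = defaultdict(set)
    for name, text in docs.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search_index(index, query):
    """Return the sorted file names that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    # Intersect the per-word file sets; a word never seen yields an empty set.
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)
```

A real search engine adds ranking, stemming, and PDF text extraction on top of this basic structure, which is why it is usually easier to use one than to build it yourself.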

Edmon
  • Thank you. This is helpful. The more I learned about databases, the more I began to doubt that one would solve my problem. – Kim Sep 15 '15 at 12:04

To build it on your local computer: put all the files in one folder/directory and give each one a unique name. Use any database (Postgres, MySQL, SQLite, Mongo, etc.) to store the metadata and reference each PDF file by name. Even if you put the PDF itself into the database, there's nothing the database can do with it.
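
A minimal sketch of that local setup, using SQLite from Python's standard library. The folder name `articles`, the table layout, and the function names are assumptions made for illustration; the columns follow what Kim listed in the comments (APA reference and annotation). The database stores only the file name, and the PDF itself stays in the folder.

```python
import sqlite3
from pathlib import Path

# Hypothetical folder that holds the uniquely named PDF files.
PDF_DIR = Path("articles")


def open_db(path=":memory:"):
    """Open (or create) the metadata database; pass a file path to persist it."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS article (
               id            INTEGER PRIMARY KEY,
               filename      TEXT NOT NULL UNIQUE,  -- name of the PDF in PDF_DIR
               apa_reference TEXT NOT NULL,
               annotation    TEXT NOT NULL DEFAULT ''
           )"""
    )
    return conn


def add_article(conn, filename, apa_reference, annotation=""):
    """Record the metadata for one PDF; the file itself is not stored."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO article (filename, apa_reference, annotation) "
            "VALUES (?, ?, ?)",
            (filename, apa_reference, annotation),
        )


def find_articles(conn, term):
    """Return paths of PDFs whose reference or annotation mentions `term`."""
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT filename FROM article "
        "WHERE apa_reference LIKE ? OR annotation LIKE ? "
        "ORDER BY filename",
        (pattern, pattern),
    ).fetchall()
    return [PDF_DIR / name for (name,) in rows]
```

With 100-150 articles, a simple `LIKE` search over the reference and annotation text is likely fast enough; the returned paths can then be opened in any PDF viewer.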

To build it on the internet, do the same but use something like Amazon S3 to store the PDF files. You might decide to build a web UI for it if you envision other people collaborating with you, for example by adding or rating articles.

Zepplock
  • what do you mean by 'Even if you put it into the database - there's nothing you can do with it.'? – Kim Sep 15 '15 at 02:00
  • The reason to use a database is that you can filter, order, group, even do calculations on database fields. With PDF files you can't do all that. – Zepplock Sep 15 '15 at 03:37
  • Thank you. This is what I thought. I couldn't visualize how it would work. – Kim Sep 15 '15 at 12:05