Create a Research Database

Question

I would like to create a research database where I can store and retrieve articles (PDF files). Any suggestions?

I have looked at several relational database tutorials and none of them reference the storage and retrieval of documents, only raw data.

How many PDF documents do you plan to store? How much total data do you expect? You might want to use a NoSQL database like Mongo or MarkLogic. — Tim Biegeleisen, Sep 15 '15 at 01:33
Kim, is there any metadata associated with those PDF files? Is it a local or remote database? — Zepplock, Sep 15 '15 at 01:43
For most SQL engines, the document would be a blob (binary large object), just a container of untyped data, like a file. — JVene, Sep 15 '15 at 01:44
@Zepplock For each article I need to include: APA reference information and an annotated bibliography. I would like a search feature so some sort of metadata will also have to be attached to the file. It also needs to be able to accommodate between 100-150 articles. There is no database yet. I'm trying to figure out what's the best way to build it. — Kim, Sep 15 '15 at 01:49

score 1 · Answer 1 · answered Sep 15 '15 at 03:12

I would consider using something like Elasticsearch, Solr or Lucene instead of a traditional database approach. These let you index the documents, search them, and access their metadata.

Here is the Elasticsearch way, via the attachment plugin: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-attachment-type.html and a how-to example:
http://www.hashcode.eti.br/?p=420

Solr:
https://gist.github.com/nichtich/429904

and Lucene:
https://wiki.apache.org/lucene-java/LuceneFAQ#How_can_I_index_PDF_documents.3F
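To give a feel for what "index, search and access metadata" means, here is a toy inverted index in plain Python. It is only a sketch of the general idea behind engines like Lucene, not the API of Elasticsearch, Solr or Lucene. It assumes the text of each PDF has already been extracted by some other tool, and the class and method names (`InvertedIndex`, `add`, `search`) are made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Toy index: maps each word to the set of documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document ids
        self.metadata = {}                # document id -> metadata dict

    def add(self, doc_id, text, metadata=None):
        """Index already-extracted text and keep metadata next to it."""
        self.metadata[doc_id] = metadata or {}
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return the sorted ids of documents containing every query word."""
        terms = tokenize(query)
        if not terms:
            return []
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return sorted(result)
```

A real search engine adds ranking, stemming, and PDF text extraction on top of this, which is the main reason to use one rather than roll your own.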

Thank you. This is helpful. The more I learned about databases, the more I began to doubt that one would solve my problem. — Kim, Sep 15 '15 at 12:04

score 0 · Answer 2 · answered Sep 15 '15 at 01:57


To build it on your local computer: put all the files in one folder/directory and name them uniquely. Use any database (Postgres, MySQL, SQLite, Mongo, etc.) to store the metadata and reference each PDF file by name. Even if you put it into the database - there's nothing you can do with it.

To build it on the internet, do the same but use something like Amazon S3 to store the PDF files. You might decide to build a web UI for it if you envision other people collaborating with you, for example by adding or rating articles.
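The local setup could look roughly like the sketch below, using SQLite from Python's standard library. The folder name `articles`, the table layout, and the function names are all assumptions for illustration. The columns simply mirror what the question asks for (APA reference, annotation, some searchable keywords). The PDF files stay on disk and the database only stores their names.

```python
import sqlite3
from pathlib import Path

# Hypothetical folder holding the uniquely named PDF files.
PDF_DIR = Path("articles")


def open_db(path=":memory:"):
    """Open the metadata database and create the table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS articles (
            id            INTEGER PRIMARY KEY,
            filename      TEXT NOT NULL UNIQUE,  -- PDF name inside PDF_DIR
            apa_reference TEXT NOT NULL,
            annotation    TEXT NOT NULL,
            keywords      TEXT NOT NULL DEFAULT ''
        )
        """
    )
    return conn


def add_article(conn, filename, apa_reference, annotation, keywords=""):
    """Store the metadata for one PDF; the file itself stays on disk."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO articles (filename, apa_reference, annotation, keywords) "
            "VALUES (?, ?, ?, ?)",
            (filename, apa_reference, annotation, keywords),
        )


def search(conn, term):
    """Return paths of PDFs whose metadata mentions the term, ordered by name."""
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT filename FROM articles "
        "WHERE apa_reference LIKE ? OR annotation LIKE ? OR keywords LIKE ? "
        "ORDER BY filename",
        (pattern, pattern, pattern),
    )
    return [PDF_DIR / row[0] for row in rows]
```

Passing a file path such as `open_db("research.db")` instead of the in-memory default keeps the metadata between sessions. For 100-150 articles a simple `LIKE` search over the metadata should be fast enough.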


Zepplock


what do you mean by 'Even if you put it into the database - there's nothing you can do with it.'? – Kim Sep 15 '15 at 02:00
The reason to use a database is that you can filter, order, group, even do calculations on database fields. With PDF files you can't do all that. – Zepplock Sep 15 '15 at 03:37
Thank you. This is what I thought. I couldn't visualize how it would work. – Kim Sep 15 '15 at 12:05
