I have decided to develop an EDMS in Java as the end-of-year project for my final year of IT studies, and I'm currently researching database solutions for uploading and storing files in different formats along with their metadata. I would like to be able to query both file metadata and file content (e.g., return all documents created after June 2012 by the user John that contain the string "finance").

I understand that databases are for data and file systems are for files, as explained in this article, but some of my teachers have suggested that I look into XML databases, Apache Cocoon, or Apache Jackrabbit, and I have to admit that I am at a loss as to which approach to take. This article seems to suggest that MongoDB would be my best bet?

Thank you for your patience and help.

Sebastien

user1895293

1 Answer

Without knowing which features you plan to implement, it is hard to say. Assuming that your system will:

  1. allow uploading of documents
  2. allow searching of documents based on various metadata
  3. allow downloading of documents

consider either:

  1. Keeping the files in a filesystem and the metadata in a database such as MySQL
  2. Keeping the files in a filesystem and using a search engine such as Elasticsearch to store the metadata

Both solutions would work; which one fits better depends on how you want to search. The flow of your application would be:

Uploading Documents

  1. User uploads new document
  2. Assign document internal ID
  3. Store document in filesystem based on that ID
  4. Store metadata in the database or Elasticsearch, using the ID to reference the file
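
As a rough illustration of the upload steps, here is a minimal Java sketch (it assumes Java 16 or later for records). Everything in it is hypothetical: the `DocumentUploader` class, the `DocumentMetadata` fields, and the use of a random UUID as the internal ID are my assumptions, and the in-memory map only stands in for whatever metadata store you choose (a MySQL table or an Elasticsearch index).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDate;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of the upload flow; all names are illustrative. */
class DocumentUploader {

    /** Metadata kept per document; the chosen fields are assumptions. */
    record DocumentMetadata(String id, String author, LocalDate created, String originalName) {}

    private final Path storageRoot;

    // Stand-in for the real metadata store (database table or search index).
    private final Map<String, DocumentMetadata> metadataStore = new ConcurrentHashMap<>();

    DocumentUploader(Path storageRoot) {
        this.storageRoot = storageRoot;
    }

    /** Assigns an internal ID, writes the file under that ID, then records the metadata. */
    String upload(String originalName, String author, LocalDate created, byte[] content) {
        // Step 2: assign the document an internal ID.
        String id = UUID.randomUUID().toString();
        try {
            // Step 3: store the document in the filesystem, named after the ID.
            Files.createDirectories(storageRoot);
            Files.write(storageRoot.resolve(id), content);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Step 4: store the metadata, keyed by the same ID.
        metadataStore.put(id, new DocumentMetadata(id, author, created, originalName));
        return id;
    }

    /** Returns the metadata recorded for the given ID, or null if unknown. */
    DocumentMetadata metadataFor(String id) {
        return metadataStore.get(id);
    }

    /** Resolves the filesystem location of the document with the given ID. */
    Path pathFor(String id) {
        return storageRoot.resolve(id);
    }
}
```

Naming the stored file after the ID rather than the uploaded filename avoids collisions between documents with the same name; the original name survives as metadata.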

Retrieving Documents

  1. User enters search criteria
  2. You generate a query for either the database or Elasticsearch
  3. Display results. The result will have a link with the internal ID you created
  4. User selects result. You use the ID to get the document
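
The retrieval steps could look something like the following Java sketch, using the example from the question (documents by John, created after June 2012, containing "finance"). Again, this is only an illustration under assumptions: `DocumentSearch`, `DocumentMetadata`, and their methods are made-up names, the in-memory list stands in for a database or Elasticsearch query, and scanning file contents on every search is a naive placeholder for a proper full-text index. It assumes Java 16 or later and plain-text documents.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the retrieval flow; all names are illustrative. */
class DocumentSearch {

    /** Metadata kept per document; the chosen fields are assumptions. */
    record DocumentMetadata(String id, String author, LocalDate created) {}

    private final Path storageRoot;

    // Stand-in for rows in a metadata table or documents in a search index.
    private final List<DocumentMetadata> index = new ArrayList<>();

    DocumentSearch(Path storageRoot) {
        this.storageRoot = storageRoot;
    }

    /** Registers metadata for a document already stored under its ID. */
    void register(DocumentMetadata metadata) {
        index.add(metadata);
    }

    /** Steps 2 and 3: apply the criteria and return the IDs of matching documents. */
    List<String> search(String author, LocalDate createdAfter, String text) {
        List<String> ids = new ArrayList<>();
        for (DocumentMetadata m : index) {
            // Cheap metadata checks run first; the file is read only if they pass.
            if (m.author().equals(author)
                    && m.created().isAfter(createdAfter)
                    && contentOf(m.id()).contains(text)) {
                ids.add(m.id());
            }
        }
        return ids;
    }

    /** Step 4: use the internal ID to load the document from the filesystem. */
    String contentOf(String id) {
        try {
            return Files.readString(storageRoot.resolve(id), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this shape, the question's example becomes `search("John", LocalDate.of(2012, 6, 30), "finance")`, and each returned ID can be turned into a link in the results page.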
Dave
  • First of all, thanks for the answer. The features will be very basic to begin with: the user can create/manage/delete a "library"; the user can import local documents into a given library; the user can "mark" documents as important, to read, to print, etc.; and finally the user must be able to search a library to find a certain document based on the metadata and content of the document. I have already considered both of your solutions, but they separate storage from querying. My real question is: "Is there ONE solution that enables BOTH storage AND querying?" – user1895293 Nov 24 '13 at 15:44
  • @user1895293: No, there is not one solution. Commercial DMS systems store documents as images and metadata as relational data in relational databases. Some DMS systems use optical character recognition (OCR) to extract some data from the images. – Gilbert Le Blanc Nov 24 '13 at 16:16
  • I went for JCR and Apache Jackrabbit in the end. – user1895293 Dec 04 '13 at 15:55