I have a project I would like to work on which I feel is a beautiful case for Neo4j. But there are aspects about implementing this that I do not understand enough to succinctly list my questions. So instead, I'll let the scenario speak for itself:
Scenario: In simplicity, I want to build an application that will allow Users who will receive files of various types such as docs, excel, word, images, audio clips and even videos - although not so much videos, and allow them to upload and categorize these.
With each file they will enter in any and all associations. Examples:
- If Joe authors a PDF, Joe is associated with the PDF.
- If a DOC says that Sally is Mary's mother, Sally is associated with Mary.
- If Bill sent an email to Jane, Bill is associated with Jane (and the email).
- If company X sends an invoice (Excel grid) to company Y, X is associated with Y.
and so on...
So the basic goal at this point would be to:
- Have users load in files as they receive them.
- Enter the associations that each file contains.
- Review associations holistically, in order to predict or take some action.
- Generate a report of the interested associations including the files that the associations are based on.
The value for this project is in the associations, which in reality would grow much more complex then the above examples and should produce interesting conclusions. However. if the User is asked "How did you come to that conclusion", they need to be able to produce a summary of the associations as well as any files that these associations are based on - ie the PDF or EXCEL or whatever.
Initial thoughts...
I also should also add that this applicatoin would be hosted internally, and probably used by approx 50 Users so I probably don't need super-duper, fastest, scalable, high availability possible solution. The data being loaded could get rather large though, maybe up to a terabyte in a year? (Not the associations but the actual files)
Wouldn't it be great if Neo4J just did all of this! Obviously it should handle the graph aspects of this very nicely, but I figure that the file storage and text search is going to need another player added to the mix.
Some combinations of solutions I know of would be:
Store EVERYTHING including files as binary in Neo4J.
Would be wrestling Neo4J for something its not built for. How would I search text?
Store only associations and meta data in Neo4J and uploaded file on File system.
How would I do text searches on files that are stored on file server?
Store only associations and meta data in Neo4J and uploaded file in Postgres.
Not so confident of having all my files inside DB. Feel more comfortable having all my files accessible in folders.
Everyone says its great to put your files in DB. Everyone says its not great to put your files in DB.
Get to the bloody questions..
- Can anyone suggest a good "stack" that would suit the above?
Please give a basic outline on how you would implement your suggestion, ie:
- Have the application store the data into Neo4J, then use triggers to update Postgres.
- Or have the files loaded into Postgres and triggers update Neo4J.
- Or Have the application load data to Nea4J and then application loads data into Postgres.
- etc
How you would tie these together is probably what I am really trying to grasp.
Thank you very much for any input on this.
Cheers.
p.s. What a ramble! If you feel the need to edit my question or title to simplify, go for it! :)