
I administer a university's document management system (DMS). It is a third-party product that integrates with another third-party database that acts as our ERP system. The DMS is quite clunky, has a wide array of terrible bugs, and lacks features and support. I've been playing around with Google App Engine and the Drive SDK in my free time out of curiosity. Since we are a Google Apps for Education customer, we have unlimited Drive space and all our users are Google Apps users.

Would it be feasible to build a web application in-house (potentially powered by Google App Engine) that uses the Drive SDK to manage all the university's files (~6 TB)? From my experiments, it seems to have all the capabilities required.

  • Size of the data won't be important, it should be able to handle everything (without knowing more of what you plan to do). – Ryan Jan 05 '15 at 20:11
  • The extent of the system is primarily > import document into DMS > index document based on predefined index fields for that specific doc type > query database with index values to retrieve document list. – Kyle McIntire Jan 05 '15 at 20:18
  • Between the tools you mentioned already, the datastore and full text search you should be able to do what you need. https://cloud.google.com/appengine/training/fts_intro/lesson2 – Ryan Jan 05 '15 at 21:13

1 Answer


Since you'll be building your own software, the answer to "will it do what I want" is always "yes, eventually".

You'll need to make a decision about document formats, which in turn will influence your indexing mechanism. Specifically, you have two primary options:

  1. Convert the files to Google document formats (document, spreadsheet, etc.). You will then be able to use Google's own indexing and search, e.g. as you would from drive.google.com. The downside is that formatting may be lost during the import/export round trip.

  2. Keep the documents in their native format (e.g. MS .docx) and perform your own indexing. This will require parsing each document type, which is non-trivial, but I'm sure there are third-party libraries to assist. The upside is that the documents you retrieve are identical to the documents you imported.

I think I would look at doing both of the above. When you import a file into your DMS, you store it twice in Google Drive: once converted and once unconverted. Use the App Engine datastore to keep track of the pairings. This way you can use Drive search to find the converted document, but the file you serve back to the user is its unconverted twin.
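As a rough illustration of the pairing idea, here is a minimal Python sketch. Everything in it is hypothetical: `DocumentPair`, `PairRegistry`, and the file IDs are made-up names, and the in-memory dictionary merely stands in for the App Engine datastore. No Drive or datastore API is called; in a real application the two IDs would come from whatever the Drive upload calls return. The `index_fields` attribute is an assumption based on the per-document-type index fields described in the question's comments.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DocumentPair:
    """Links the two Drive copies of one imported document (hypothetical model)."""
    original_file_id: str    # ID of the untouched upload, e.g. the .docx
    converted_file_id: str   # ID of the Google-format copy used for searching
    doc_type: str            # e.g. "transcript" (illustrative value)
    index_fields: Dict[str, str] = field(default_factory=dict)


class PairRegistry:
    """In-memory stand-in for the datastore; an assumption, not a real API."""

    def __init__(self) -> None:
        # Keyed by the converted copy's ID, since that is what a search returns.
        self._by_converted: Dict[str, DocumentPair] = {}

    def register(self, pair: DocumentPair) -> None:
        """Record a pairing at import time."""
        self._by_converted[pair.converted_file_id] = pair

    def original_for(self, converted_file_id: str) -> Optional[str]:
        """Map a search hit on the converted copy back to its unconverted twin."""
        pair = self._by_converted.get(converted_file_id)
        return pair.original_file_id if pair else None

    def find(self, doc_type: str, **criteria: str) -> List[DocumentPair]:
        """Return pairs of one document type whose index fields match exactly."""
        return [
            pair for pair in self._by_converted.values()
            if pair.doc_type == doc_type
            and all(pair.index_fields.get(k) == v for k, v in criteria.items())
        ]
```

At import time you would call `register()` once both uploads have succeeded. When a search returns the converted copy, `original_for()` gives the ID of the file to serve back to the user.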

pinoyyid
  • Thanks pinoyyid! I was pretty sure it had all the capabilities I needed and then some - just wanted to bounce it off people who had some experience. Also, that is an excellent idea to keep both documents. I was already concerned about maintaining formatting. Thanks! – Kyle McIntire Jan 06 '15 at 12:10
  • Part of this answer is incorrect. Google Drive *does* index MS Office files, so there is no need to convert. Now that OCM is integrated in the Docs Suite you can even edit the MS files without having to convert to our native format (although you might want to, so you can use some of the fancy Docs features). @KyleMcIntire – Dan McGrath Jan 09 '15 at 17:25
  • @DanMcGrath many thanks for adding that. Could you provide a link where these features are detailed, eg. which Office file formats are indexed? – pinoyyid Jan 10 '15 at 03:56