
Considering Alfresco as an ECM, how can one manage large volumes of data in it? A scanned black-and-white image at 150 dpi is typically about 274 KB, and thousands of such images would be uploaded each day. Reducing file size with a compression scheme such as CCITT G4 or JBIG2 would help. I am new to this enterprise solution; please advise how I can handle this data efficiently and reduce the need for expensive hardware. If this is not Alfresco's responsibility, what alternative approach should I adopt?
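For a rough sense of scale, the figures in the question can be turned into a storage estimate. This is only a minimal sketch: the A4 page size, the volume of 5,000 files per day and the 10:1 compression ratio are illustrative assumptions, not numbers from the question.

```python
def raw_bilevel_bytes(width_mm, height_mm, dpi):
    """Size in bytes of an uncompressed 1-bit-per-pixel scan of a page."""
    width_px = round(width_mm / 25.4 * dpi)
    height_px = round(height_mm / 25.4 * dpi)
    row_bytes = (width_px + 7) // 8  # each row is padded to a whole byte
    return row_bytes * height_px


def monthly_storage_gb(files_per_day, bytes_per_file, days=30):
    """Storage added per month, in decimal gigabytes."""
    return files_per_day * bytes_per_file * days / 1e9


# Assumption: A4 pages (210 mm x 297 mm). At 150 dpi this gives 271,870 bytes,
# close to the 274 KB quoted, which suggests the scans are stored uncompressed.
a4_raw = raw_bilevel_bytes(210, 297, 150)

# Assumptions: 5,000 files per day, and a 10:1 ratio after compression.
FILES_PER_DAY = 5000
ASSUMED_RATIO = 10

uncompressed_gb = monthly_storage_gb(FILES_PER_DAY, 274_000)
compressed_gb = monthly_storage_gb(FILES_PER_DAY, 274_000 / ASSUMED_RATIO)

print(f"raw A4 bitmap at 150 dpi: {a4_raw} bytes")
print(f"per month, uncompressed: {uncompressed_gb:.1f} GB")
print(f"per month, at {ASSUMED_RATIO}:1: {compressed_gb:.2f} GB")
```

The real ratio depends heavily on page content, so it is worth measuring on a sample of actual scans before sizing hardware.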

Bilal Saeed
  • Last year in a benchmark, [Alfresco succeeded in loading 1 billion documents into a repo at a rate of 1000 documents per second!](https://www.alfresco.com/node/4141) So it shouldn't be an issue. What exactly is the problem you're facing? – Gagravarr Sep 05 '16 at 12:08
  • I am integrating Alfresco with my application, where my clients upload scanned images at a rate of thousands of files per day. At that rate, TBs of data would be uploaded in just a month, which means I would need an expensive architecture (servers and so on) to deal with it. I am trying to compress the scanned images to make better use of the ECM. – Bilal Saeed Sep 05 '16 at 12:26
  • Alfresco bundles ImageMagick, so as a quick'n'dirty solution you could just define a rule / add a behaviour on the repo that calls out to mogrify to compress or resize the images. Or, if you write your own custom frontend for loading, just add the ImageMagick (or similar) step there. – Gagravarr Sep 05 '16 at 13:00
  • @Gagravarr - ImageMagick can certainly do the job, but it will operate about 10x slower than necessary. A custom solution written with efficient code would be a better choice since the OP is trying to achieve efficient handling of his images. Alfresco's benchmark isn't relevant because it's about using cloud computing to insert documents into a database. The OP is concerned about local processing and the size of data being uploaded. – BitBank Oct 23 '16 at 14:06
  • @BitBank As long as the server can process all of a day's documents in under a day, a solution that can be coded in under an hour is likely to win out over something taking weeks, just considering the programmer time spent... – Gagravarr Oct 23 '16 at 16:26
  • @Gagravarr - I never said anything about weeks. I have a solution that I created and which could be set up in minutes. It's 10x faster than ImageMagick. It's not free, but it won't occupy anyone's time to get running. – BitBank Oct 23 '16 at 22:51
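The ImageMagick route suggested in the comments could be sketched roughly as follows. This assumes ImageMagick is installed with TIFF support and that a `mogrify` executable is on the PATH; the function names and paths are hypothetical, and the inputs are assumed to already be black-and-white scans.

```python
import shutil
import subprocess
from pathlib import Path


def group4_command(src, out_dir):
    """Build a mogrify command that writes a CCITT Group 4 TIFF copy of src."""
    return [
        "mogrify",
        "-format", "tif",        # write a .tif file instead of overwriting src
        "-compress", "Group4",   # CCITT Group 4 (bilevel) compression
        "-path", str(out_dir),   # directory for the converted file
        str(src),
    ]


def compress_scan(src, out_dir):
    """Convert one scan and return the path of the compressed TIFF."""
    if shutil.which("mogrify") is None:
        raise RuntimeError("ImageMagick 'mogrify' was not found on PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(group4_command(src, out_dir), check=True)
    return Path(out_dir) / (Path(src).stem + ".tif")


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(group4_command("scan-0001.png", "compressed")))
```

Running the conversion in the upload frontend, before the file reaches Alfresco, also reduces the amount of data sent over the network.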

1 Answer


I hope you have found a solution in the meantime. If not, and for others with the same question: Axel Faust has added a compressing store to his alfresco-simple-content-stores GitHub project. You can install the module in your own project and configure which MIME types should be compressed and how. It is a low-level solution that is completely transparent to the user: content is compressed when it is written to disk and decompressed when it is read back.
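To illustrate the idea of transparent compression at the storage layer, here is a minimal conceptual sketch. It is not the API or configuration of alfresco-simple-content-stores (which is a Java module for Alfresco); the class, method names and the choice of gzip are assumptions made purely for illustration.

```python
import gzip
import tempfile
from pathlib import Path


class CompressingStore:
    """Toy file store that compresses selected MIME types on write."""

    def __init__(self, root, compress_types):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.compress_types = set(compress_types)

    def write(self, name, data, mimetype):
        """Store data, gzip-compressed if its MIME type is configured."""
        if mimetype in self.compress_types:
            path = self.root / (name + ".gz")
            path.write_bytes(gzip.compress(data))
        else:
            path = self.root / name
            path.write_bytes(data)
        return path

    def read(self, name):
        """Return the original bytes; callers never see the compression."""
        compressed = self.root / (name + ".gz")
        if compressed.exists():
            return gzip.decompress(compressed.read_bytes())
        return (self.root / name).read_bytes()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = CompressingStore(tmp, compress_types={"image/tiff"})
        original = bytes(100_000)  # stand-in for a mostly white scan
        stored = store.write("scan-0001.tif", original, "image/tiff")
        assert store.read("scan-0001.tif") == original
        print(f"{len(original)} bytes stored as {stored.stat().st_size} bytes")
```

Generic compression of this kind gains little on files that are already compressed, so it complements rather than replaces an image-specific format such as CCITT G4.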

Hope it helps

sgirardin