My case is a closed email system.

The emails are HTML-enabled.

What is needed:

  • Full-text search (there are over 1 million emails in the database, but they are usually pre-filtered to users who have been active recently)
  • Archiving: how can I archive old emails (older than 1-2 years)?

Which is the better way to store these emails: as files on the server, or inside a database table? Or is it a combination of the two (because of archiving)?

Following on from that question: what specific tools/plugins can I use to make the job easier? I remember hearing a little about Solr, but I am not sure what other options exist.

StanM

1 Answer

Solr would help you on the search side, but it has nothing to do with archiving. Look at the Solr DataImportHandler (DIH); there was a contrib module (I think) that reads IMAP sources.

Regarding archiving, that is a very large area, and there are several questions you must answer:

  • Do you want to store each mail as a whole, or decompose it into its parts so that you can also deduplicate parts that are repeated across different mails?
  • I would lean towards storing on the filesystem, but watch out for the following:
      • you need to devise a way to detect duplicates
      • spread the files smartly over a tree of directories, so that no single directory becomes slow to browse
      • compress when it pays off (not small files or incompressible ones)
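
As a rough illustration of the three filesystem points above, here is a minimal Python sketch. Everything in it is an assumption made for the example rather than a recommendation from any particular tool: the names `store_email` and `load_email`, the SHA-256 content hash used to detect duplicates, the two-level directory layout, and the 1 KiB compression threshold are all hypothetical choices you would tune for your own data.

```python
import gzip
import hashlib
from pathlib import Path

# Assumption: messages below this size are not worth compressing.
MIN_COMPRESS_BYTES = 1024


def store_email(root: Path, raw: bytes) -> Path:
    """Store one raw message under a content-derived path and return that path."""
    digest = hashlib.sha256(raw).hexdigest()

    # Two levels of 2-hex-character directories (256 * 256 buckets),
    # so no single directory accumulates a huge number of files.
    bucket = root / digest[:2] / digest[2:4]
    plain = bucket / digest
    packed = bucket / (digest + ".gz")

    # Duplicate detection: identical bytes always hash to the same name,
    # so an existing file means this content is already stored.
    for existing in (plain, packed):
        if existing.exists():
            return existing

    bucket.mkdir(parents=True, exist_ok=True)

    # Compress only when the message is large enough and actually shrinks.
    target, payload = plain, raw
    if len(raw) >= MIN_COMPRESS_BYTES:
        compressed = gzip.compress(raw)
        if len(compressed) < len(raw):
            target, payload = packed, compressed

    target.write_bytes(payload)
    return target


def load_email(path: Path) -> bytes:
    """Read a stored message back, decompressing it if it was stored gzipped."""
    data = path.read_bytes()
    return gzip.decompress(data) if path.suffix == ".gz" else data
```

With a layout like this, the database row for an email only needs to keep the returned path (or the hash) next to the searchable metadata. Hashing the whole message only catches byte-identical copies; deduplicating repeated parts would mean hashing each part separately.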
Persimmonium
  • By deduplication, do you mean emails that contain quoted reply text? And if I wanted to search the archives, is that a bad idea? – StanM Dec 04 '12 at 21:42
  • I mean not storing the same content twice. Imagine the same attachment sent to 20 different people... – Persimmonium Dec 04 '12 at 22:18
  • Ahh yeah, the attachments are stored separately. Right now I am only thinking of ways to deal with a mounting volume of emails (and the inevitable bloating of the db with the HTML text). – StanM Dec 04 '12 at 22:27