Summary
I am looking for a way to show staff that if they put just a small percentage of the time and effort they spend e-mailing each other into writing documentation, they would end up with a great support tool. I want to do this by getting word-count statistics out of MS Exchange 2007.
Background
I am helping an organisation I work with to get their teams better at documenting the networks and systems they manage. We have put a simple, wiki-style documentation system in place, with a lot of thought given to its design, templates and structure. So far it is working quite well, and now it is time to bring some other IT teams on board.
One of the main objections staff in these new teams raise to providing any sort of documentation is time. They are very busy and feel that they don't have time to work on this kind of documentation, even though it has been shown to reduce workload and time-to-restore for incidents in the teams that are already using it.
I figured that a powerful way to demonstrate that time probably isn't the real issue would be to show the teams how much time and effort they put into writing e-mail each day.
Our e-mail archives probably contain countless nuggets of gold about how systems work and how problems were solved: valuable information that would help support teams when those systems go bad. If only it had been put into a searchable wiki for everyone to see (using the structure and templates we have provided).
The problem
I need to extract raw data on how many words each individual types in each e-mail they send, summarised as a total per person per day. This is tricky because each e-mail thread will, of course, contain copies of the previous e-mails, and we don't want to count those.
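To make the "don't count the quoted copies" requirement concrete, here is a minimal sketch of one way to do it once a message body is available as plain text. It assumes quoted history starts at a recognisable marker: an Outlook-style "-----Original Message-----" separator, an "On ... wrote:" line, or a "From:" header line, and that inline quotes are prefixed with ">". These markers are heuristic assumptions, not an Exchange feature, and the function names are hypothetical. A genuine body line beginning with "From:" would cut the count short.

```python
import re

# Heuristic (assumed) markers for where quoted history begins in a reply.
_QUOTE_START = re.compile(
    r"^\s*(-{2,}\s*Original Message\s*-{2,}|On .+ wrote:\s*$|From:\s)",
    re.IGNORECASE,
)


def new_text(body: str) -> str:
    """Return only the part of a plain-text body the sender wrote themselves.

    Everything from the first quote-start marker onwards is dropped,
    as are individual lines quoted with a leading '>'.
    """
    kept = []
    for line in body.splitlines():
        if _QUOTE_START.match(line):
            break  # the rest is assumed to be earlier messages in the thread
        if line.lstrip().startswith(">"):
            continue  # skip an inline-quoted line
        kept.append(line)
    return "\n".join(kept)


def count_new_words(body: str) -> int:
    """Count whitespace-separated words in the newly written part only."""
    return len(new_text(body).split())
```

For example, a reply consisting of "Restarted the DNS service on ns1 and it's fine now." followed by an "-----Original Message-----" block counts as 10 words, however long the quoted thread below it is.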
Once we have the statistics per user per day, we can use Active Directory group memberships to build daily totals for the various teams, which will also anonymise the data somewhat.
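The roll-up itself is simple once the data exists. The sketch below assumes each sent message has already been reduced to a (sender, time sent, new-word count) record, and that group membership has been exported from Active Directory into a plain mapping of team name to member accounts. Both input shapes and the function names are hypothetical. A user in several groups is counted in each of them, and users in no group are dropped.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Dict, Iterable, Set, Tuple

# One record per sent message: (sender account, time sent, new words written).
Message = Tuple[str, datetime, int]


def per_user_per_day(messages: Iterable[Message]) -> Dict[Tuple[str, date], int]:
    """Sum new-word counts for each (user, calendar day)."""
    totals: Dict[Tuple[str, date], int] = defaultdict(int)
    for sender, sent, words in messages:
        totals[(sender, sent.date())] += words
    return dict(totals)


def per_team_per_day(
    user_totals: Dict[Tuple[str, date], int],
    groups: Dict[str, Set[str]],
) -> Dict[Tuple[str, date], int]:
    """Roll user totals up into team totals using group membership.

    `groups` maps a team (e.g. an AD group name) to its member accounts.
    Individual user names do not appear in the result.
    """
    team_totals: Dict[Tuple[str, date], int] = defaultdict(int)
    for (user, day), words in user_totals.items():
        for team, members in groups.items():
            if user in members:
                team_totals[(team, day)] += words
    return dict(team_totals)
```

With made-up input where alice sends 120 and 80 new words on one day and bob sends 50, and both belong to a hypothetical "Network Team" group, `per_team_per_day` reports 250 words for that team on that day.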
What I've tried so far
I've done Google searches till my fingers bled, but I don't know much about Exchange 2007 (or about Windows Server, for that matter; I'm a UNIX/Cisco person). I'm not sure where in the stack is the best place to get this information, and I also don't know much about the format of the mailbox stores/databases on the mailbox server.
I suspect there might be something more useful at the next layer up, such as query tools, database browsers and the like. That is where I'm looking for guidance.