
A website serves continuously updated content (think stock exchange) and must generate reports on demand as files that users download. Users can customize the downloaded report with many parameters.

What is the best practice for handling highly customized reports that are downloaded as .xls files? How can I cache them and improve performance?

It may be worth mentioning that the data is stored in RavenDB and the reports are expected to handle result sets of 100K records.

Jalal El-Shaer

1 Answer

Here are some pointers:

  • Make sure you have static indexes defined in RavenDB to match all possible reports. You don't want to use dynamically generated temp indexes for this.

  • One or more parameters will probably change the query drastically, so you may need some conditional logic to choose which of several queries to run. This is especially true for different groupings, as they'll require a different map-reduce index.

  • Choose whether you want to limit your result set using standard paging with Skip and Take operators, or whether you are going to stream unbounded result sets.

  • However you build the actual report, do it in memory. Do not try to write it to disk first. Managing file permissions, locks, and cleanup is not worth the hassle. Plus, you risk taking servers down if they run out of disk space.

  • Preferably, build the response and stream it out to your user in a single step, so as not to require large amounts of memory on the server. Make sure you understand the yield keyword in C#, and work with IEnumerable and IQueryable directly whenever possible. Don't use .ToList() or .ToArray(), which will put the whole result set into memory.

  • With regard to caching, you could consider using a front-end cache like Memcached, but I'm not sure if it will help you here or not. You probably want data that is as accurate as possible from your database. Introducing any sort of cache will require you to understand how and when to reset that cache. Keep in mind that Raven has several caching layers built in already. Build your solution without a cache first, and then add caching if you need it.
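
To illustrate the streaming point above, here is a minimal sketch of writing rows to the output as they are produced, without ever materializing the full result set. It is not RavenDB code: `ReportRow`, `FetchRows`, and `WriteCsv` are hypothetical names, `FetchRows` only stands in for whatever streamed query you use, and CSV is used in place of .xls to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

// Hypothetical row type; the real report columns are not given in the question.
public sealed class ReportRow
{
    public string Symbol { get; set; }
    public decimal Price { get; set; }
}

public static class ReportStreaming
{
    // Stand-in for a streamed database query (an assumption for this sketch).
    // Rows are produced one at a time via yield return, so nothing is buffered.
    public static IEnumerable<ReportRow> FetchRows(int count)
    {
        for (int i = 0; i < count; i++)
        {
            yield return new ReportRow { Symbol = "SYM" + i, Price = 100m + i };
        }
    }

    // Writes each row to the output as soon as it is pulled from the sequence.
    // Returns the number of data rows written (the header is not counted).
    public static int WriteCsv(IEnumerable<ReportRow> rows, TextWriter output)
    {
        output.WriteLine("Symbol,Price");
        int written = 0;
        foreach (var row in rows)
        {
            output.WriteLine(row.Symbol + "," +
                row.Price.ToString(CultureInfo.InvariantCulture));
            written++;
        }
        return written;
    }
}
```

In a web application the `TextWriter` would wrap the response stream, so memory use stays roughly constant regardless of report size. Calling `.ToList()` on the sequence before passing it to `WriteCsv` would defeat the purpose.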

Matt Johnson-Pint
  • Hi Matt, I'm using the streaming API with transformers. The performance I'm seeing is about 35K docs in about 40 sec, roughly 1K/sec. Are these numbers normal? – Jalal El-Shaer Sep 05 '13 at 06:27
  • Hard to say. There's a lot of things that can affect performance. If you need help in perf tuning, you might want to ask in the [RavenDB Google Group](http://groups.google.com/group/ravendb), or use [a support incident](http://ravendb.net/support). – Matt Johnson-Pint Sep 08 '13 at 20:21