
My application has to export the result of a stored procedure to .csv format. Basically, the client performs a query and can see the results in a paged grid; if it contains what he wants, he clicks an "Export to CSV" button and downloads the whole thing.

The server will have to run a stored procedure that will return the full result without paging, create the file and return it to the user.

The result file could be very large, so I'm wondering what the best way is to create this file on the server on demand and serve it to the client without blowing up the server's memory or resources.

The easiest way: Call the stored procedure with LINQ, create a stream, iterate over the result collection, and write one line to the file per collection item.

  1. Problem 1: Does deferred execution also apply to LINQ calls to stored procedures? (I mean, will .NET try to build a collection with all the items of the result set in memory, or will it give me the results item by item if I iterate instead of calling .ToArray()?)

  2. Problem 2: Is that stream kept in RAM until I call .Dispose()/.Close()?

The not-so-easy way: Call the stored procedure with an IDataReader and, for each row, write directly to the HTTP response stream. It looks like a good approach; as long as I write to the response while I read, memory is not blown up.
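Roughly, this is what I have in mind for the not-so-easy way (just a sketch; the connection string, procedure name and CSV handling are placeholders):

```csharp
// Rough sketch only: stream the stored procedure result straight into the HTTP response.
// The connection string, procedure name and CSV escaping are placeholders.
using System.Data;
using System.Data.SqlClient;
using System.IO;
using System.Web;

public static class CsvExport
{
    public static void WriteToResponse(HttpResponse response, string connectionString)
    {
        response.ContentType = "text/csv";
        response.AddHeader("Content-Disposition", "attachment; filename=export.csv");
        response.BufferOutput = false; // don't accumulate the whole file in memory

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("dbo.MyExportProcedure", connection))
        {
            command.CommandType = CommandType.StoredProcedure;
            connection.Open();

            using (var reader = command.ExecuteReader())
            using (var writer = new StreamWriter(response.OutputStream))
            {
                while (reader.Read()) // one row in memory at a time
                {
                    var fields = new string[reader.FieldCount];
                    for (int i = 0; i < reader.FieldCount; i++)
                        fields[i] = reader.IsDBNull(i) ? "" : reader.GetValue(i).ToString();

                    writer.WriteLine(string.Join(",", fields)); // proper CSV quoting omitted
                }
            }
        }
    }
}
```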

Is it really worth it?

I hope I have explained myself correctly.

Thanks in advance.

vtortola
  • Is this an MVC3 app? If so, I am assuming you are showing the data to the client on a webpage through a model; in that case, why don't you retrieve that model again on your CSV download request? It might save you the SQL Server call, reduce the overhead on your server and improve the response time. I am assuming that the view being shown is not a paged view. – Nikshep Sep 10 '11 at 07:07

4 Answers


Writing to a stream is the way to go, as it will roughly consume no more than the current "record" and its associated memory. That stream can be a FileStream (if you create a file), the ASP.NET response stream (if you write directly to the web), or any other useful stream.

The advantage of creating a file (using a FileStream) is being able to cache the data to serve the same request over and over. Depending on your needs, this can be a real benefit. You must come up with an intelligent algorithm to determine the file path and name from the input; this will be the cache key. Once you have a file, you can use the TransmitFile API, which leverages the Windows kernel cache and is in general very efficient. You can also play with HTTP client caching (headers like Last-Modified, If-Modified-Since, etc.), so the next time the client requests the same information you may return a Not Modified (HTTP 304 status code) response. The disadvantage of using cache files is that you will need to manage these files: disk space, expiration, etc.
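For illustration, the cache-file idea could look roughly like this (the cache key, folder and GenerateCsvFile helper are placeholders, not a complete implementation):

```csharp
// Illustration only: serve a cached CSV with TransmitFile and honour If-Modified-Since.
// The cache key, cache folder and GenerateCsvFile helper are made up for the example.
using System;
using System.IO;
using System.Web;

public static class CachedCsv
{
    public static void Serve(HttpRequest request, HttpResponse response,
                             string cacheFolder, string cacheKey)
    {
        string cacheFile = Path.Combine(cacheFolder, cacheKey + ".csv");

        if (!File.Exists(cacheFile))
            GenerateCsvFile(cacheFile); // stream the stored procedure result to disk (not shown)

        DateTime lastWrite = File.GetLastWriteTimeUtc(cacheFile);
        string header = request.Headers["If-Modified-Since"];
        DateTime ifModifiedSince;

        if (header != null
            && DateTime.TryParse(header, out ifModifiedSince)
            && lastWrite <= ifModifiedSince.ToUniversalTime().AddSeconds(1))
        {
            response.StatusCode = 304; // the client's copy is still current
            return;
        }

        response.ContentType = "text/csv";
        response.Cache.SetLastModified(lastWrite);
        response.TransmitFile(cacheFile); // kernel-level send, no managed buffering
    }

    static void GenerateCsvFile(string path)
    {
        // Placeholder: write the stored procedure result to 'path' row by row.
        throw new NotImplementedException();
    }
}
```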

Now, LINQ or IDataReader should not change much in terms of performance or memory consumption, provided you don't use LINQ methods that materialize the whole data set (exhaust the stream) or a big part of it. That means you will need to avoid ToArray(), ToList() and similar methods, and concentrate on "streamed" methods (plain enumeration, Skip, Where, etc.).
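For example, something like this stays cheap memory-wise, assuming the stored procedure result is enumerated lazily (MyDataContext, GetExportRows and the columns are placeholders):

```csharp
// Sketch: enumerate the stored procedure result row by row, without ToList()/ToArray().
// MyDataContext, GetExportRows and the column names are placeholders.
static void WriteCsv(System.IO.Stream outputStream)
{
    using (var context = new MyDataContext())                    // placeholder DataContext
    using (var writer = new System.IO.StreamWriter(outputStream))
    {
        foreach (var row in context.GetExportRows())             // streamed, one row at a time
        {
            writer.WriteLine("{0},{1},{2}", row.Id, row.Name, row.Amount);
        }
    }
}
```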

Simon Mourier

I know I'm late to the game here, but how many records are we theoretically talking about? I saw 5,000 being thrown around, and if it's around there, that shouldn't be a problem for your server.

Answering the easiest way:

  1. It does unless you specify otherwise (you disable lazy loading).

  2. Not sure I get what you're asking here. Are you referring to the stream writer you'd be using to create the file, or the DataContext you are using to call the SP? I believe the DataContext will clean up for you after you're done (it's always good practice to close it anyway). A StreamWriter/StreamReader or the like will need Dispose() called to release its resources.

That being said, when dealing with file exports I've had success in the past building the table (CSV) programmatically (via iteration), then sending the structured data as an HTTP response with the content type specified in the header: the not-so-easy way, as you so eloquently stated :). Here's a question that asks how to do that with CSV:

Response Content type as CSV
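As a rough sketch (not taken from that answer), a small helper for the usual CSV quoting could look like this:

```csharp
// Minimal CSV field escaping (sketch): quote fields containing commas, quotes or newlines.
static string CsvEscape(string field)
{
    if (field == null)
        return string.Empty;

    bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
    if (!needsQuotes)
        return field;

    return "\"" + field.Replace("\"", "\"\"") + "\"";
}
```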

erito

"The server will have to run a stored procedure that will return the full result without paging..."

Perhaps not, but I believe you'll need Silverlight...

You can set up a web service or controller that allows you to retrieve data "by the page" (much like calling a 'paging' service for a GridView or other repeater). You can make async calls from Silverlight to get each "page" of data until complete, then use the SaveFileDialog to save it to the hard disk.
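The server side of that could be as simple as a paged action like this (only a sketch; the repository and its GetResultPage method are made up):

```csharp
// Sketch of a paged endpoint the client could call repeatedly until it has all the data.
// The repository field and GetResultPage method are invented for the example.
public class ExportController : Controller
{
    private readonly IExportRepository repository; // hypothetical data access abstraction

    public ExportController(IExportRepository repository)
    {
        this.repository = repository;
    }

    public ActionResult GetPage(int pageNumber, int pageSize)
    {
        var rows = repository.GetResultPage(pageNumber, pageSize); // calls the paged SP
        return Json(rows, JsonRequestBehavior.AllowGet);
    }
}
```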

Hope this helps.

Example 1 | Example 2

gangelo
  • Silverlight is not an option, sorry. Thanks. – vtortola Aug 26 '11 at 17:49
  • 'When you have a hammer, everything looks like a nail' ;) I think your approach can be achieved on the server side as well, by incrementally creating a file on disk and serving it to the user when complete. – Robin van der Knaap Aug 26 '11 at 17:56

What you're talking about isn't really deferred execution, but limiting the results of a query. When you say objectCollection.Take(10), the SQL that is generated when you iterate the enumerable only takes the top 10 results of that query.

That being said, a stored procedure will return whatever results you are passing back, whether it's 5 or 5,000 rows of data. Performing a .Take() on the results won't limit what the database returns.

Because of this, my recommendation (if possible for your scenario), is to add paging parameters to your stored procedure (page number, page size). This way, you will only be returning the results you plan to consume. Then when you want the full list for your CSV, you can either pass a large page size, or have NULL values mean "Select all".
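For example (the procedure and parameter names here are only illustrative), the call could look like this, with nulls meaning "select all":

```csharp
// Illustrative only: the procedure and parameter names are invented.
// Pass real page values for the grid; pass nulls (NULL in SQL) for the full CSV export.
using System;
using System.Data;
using System.Data.SqlClient;

static SqlCommand BuildResultsCommand(SqlConnection connection, int? pageNumber, int? pageSize)
{
    var command = new SqlCommand("dbo.GetResults", connection);
    command.CommandType = CommandType.StoredProcedure;
    command.Parameters.AddWithValue("@PageNumber", (object)pageNumber ?? DBNull.Value);
    command.Parameters.AddWithValue("@PageSize", (object)pageSize ?? DBNull.Value);
    return command;
}
```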

Keith
  • I think he is also concerned about the size of the CSV in memory in prep for download. – gangelo Aug 26 '11 at 17:44
  • Yep, that's understood. If he wants to page the results to avoid memory issues, he could use the paging parameters to generate the CSV as well. – Keith Aug 26 '11 at 17:48
  • That's the problem @Keith, that I cannot paginate because the file has to have the full result :) The paging is already implemented in the preview screen. What I'm concerned about is what gangelo has pointed out. – vtortola Aug 26 '11 at 17:48
  • @vtortola - I think what Keith is saying is to read the data in "pages" or "chunks" using the paging already incorporated in the stored proc, stream the data to disk as a CSV file and serve that up. – gangelo Aug 26 '11 at 18:32
  • @vtortola - It depends on how you're creating the CSV. If you want to optimize for memory usage, adding paging parameters and looping through the pages of results (writing out line by line) will accomplish this for you. – Keith Aug 26 '11 at 19:31
  • Yes, but then I will need a lot of SQL queries to get the file. What if I write to the response stream at the same time as I read the data from the SQL connection? – vtortola Aug 26 '11 at 20:39
  • I'm not 100% sure about the inner workings of these things, but I'm pretty sure that if your stored procedure is returning 5000 rows, all 5000 records are loaded into memory the moment you call the stored proc. And even if you rewrote the query in L2Sql, you'd still be making multiple round-trips to the DB, one per iteration. I don't think there's a way around that. – Keith Aug 26 '11 at 21:31
  • Well, apparently a DataReader only loads one row into memory at a time. Every time you call .Read(), it loads another row from the connection. http://stackoverflow.com/questions/1849144/sql-server-and-sqldatareader-trillion-records-memory and http://www.akadia.com/services/dotnet_data_reader.html – vtortola Aug 27 '11 at 22:39