
I have a list of string values. I am currently looping over it and using each value in the "WHERE" clause of a separate SQL query, many thousands of times, and I was wondering if I should instead run one query and do the filtering on the resulting DataTable.

I have already implemented the multiple-query method and didn't notice any performance issues until the selection criteria grew into the tens of thousands. Timing with the Stopwatch class shows that about 95% of the time per loop iteration is spent on the SQL query. The sum total of the table size on disk is < 50 MB, which is roughly the extra memory the single-query approach would cost.

SQL is of the form:

SELECT col1, col2, col3 FROM table1 WHERE col4 = @criteria
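
The query runs in a loop, once per criteria value, roughly like this (simplified; connectionString and criteriaList stand in for the real values):

using System.Data;
using System.Data.SqlClient;

// Current approach: one parameterised query per criteria value
using (var connection = new SqlConnection(connectionString))
using (var adapter = new SqlDataAdapter(
    "SELECT col1, col2, col3 FROM table1 WHERE col4 = @criteria", connection))
{
    adapter.SelectCommand.Parameters.Add("@criteria", SqlDbType.NVarChar);

    foreach (string criteria in criteriaList)
    {
        adapter.SelectCommand.Parameters["@criteria"].Value = criteria;

        var table = new DataTable();
        adapter.Fill(table); // one round trip to the server per criteria value

        // ... write the rows for this criteria to the output file
    }
}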

Most of the methods I have seen for filtering a DataTable come from this post: How I can filter a Datatable?

// Option 1: DataView with a RowFilter (criteria is a string, so it needs quoting)
DataView dv = new DataView(dataTable);
dv.RowFilter = String.Format("col4 = '{0}'", criteria);

// Option 2: LINQ over the rows
DataTable dataTableFiltered = dataTable.AsEnumerable()
          .Where(row => row.Field<String>("col4") == criteria)
          .CopyToDataTable();

// Option 3: DataTable.Select with the same filter expression
DataTable dataTableSelected = dataTable.Select(String.Format("col4 = '{0}'", criteria)).CopyToDataTable();

So this is really a two-part question: should I switch to filtering in the DataTable, and if so, which of these methods is likely to be fast enough?

Once the data is acquired, it is written to a text file structured like this:

@Criteria1
Values
@Criteria2
Values
Values
Values
@Criteria3
...
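
For comparison, the single-query alternative would fetch everything once, group it in memory by col4, and then write the file, along these lines (a sketch only; outputPath is a placeholder and the exact formatting of each Values line is left out):

using System.Data;
using System.Data.SqlClient;
using System.IO;
using System.Linq;

// Alternative: one round trip, then group in memory.
// col4 has to be selected as well so it can be used for the grouping.
var fullTable = new DataTable();
using (var connection = new SqlConnection(connectionString))
using (var adapter = new SqlDataAdapter(
    "SELECT col1, col2, col3, col4 FROM table1", connection))
{
    adapter.Fill(fullTable); // whole table (< 50 MB) held in memory
}

// Build a lookup so each criteria value is resolved without re-scanning the table
ILookup<string, DataRow> byCriteria = fullTable.AsEnumerable()
    .ToLookup(row => row.Field<string>("col4"));

using (var writer = new StreamWriter(outputPath))
{
    foreach (string criteria in criteriaList)
    {
        writer.WriteLine("@" + criteria);
        foreach (DataRow row in byCriteria[criteria])
        {
            // placeholder for however the Values lines are actually formatted
            writer.WriteLine(String.Join("\t", row["col1"], row["col2"], row["col3"]));
        }
    }
}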
  • This is highly contextual; *usually* you want to fetch as little data as possible - i.e. filter at the database server - however there are times when you can sensibly load all/most/much of the data into memory and filter in-place in memory, reusing the same chunk of cached data for multiple queries; both are valid models. The main bit that isn't valid IMO: using `DataTable` :) – Marc Gravell Jul 29 '19 at 16:02
  • I would recommend filtering at the database server. – Tango Jul 29 '19 at 16:04
  • Let me understand better. You are executing 1000 queries to the same database table changing just the @criteria parameter? Or are you querying different tables with different columns and with different where conditions? – Steve Jul 29 '19 at 16:06
  • Along the lines of what @MarcGravell mentioned, this is dependent upon the situation. If you could provide more context on how you are using the data sets then a decent answer could probably be proposed. With the little bit of information that you provided however, any answer would likely be solely opinion based unless it covered a large portion of the possible scenarios and explained why each one is better or worse than the rest. – gmiley Jul 29 '19 at 16:19
  • It depends on the amount of time it takes to transfer the data and the time to get data from the database. Usually doing the processing in a SQL stored procedure is faster than doing the processing in C#. Also, doing the processing in C# requires transferring more data from the server, and the transfer time adds to the total. So in most cases it is better to perform the work in a SQL stored procedure than in C#. Even when you do not require a stored procedure, doing the filtering in a SQL statement would be faster. – jdweng Jul 29 '19 at 16:47
  • @jdweng in some cases, say for data that doesn't change often, caching it client side and filtering there may be the better option instead of making many server calls. One longer load time, in that instance, may be the better option. That is why I think this question is too broad without the asker providing more context on what they are trying to accomplish. – gmiley Jul 29 '19 at 16:52
  • It doesn't matter how often the data is used; what matters more is the size of the data. You do not want to transfer 1 GB of data and then only use a few thousand of those bytes. My answer was just a general answer. Yes, in some cases it is better to do it in C#, but in the majority of cases it is better to do it on the server. The server is usually a PC with more memory than the client, and the SQL server is a faster machine. – jdweng Jul 29 '19 at 17:09
  • @Steve same table, different criteria. Between the many queries I am getting most of the table. I'll also add a bit to the question to state what I am doing with the fetched data. – Richard Harris Jul 30 '19 at 10:01
  • And now we go back to what @jdweng said. How much data are you transferring from the server to the client? This is the critical info to consider. If you have a small data set and this data doesn't change often, then it is better to transfer everything to the client than to do 1000 queries to a busy server. Now, if you don't have a busy server, or the data to look at is very big, then..... Another important aspect is how you query for these strings. I hope you don't use AddWithValue to pass the string parameter – Steve Jul 30 '19 at 10:06
  • @MarcGravell I am using a `SqlClient.SqlDataAdapter` to get the data into a DataTable, so I left it in there to write into a file. Could you suggest what you might use instead? – Richard Harris Jul 30 '19 at 10:08
  • @Steve data transferred is the same as that on the disk `sum total of the table size on disk is < 50MB` and parameter added with `adapter.SelectCommand.Parameters.Add("@Value", SqlDbType.NVarChar).Value = whereValue;` – Richard Harris Jul 30 '19 at 10:17
  • @RichardHarris `List<T>` for any `class T` that looks like your data, and your choice of ORM / micro-ORM – Marc Gravell Jul 30 '19 at 10:19
  • @MarcGravell Someone else is maintaining the views this is looking at, and as such they are not quite static (added/removed columns, altered datatypes); can a class be made to handle that? DataTable picks it up at runtime so has been handy for dealing with this. – Richard Harris Jul 30 '19 at 10:32
  • @RichardHarris that is perhaps the one *legitimate* remaining use of `DataTable` - most code doesn't do that :) – Marc Gravell Jul 30 '19 at 10:35
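
For reference, Marc Gravell's List<T> suggestion from the comments would look roughly like this with a micro-ORM such as Dapper (my choice of Dapper is an assumption, and the Record class is hypothetical; as discussed above it would need updating whenever the view's columns change):

using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;
using Dapper;

// A class shaped like the rows coming back from the view
public class Record
{
    public string Col1 { get; set; }
    public string Col2 { get; set; }
    public string Col3 { get; set; }
    public string Col4 { get; set; }
}

// One query, materialised into plain objects instead of a DataTable
using (var connection = new SqlConnection(connectionString))
{
    List<Record> rows = connection
        .Query<Record>("SELECT col1, col2, col3, col4 FROM table1")
        .ToList();

    ILookup<string, Record> byCriteria = rows.ToLookup(r => r.Col4);
    // byCriteria[criteria] then replaces the per-criteria SQL query
}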
