19

I am using VSTS 2008 + C# + .Net 3.5 + SQL Server 2008 + ADO.Net. If I load a table from a database by using a DataTable of ADO.Net, and in the database table, I defined a couple of indexes on the table. My question is, whether on the ADO.Net DataTable, there is related index (the same as the indexes I created on physical database table) to improve certain operation performance on DataTable?

thanks in advance, George

John Saunders
  • 160,644
  • 26
  • 247
  • 397
George2
  • 44,761
  • 110
  • 317
  • 455
  • 2
    [Note that DataViews *do* have indexes](https://msdn.microsoft.com/en-us/library/bb669089(v=vs.110).aspx). "The index for a DataView is built both when the DataView is created and when any of the sorting or filtering information is modified." – Bacon Bits Jul 26 '16 at 15:04

11 Answers11

19

Actually George's question is not so "bad" as some people insist it is. (I am more and more convinced that there's no such thing as, "a bad question").

I have a rather big table which I load into the memory, in a DataTable object. A lot of processing is done on lines from this table, a lot of times, on various (and different) subsets which I can easily describe as "WHERE ..." of SELECT clauses. Now with this DataTable I can run Select() - a method of DataTable class - but it is quite inefficient.

In the end, I decided to load the DataTable sorted by specific columns and implemented my own quick search, instead of using the Select() function. It proved to be much faster, but of course it works only on those sorted columns. The trouble would have been avoided, had a DataTable had indexes.

EvilDr
  • 8,943
  • 14
  • 73
  • 133
Negative
  • 191
  • 1
  • 2
  • But there are very few developers who have the problem you describe. It would be a waste of time for Microsoft to implement "indexes" in the DataTable class just for you. – John Saunders Nov 16 '16 at 04:06
  • 6
    @JohnSaunders Loading a DataTable into memory to be used for a large number of lookups doesn't seem like something "very few developers" would need to do. Given the range of database-driven applications, it seems like a fairly common thing to do. That's why Microsoft implemented [DataViews which do create indexes](https://msdn.microsoft.com/en-us/library/bb669089(v=vs.110).aspx). – Bacon Bits Dec 01 '16 at 14:06
  • @bacon it doesn't matter whether you think that "very few developers" need to do it. It's facts that matter. And database driven applications use databases, not in memory DataTable objects. Also, note that the OP was simply confused as a newbie and was mistaken about the role of a DataTable. – John Saunders Dec 04 '16 at 23:45
  • @JohnSaunders So, if you have to do a million lookups on table that's expected to have, say, 10,000 rows, you suggest executing a million individual queries is the professional way to do it? Whatever, man. – Bacon Bits Dec 07 '16 at 01:40
  • @bacon That's what a database is for. But 10,000 rows is nothing. How ahead and load it into a DataTable. But there is no index in the sense that a database table may have indexes. If the lookup is fast enough, use the DataTable. Otherwise, consider using a Dictionary or a custom collection object. – John Saunders Dec 07 '16 at 01:56
12

No, but possibly yes.

You can set up your own indices on a DataTable, using a DataView. As you change the table, the DataView will be rebuilt, so the index should always be up to date.

I did some bench tests for my own app. I use a DataTable to approximate a Boost MultiIndexContainer. To create an index on a column call "Author", I initialise the DataTable, and then the DataView...

_dvChangesByAuthor = 
    new DataView(
        _dtChanges, 
        string.Empty, 
        "Author ASC", 
        DataViewRowState.CurrentRows);

To then pull data by Author from the table, you use the view's FindRows function...

            dataRowViews = _dvChangesByAuthor.FindRows(author);
            List<DataRow> returnRows = new List<DataRow>();
            foreach (DataRowView drv in dataRowViews)
            {
                returnRows.Add(drv.Row);
            }

I made a random large DataTable, and ran queries using DataTable.Select(), Linq-To-DataSet (with forced execution by exporting to list) and the above DataView method. The DataView method won easily. Linq took 5000 ticks, Select took over 26000 ticks, DataView took 192 ticks...

LOC=20141121-14:46:32.863,UTC=20141121-14:46:32.863,DELTA=72718,THR=9,DEBUG,LOG=Program,volumeTest() - Running queries for author >TFYN_AUTHOR_047<
LOC=20141121-14:46:32.863,UTC=20141121-14:46:32.863,DELTA=72718,THR=9,DEBUG,LOG=RightsChangeTracker,GetChangesByAuthorUsingLinqToDataset() - Query elapsed time: 2 ms, 4934 ticks; Rows=65 
LOC=20141121-14:46:32.879,UTC=20141121-14:46:32.879,DELTA=72733,THR=9,DEBUG,LOG=RightsChangeTracker,GetChangesByAuthorUsingSelect() - Query elapsed time: 11 ms, 26575 ticks; Rows=65 
LOC=20141121-14:46:32.879,UTC=20141121-14:46:32.879,DELTA=72733,THR=9,DEBUG,LOG=RightsChangeTracker,GetChangesByAuthorUsingDataview() - Query elapsed time: 0 ms, 192 ticks; Rows=65

So, if you want indices on a DataTable, I would suggest DataView, if you can deal with the fact that the index is re-built when the data changes.

Steve Hibbert
  • 2,045
  • 4
  • 30
  • 49
8

You can create a primary key for the datatable. Filter operations get a big boost if you are searching in the primary key field. Check out this link: here

Spiff
  • 83
  • 2
  • 3
6

I had the same problem with many queries from a large datatable that are not according to the primary key.

The solution I found was to create DataView for each index I wanted to use, and then use it's Find and FindRows methods to extract the data.

DataView creates an internal index on the DataTable and behaves virtually as an index for this purpose.

In my case I was able to reduce 10,000 queries from 40 Seconds to ONE!!!

Noam
  • 4,472
  • 5
  • 30
  • 47
3

John above is correct. DataTables are disconnected in memory structures. They do not map to the physical implementation of the database.

The indexes on disk are used to speed up lookups because you don't have all the rows. If you have to load every row and scan them it is slow, so an index makes sense. In a DataTable you already have all the rows, so a comparison is fast already.

Jason Short
  • 5,205
  • 1
  • 28
  • 45
3

The correct answer here to the implicit question of creating an index on a DataTable is that you can't do that, but you can create one or more DataViews for the DataTable, which according to the doc will create an index based on the sorting the DataView specifies:

DataView constructs an index. An index contains keys built from one or more columns in the table or view. These keys are stored in a structure that enables the DataView to find the row or rows associated with the key values quickly and efficiently. Operations that use the index, such as filtering and sorting, see signifcant performance increases. The index for a DataView is built both when the DataView is created and when any of the sorting or filtering information is modified. Creating a DataView and then setting the sorting or filtering information later causes the index to be built at least twice: once when the DataView is created, and again when any of the sort or filter properties are modified.

If you need to do a large number of lookups to an in-memory DataTable, it may be the most straightforward and performant to use a DataView with the Find() or FindRows() method to do indexed key lookups. In particular, if you need to do a number of lookups and modifications to the data this would prevent needing to transform your DataTable into another indexed class like a Dictionary and then transforming it back into a DataTable again.

Bacon Bits
  • 30,782
  • 5
  • 59
  • 66
2

Others have made the point that a DataSet is not intended to serve as a database system--just a representation of data. If you are working under the impression that a DataSet is a database then you are mistaken and might need to reconsider your implementation.

If you need a client-side database, consider using SQL Compact or SQL Lite, both are free redistributable Database systems which can be used without requiring separate installations or services. If you need something more full-featured the SQL Express is the next step up.

To help clarify though, DataSets/Tables are used in .NET development to temporarily hold data as needed. Think of them as the results of a SELECT query against a database; they are roughly similar to CSV files or other forms of tabular data--you can pull data into them from a database, work with the data, and then push the changes back to a database--but they, on their own, are not databases.

If you have a large collection of items which you need to keep in memory for one reason or another then you might consider building a lightweight DTO (data transfer object, Google it, they're very simple) and loading them into a HashTable. HashTables won't give you any form of relational data, but are very efficient at look-ups.

Tom H
  • 46,766
  • 14
  • 87
  • 128
STW
  • 44,917
  • 17
  • 105
  • 161
  • Thanks John and Yoooder, I am thinking of why I am confused before. I think even if DataTable's initial values are retrieved from a SELECT from database normally, but we can issue select on the DataTable to get sub-set of data from DataTable, and it is why I am confused before and it is why I think when I issue SELECT on DataTable, maybe I need to create index to facilitate query performance, any comments? – George2 Jul 21 '09 at 04:33
1

My reading of the docs is that the correct way to achieve this (if needed) is to use AsDataView to produce a DataView (or LinqDataView) that's bound to the underlying table. If your DataTable is invariant then the DataView can be static to avoid redundant re-indexing.

I am currently investigating Linq to DataSet, and this q was helpful to me, so thanks.

Steve Townsend
  • 53,498
  • 9
  • 91
  • 140
1

DataTables have a PrimaryKey field that can serve as an index (they are fast already anyway). This field is not copied from the Primary Keys of the database (although that might be nice).

PRMan
  • 535
  • 1
  • 8
  • 16
1

DataTables are indexed if you (the coder) specify one or more DataColumns as the Primary Key. Interally ADO.NET uses a Red-Black tree to form this index giving log-time lookups. This Primary Key is not set automatically based on any underlying keying from the data provider.

Fixee
  • 1,581
  • 2
  • 15
  • 25
-1

George,

The answer is no.

Actually, some sort of indexing may be used internally, but only as an implementation detail. For instance, if you create a foreign key constraint, maybe that's assisted by an index. But it doesn't matter to a developer.

John Saunders
  • 160,644
  • 26
  • 247
  • 397
  • I am confused. 1. Can we use/create index in DataTable to improve performance? 2. If DataTable is not using index, how could it improve performance? – George2 Jul 10 '09 at 14:06
  • 1
    George, why in the world do you think that you _need_ to improve performance? – John Saunders Jul 10 '09 at 14:22
  • Hi John, I think DataTable is a (memory) mapping or equivalent of a physical database table, and since we need index on physical database table to improve performance, why we do not have index concept on DataTable? Does it mean index not improving performance for DataTable? – George2 Jul 10 '09 at 17:34
  • 1
    It means it's a bad question, George. You're assuming too much and wasting your time in the process. DataSet/DataTable are meant to be a disconnected, in-memory structure that matches the model of a relational database. That's the _model_, George, not the implementation, which is what an index is. So, no, don't worry about performance problems before they happen, or you'll find you've worried about the wrong problems. And don't worry about performance problems with DataTable. – John Saunders Jul 10 '09 at 17:59
  • 1
    BTW, is this you: http://social.msdn.microsoft.com/Forums/en-US/csharpgeneral/thread/f21f5149-a6c2-4980-b72f-be6821a5cfcb – John Saunders Jul 10 '09 at 18:06
  • Yes, that is mine. :-) Sorry for my stupid John, if there is no index concept in DataTable, what data access pattern should we use to access data in DataTable in order to improve performance? Data access pattern I mean like, we should order by index column in database table (or in other words, we should not order by a column which does not have index on it). If we have no index in DataTable, then for order by/group by/sort in DataTable, I am not sure the performance issue (like we met with if we order by a column with no index)? – George2 Jul 11 '09 at 05:38
  • 1
    George, my advice is to spend no more thought about performace issues with DataTable or anything else until you discover a performance issue. Certainly, I do not intend to spend any more thought on the subject. – John Saunders Jul 11 '09 at 11:34
  • Thanks John, final question, I want to confirm from developer point of view there is no concept of index on DataTable and for any data access pattern, we should leave it to DataTable itself to optimize? – George2 Jul 11 '09 at 12:44
  • 1
    George, optimize what? Do you have a performance problem? Also, you seem to think that DataTable is an in-memory database. It is not. There is no optimization. There is nothing to optimize. – John Saunders Jul 11 '09 at 13:27
  • John, I finally understand your points. I have marked your reply as answer. – George2 Jul 21 '09 at 04:29
  • I was searching the same question George asked and I think it's not a bad question at all. I want to know the implementation to know if I need to create a custom find. – Paul Exchange Jan 16 '12 at 14:05
  • If you want to know the implementation, then go read the code. It's not a bad question, it's just a bad assumption. – John Saunders Jan 17 '12 at 15:28