2

I´m using aspnet-core, ef-core with sql server. I have an 'order' entity. As I'm expecting the orders table to be large and a the most frequent query would get the active orders only for certain customer (active orders are just a tiny fraction of the whole table) I like to optimize the speed of the query but I can decide from this two approaches:

1) I don't know if this is possible as I haven't done this before, but I was thinking about creating a Boolean column named 'IsActive' and make it an index thus when querying only Active orders would be faster.

2) When an order becomes not active, move the order to another table, i.e HistoricalOrders, thus keeping the orders table small.

Which of the two would have better results?, or none of this is a good solution and a third approach could be suggested?

Elmer Ortega
  • 469
  • 4
  • 12

2 Answers2

0

If you want to partition away cold data then a leading boolean index column is a valid way to do that. That column must be added to all indexes that you want to hot/cold partition. This includes the clustered index. This is quite awkward. The query optimizer requires that you add a dummy predicate where IsActive IN (0, 1) to make it able to still seek on such indexes. Of course, this will now also touch the cold data. So you probably need to know the concrete value of IsActive or try the 1 value first and be sure that it matches 99% of the time.

Depending on the schema this can be impractical. I have never seen a good case for this but I'm sure it exists.

A different way to do that is to use partitioning. Here, the query optimizer is used to probing multiple partitions anyway but again you don't want it to probe cold data. Even if it does not find anything this will pull pages into memory making the partitioning moot.

The historical table idea (e.g. HistoricalOrders) is the same thing in different clothes.

So in order to make this work you need:

  1. Modify all indexes that you are about (likely all), or partition, or create a history table.
  2. Have a way to almost never need to probe the cold partition.

I think (2) kills it for most cases.

Among the 3 solutions I'd probably pick the indexing solution because it is simplest. If you are worried about people making mistakes by writing bad queries all the time, I'd pick a separate table. That makes mistakes hard but makes the code quite awkward.

Note, that many indexes are already naturally partitioned. Indexes on the identity column or on an increasing datetime column are hot at the end and cold elsewhere. An index on (OrderStatus INT, CreateDateTime datetime2) would have one hot spot per order status and be cold otherwise. So those are already solved.

Some further discussion.

Community
  • 1
  • 1
usr
  • 168,620
  • 35
  • 240
  • 369
-2

Before think about the new table HistoricalOrders,Just create a column name IsActive and test it with your data.You don't need to make it as index column.B'cos Indexes eat up storage and it slows down writes and updates.So we must very careful when we create an index.When you query the data do it as shown below.On the below query where data selection (or filter) is done on the SQL srever side (IQueryable).So it is very fast.

Note : Use AsNoTracking() too.It will boost the perfromnace too.

var activeOrders =_context.Set<Orders>().Where(o => o.IsActive == true).AsNoTracking()
.ToList();

Reference : AsNoTracking()

Sampath
  • 63,341
  • 64
  • 307
  • 441
  • 1
    You still need an index on the `IsActive` column to avoid a table scan. – thoean Nov 08 '16 at 16:10
  • Indexes eat up storage and it slows down writes and updates.So we must very careful when we create an index @thoean – Sampath Nov 08 '16 at 16:37
  • 2
    @usr sampath helped me to decide between the two approaches mentioned in my question. But I would be interested to hear other solutions. Could you please suggest an alternative? – Elmer Ortega Nov 11 '16 at 04:03