6

I want to lookup values listed in a temp table:

So let us say:

Create Table #mylist
(
eserial nvarchar(35) Collate SQL_Latin1_General_CP850_CI_AS,
refdate datetime
)

Insert Into #mylist (eserial, refdate) Values ('A', '2015-09-15')
Insert Into #mylist (eserial, refdate) Values ('B', '2015-09-14')
Insert Into #mylist (eserial, refdate) Values ('C', '2015-09-13')
Insert Into #mylist (eserial, refdate) Values ('D', '2015-09-12')

I need the result to be the Top 1 date less than the reference date.
And should be returned in the same sequence as is in the temporary table.

What I tried:

Select
    lst.eserial,
    lst.refdate,
    app.CREATEDDATETIME,
From #mylist lst
Outer Apply 
    (Select Top 1 rec.CREATEDDATETIME, rec.ESERIAL, rec.ITEMID
     From TableSource rec
     Where lst.eserial=rec.ESERIAL And rec.CREATEDDATETIME<lst.refdate
     Order By rec.CREATEDDATETIME Desc
    ) As app

This works but it is slow. Also, if the number of rows are increased, it does not consistently preserve the sequence of eserial. I need the query to preserve the order I put it in the temporary table.

Again my expected output is simply:

enter image description here

Where eserial is the same sequence as the temp table and CREATEDDATETIME is the maximum date less than the reference date. More like a conditional Vlookup if you know Excel.

L42
  • 19,427
  • 11
  • 44
  • 68
  • Honestly, I would've expected the OUTER APPLY method would work the best in a general case. Are you not creating any indexes on your temp table? Can you specify `eserial` as a `PRIMARY KEY` in your create? – Bacon Bits Sep 04 '15 at 03:14
  • @BaconBits I tried specifying it as Primary Key but the same result. The second code is faster almost 10x. – L42 Sep 04 '15 at 06:26
  • 1
    Have you tried with an index on ESERIAL, CREATEDDATETIME desc include(ITEMID) where AREAID='home' – Julien Vavasseur Sep 04 '15 at 07:17
  • I'd recommend you to edit the question and leave only relevant bits in your question. I understand you've been rewriting your query and the question evolved. Leave only the latest variant, which is temp table having two columns, query has `outer apply` and doesn't use `AREAID='home'`. (This is how I understood it, I may be wrong, that's why I ask you to clean it up.) The way it is now is rather confusing. – Vladimir Baranov Sep 15 '15 at 03:21
  • @VladimirBaranov Edited. – L42 Sep 15 '15 at 06:20
  • 1
    @L42, your problem would become much clearer if in your sample data you showed that you insert `B`, then `A`, then `C`, then `D` and then show expected result again in this order (`B, A, C, D`). With this example it would be clear that you want to preserve the order (sequence in your terms) of rows as they were inserted into the table, **not** the order of values in the `eserial` column. – Vladimir Baranov Sep 15 '15 at 06:46
  • @VladimirBaranov Ok got it. I didn't occur to me that I have to implicitly state it that way. – L42 Sep 15 '15 at 07:01

3 Answers3

6

It is not quite clear what you mean by

maintain the sequence of the items in the temporary table

, but if you want to get result ordered by eserial, then you have to add ORDER BY eserial to your query. Without ORDER BY the resulting rows can be returned in any order. This applies to any method that you choose.

So, taking your last query as a basis, it will look like this:

Select
    lst.eserial
    ,lst.refdate
    ,app.CREATEDDATETIME
From
    #mylist lst
    Outer Apply
    (
        Select Top 1 rec.CREATEDDATETIME
        From TableSource rec
        Where lst.eserial=rec.ESERIAL And rec.CREATEDDATETIME<lst.refdate
        Order By rec.CREATEDDATETIME Desc
    ) As app
ORDER BY lst.eserial;

To make it work fast and efficiently add an index to TableSource on (ESERIAL, CREATEDDATETIME). Order of columns in the index is important.

It is also important to know if there are any other columns that you use in OUTER APPLY query and how you use them. You mentioned column AREAID in the first variant in the question, but not in the last variant. If you do have more columns, then clearly show how you intend to use them, because the correct index would depend on it. The index on (ESERIAL, CREATEDDATETIME) is enough for the query I wrote above, but if you have more columns a different index may be required.

It would also help optimizer if you defined your temp table with a PRIMARY KEY:

Create Table #mylist
(
    eserial nvarchar(35) Collate SQL_Latin1_General_CP850_CI_AS PRIMARY KEY,
    refdate datetime
)

Primary key would create a unique clustered index.

One more important note. What is the type and collation of columns ESERIAL and CREATEDDATETIME in the main TableSource table? Make sure that types and collation of columns in your temp table matches the main TableSource table. If the type is different (varchar vs. nvarchar or datetime vs. date) or collation is different index may not be used => it will be slow.

Edit

You use the phrase "same sequence as the temp table" several times in the question, but it is not really clear what you mean by it. Your sample data doesn't help to resolve the ambiguity. The column name eserial also adds to the confusion. I can see two possible meanings:

  1. Return rows from temp table ordered by values in eserial column.
  2. Return rows from temp table in the same order as they were inserted.

My original answer implies (1): it returns rows from temp table ordered by values in eserial column.

If you want to preserve the order of rows as they were inserted into the table, you need to explicitly remember this order somehow. The easiest method is to add an IDENTITY column to the temp table and later order by this column. Like this:

Create Table #mylist
(
    ID int IDENTITY PRIMARY KEY,
    eserial nvarchar(35) Collate SQL_Latin1_General_CP850_CI_AS,
    refdate datetime
)

And in the final query use ORDER BY lst.ID.

Vladimir Baranov
  • 31,799
  • 5
  • 53
  • 90
  • `Without ORDER BY the resulting rows can be returned in any order. This applies to any method that you choose` So does this mean that what I want cannot be achieved? How I put it in the temp table is the order I want it returned. – L42 Sep 15 '15 at 06:11
  • This is not a true technical problem, it's a conceptual problem. i.e. this isn't a problem for 99% of SQL Server developers. It seems you already solved your problem by adding a field to the table to indicate order, and that is what you must do. – Nick.Mc Sep 15 '15 at 06:15
  • @Nick.McDermaid So that is the only way. Like what Eric posted. Adding another field which will uniquely preserve the sequence. – L42 Sep 15 '15 at 06:17
  • 1
    Yes. There is no 'logical' sequence till you add a field which contains that information. – Nick.Mc Sep 15 '15 at 06:34
  • 1
    A _clustered index_ ensures that data is _logically_ ordered in that way but that's purely for performance reasons and is still no guarantee that data will come back in that order without an `ORDER BY` – Nick.Mc Sep 15 '15 at 06:35
  • 1
    @L42, I updated the answer. One of the fundamental principles in relational databases is that tables are unordered sets of rows. If you need to get result in a particular order, use `ORDER BY`. There is no other way. Indexes may help to run the query faster, they may help if sorting is needed, but final order of the result is guaranteed only if you use `ORDER BY`. – Vladimir Baranov Sep 15 '15 at 06:42
  • So yeah, @Nick.McDermaid is right that only way to get it done is how I did it in the first post. Ok that settles it. No way around it then. And simplest answer would be to add identity to it. – L42 Sep 15 '15 at 06:59
  • @L42, you effectively have two questions. One relates to order of rows and getting correct results. It seems that using `OUTER APPLY` together with `IDENTITY` column and `ORDER BY ID` solves it. Second question is how to make the query run fast. This is where appropriate index(es) are required. The index on `(ESERIAL, CREATEDDATETIME)` that I suggested should be enough. – Vladimir Baranov Sep 15 '15 at 07:04
  • Why not include `ID` on the index? – L42 Sep 15 '15 at 07:10
  • 1
    Index should be on `TableSource` table. – Vladimir Baranov Sep 15 '15 at 07:11
3

That's easy using identity. Query without Order is not guarantee to have order in SQL server.

Create Table #mylist
(
    seqId int identity(1,1),
    eserial nvarchar(35) Collate SQL_Latin1_General_CP850_CI_AS,
    refdate datetime
)

Use the table freely and put Order By seqId at the end of your query

Edit

Use MAX() instead of TOP 1 with order if you have no cluster index on ESERIAL, CREATEDDATETIME on the TableSource

https://stackoverflow.com/a/21420643/1287352

Select
    lst.eserial,
    lst.refdate,
    app.CREATEDDATETIME,
From #mylist lst
Outer Apply 
    (
        Select MAX(rec.CREATEDDATETIME), rec.ESERIAL, rec.ITEMID
        From TableSource rec
        Where lst.eserial = rec.ESERIAL And rec.CREATEDDATETIME < lst.refdate
        GROUP BY rec.ESERIAL, rec.ITEMID
    ) As app
ORDER BY lst.seqId 
Community
  • 1
  • 1
Eric
  • 5,675
  • 16
  • 24
  • This is similar to the approach I used before I edit the question which was requested by @VladimirBaranov, except you use `Identity`. But same logic. – L42 Sep 15 '15 at 06:19
1

Perhaps the performance issue is due to indexing. Try adding the indexes below, removing UNIQUE if the keys are not unique.

CREATE UNIQUE NONCLUSTERED INDEX idx ON #mylist (eserial, refdate);
CREATE UNIQUE NONCLUSTERED INDEX idx ON TableSource (eserial, CREATEDDATETIME);
Dan Guzman
  • 43,250
  • 3
  • 46
  • 71