2

We have a rather large SQL query, which is rather poorly performing. One of the problems (from analysing query plan) is the number of joins we have.

Essentially we have values in our data that we need to do a look up on another table.to get the value to display to the user. The problem is that we have do a join on the same table 4 times because there are 4 different columns that all need the same look up.

Hopefully this diagram might make it clearer

Raw_Event_data
event_id,   datetime_id,         lookup_1, lookup_2, lookup_3, lookup_4
1,          2013-01-01_12:00,    1,        5,        3,         9          
2,          2013-01-01_12:00,    121,      5,        8,         19
3,          2013-01-01_12:00,    11,       2,        3,         32
4,          2013-01-01_12:00,    15,       2,        1,         0

Lookup_table
lookup_id,      lookup_desc
1,              desc1
2,              desc2
3,              desc3
...

Our query then looks something like this

Select 
    raw.event_id, 
    raw.datetime_id,
    lookup1.lookup_desc,
    lookup2.lookup_desc,
    lookup3.lookup_desc,
    lookup4.lookup_desc,
FROM 
    Raw_Event_data raw, Lookup_table lookup1,Lookup_table lookup2,Lookup_table lookup3,Lookup_table lookup4
WHERE raw.event_id = 1 AND
raw.lookup_1 *= lookup1 AND
raw.lookup_2 *= lookup2.lookup_id AND
raw.lookup_3 *= lookup3.lookup_id AND
raw.lookup_4 *= lookup4.lookup_id

So I get as an output

 1, 2013-01-01_12:00, desc1, desc5, desc3, desc9

As I said the query works, but the joins are killing the performance. That is a simplistic example I give there, in reality there will be 12 joins like above and we won't be selecting a specific event, but rather a range of events.

The question is, is there a better way of doing those joins.

Dace
  • 489
  • 1
  • 7
  • 17
  • Never used Sybase, but in Oracle I would consider to put first the range of events into a `temporary table` and only then do the joins with the `lookup_table`. Might that be a solution for you? – Guillem Vicens Jul 31 '13 at 14:50
  • Yeah thats one option that is used in other parts of our system and we'll probably do something similar here. My question is more about, is there any way around the multiple joins on those tables, rather than asking for a way to improve the query with other tricks, if you understand me – Dace Jul 31 '13 at 15:13
  • Maybe creating a `join index`? Check this [link](http://infocenter.sybase.com/archive/index.jsp?topic=/com.sybase.infocenter.dc00170.1260/html/iqapg/iqapg335.htm) – Guillem Vicens Jul 31 '13 at 15:19
  • I know you want us to just write a more efficient query without getting bogged down in other details, and I'm willing, but I think it's important to cover the Sybase IQ fundamentals first. I would not recommend using a "join" index, as it's deprecated on newer versions of IQ. Do you have "HG" or "LF" indexes on all of the "lookup_n" columns of the raw event table? Is lookup_id the primary key of lookup_table, if not, does it at least have an HG index on it? IQ is extremely efficient at storage space, if they're not indexed I can't think of a good reason why you should not add them. – Hotel Aug 01 '13 at 02:21
  • and to clarify the reason I'm asking about the indexes...IQ is extremely efficient at doing joins like this when properly indexed, especially if those "lookup_n" columns have a relatively low cardinality. It doesn't jump out at me as likely to perform poorly, even with the 12 joins. – Hotel Aug 01 '13 at 02:34
  • I'll have to check with our ETL guys about adding index's. We load lots of data (~200 Millions rows per partition) and we use index's on some of the columns and if we were to add any more it would negatively impact on our loading time. As regards HG or LF, it would have to be LF. There are only about 200 unique values for each of the lookups so HG wouldn't be suitable. – Dace Aug 01 '13 at 08:15
  • Remember, IQ is column based not row based, so those those lookup_n columns aren't actually growing as you're inserting data, unless it's a new unique value. – Hotel Aug 02 '13 at 00:55
  • Did you have any success with my answer below? – Hotel Sep 11 '13 at 20:42
  • Hi Hotel, as I said above, I couldn't add new Indexes like that – Dace Sep 19 '13 at 13:46

2 Answers2

0

correlated subqueries might be the way to go:

   SELECT r.event_id
        , r.datetime_id
        , (select lookup1.lookup_desc from lookup_table lookup1 where lookup1.lookup_id = r.lookup_1) as desc_1
        , (select lookup2.lookup_desc from lookup_table lookup2 where lookup2.lookup_id = r.lookup_2) as desc_2
        , (select lookup3.lookup_desc from lookup_table lookup3 where lookup3.lookup_id = r.lookup_3) as desc_3
        , (select lookup4.lookup_desc from lookup_table lookup4 where lookup4.lookup_id = r.lookup_4) as desc_4
     FROM Raw_Event_data r
    WHERE r.event_id = 1
        ;
collapsar
  • 17,010
  • 4
  • 35
  • 61
0

My first attempt would be to handle the indexing myself, if I was refused by the DBA's.

declare @start_range bigint, @end_range bigint

select
    @start_range = 5
    ,@end_range = 500

create local temporary table raw_event_subset
( --going to assume some schema based on your comments...obviously you will change these to whatever the base schema is.
    event_id bigint
    ,datetime_id timestamp
    ,lookup_1 smallint
    ,lookup_2 smallint
    --etc
) on commit preserve rows

create HG index HG_temp_raw_event_subset_event_id on raw_event_subset (event_id)
create LF index LF_temp_raw_event_subset_lookup_1 on raw_event_subset (lookup_1)
create LF index LF_temp_raw_event_subset_lookup_2 on raw_event_subset (lookup_2)
--etc

insert into raw_event_subset
select
    event_id
    ,datetime_id
    ,lookup_1
    ,lookup_2
    --,etc
from raw_event_data
where event_id >= @start_range --event_id *must* have an HG index on it for this to be worthwhile.
and event_id <= @end_range

--then run your normal query, except replace raw_event_data with raw_event_subset
select
    event_id
    ,datetime_id
    ,l1.lookup_desc
    ,l2.lookup_desc
    --etc
from raw_event_subset r
left join lookup_table l1
    on l1.lookup_id = r.lookup_1
left join lookup_table l2
    on l2.lookup_id = r.lookup_2
    --etc

drop table raw_event_subset

hope this helps...

Hotel
  • 1,361
  • 11
  • 13