0

I am using NETEZZA and I have two separate tables that I would like to join by the date. In the first table, the date is stored as type "date" (e.g., 2014-09-10) while in the second table, the date is stored as type "int4" (20140910). I've tried joining the tables by the date:

select * 
from table1 a inner join table2 b 
     on date(a.start_date) = to_date(b.start_date, 'YYYYMMDD')

This runs but is slow. It's been recommended to me that the comparison might be much faster if I could cast the date in table1 as int4 and then simply compare int4's. However, I couldn't find a way to do this or if this is even the best way.

Panagiotis Kanavos
  • 120,703
  • 13
  • 188
  • 236
slaw
  • 6,591
  • 16
  • 56
  • 109
  • It is unlikely to run any faster if you do the conversion the other way around. Joins on keys of different types are just generally slow. – Gordon Linoff Sep 10 '14 at 13:50
  • @GordonLinoff: Can you at least give me an idea how I would perform the cast as I'm new to SQL. Then, I could test the time directly with a small case. One of my colleagues showed me (on a small number of rows) that if we substring the date in table1 to remove the hyphens (so that it matches the format of the int4) then it is almost an order of magnitude faster (seconds versus minutes). Of course, he wasn't certain that it would be true for a much larger number of rows. – slaw Sep 10 '14 at 13:57
  • Conversions force the database to do a full scan of the underlying data instead of using indexes. Convert *only* one of the fields to a type matching the other field. This way you can use at least one index. Preferably, convert the field on the table with fewer rows. The actual type doesn't really matter. – Panagiotis Kanavos Sep 10 '14 at 14:01
  • Also, dates have **no** format. Format appears only when you convert them to strings, explicitly or implicitly What your colleague actually did is (implicitly)convert the date to a string, strip the hyphens, (implicitly)convert it to int then compare it with the other index. Again, only one index was used but that was better than the full scan you had before. – Panagiotis Kanavos Sep 10 '14 at 14:04

2 Answers2

1

Here is your query:

select * 
from table1 a inner join
     table2 b 
     on date(a.start_date) = to_date(b.start_date, 'YYYYMMDD');

In general, databases have a really hard time with joins on different types of columns or on joins with functions. The reason is twofold: the function make it hard (or impossible) to use indexes. Statistics on columns of different types is incompatible.

However, if you move all the functions to one side, the engine might be able to do something. For instance:

select * 
from table1 a inner join
     table2 b 
     on b.start_date = cast(to_char(a.start_date, 'YYYYMMDD') as int);

This does all the conversion on one side. If you have an index on b.start_date, then then this can scan table1 and look up the corresponding values in the index. So, this version might run faster, but not as well as if the tables used the same types.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • This way of thinking made the efficiency of the query over an order of magnitude faster than mine (took about 10 seconds rather than several minutes!) – slaw Sep 10 '14 at 14:28
0

try withe same date type like this may be will work

SELECT *
FROM table1 a
INNER JOIN table2 b
ON (to_date(a.start_date,'YYYY-MM-DD') = to_date (b.start_date, 'YYYY-MM-DD'))
mugiwaradz
  • 393
  • 1
  • 3
  • 15
  • Maybe I don't understand but I don't see how this will make things faster as you are still comparing dates just like my original query is. From what I can tell, it is running equivalently as slow as my query. – slaw Sep 10 '14 at 14:02