2

I came across one more issue while resolving the previous problem: So, I have this data: Data

For each route -> I want to get only those rows where ob exists in rb. Hence, this output: Output

I know this also needs to worked through a temp table. Earlier I was doing this as suggested by @smb:

select * from table_name as a
inner join
(select load, rb from table_name
group by load, rb) as b
on a.load = b.load
and 
a.ob = b.rb

but this solution will give me: Wrong Output

And this is incorrect as it doesn’t take into account the route.

It’d be great if you guys could help :)

Thanks

Gaurav Bansal
  • 5,221
  • 14
  • 45
  • 91

2 Answers2

2

updated to add in route -

The answer would be in a nested join. The concept is

  1. Get a list of distinct pairs of obs and rbs
  2. Join to the original data where ob = ob and lane = rb

Code as follows:

select * from table_name as a
inner join
(select route, ob, rb from table_name
group by route, ob, rb) as b
on a.ob = b.ob
and 
a.lane = b.rb
and 
a.route = b.route

I have done an example using a temp table here so you can see it in action.

Note that if your data is large you should consider making sure your dist key in the join. This makes sure that redshift knows that no rows need to be joined across different compute nodes so it can execute multiple local joins and therefore be more efficient.

smb
  • 653
  • 4
  • 13
1

few ways (in statement is simple but often slower on larger sets)

select * 
from table
where lane in (select rb from table)

or (i find exists faster on larger sets, but try both )

select * 
from table
where exists (select 'x' from table t_inner 
              where t_inner.rb = table.lane)

either way create an index on the rb column for speed

Ab Bennett
  • 1,391
  • 17
  • 24
  • this is redshift - no indexes. – Jon Scott Nov 09 '17 at 07:24
  • true, maybe a sort key then – Ab Bennett Nov 09 '17 at 07:26
  • Your solution isn't sensitive to matching the rb relative to the ob. i.e. the rb could exist against another ob but not on the ob you are matching it against and therefore giving incorrect results. So you should add in where ob = ob to both solutions. – smb Nov 09 '17 at 23:33
  • depends how you interpret the question. I didn't think this is what he wanted. "If lane doesn’t exist in rb" does not say "If lane doesn’t exist in rb FOR THAT OB" – Ab Bennett Nov 09 '17 at 23:53