How do I perform an efficient left join in kdb?

Question

I want to run a conventional SQL-style left join in KDB+ / Q.

We get at least one row in the result for every row in the left table.
If there is more than one match in the right table, I get a row for each of those, not just for the first match

Test Data

  x:([];a:1 1 2 3; b:3 4 5 6)

  y:([]; a:1 2 2 4; c:7 8 9 10)

The best version I can think of goes like this:

This appends a left join which provides only the first match to an inner join which provides all matches and then removes the duplicates:

  distinct ej[`a; x; y] , x lj `a xkey y

Can anyone supply me with one that is faster and/or better in some other way? I'd really like to avoid the use of distinct, for example.

Ryan Hamilton · Answer 1 · 2016-09-24T20:24:27.673

1

q)`a xgroup y  // group the lookup table by keys
a| c
-| ---
1| ,7
2| 8 9
4| ,10
q)x lj `a xgroup y  // join all combinations
a b c
------------
1 3 ,7
1 4 ,7
2 5 8 9
3 6 `long$()
q)ungroup x lj `a xgroup y  // unroll using ungroup to produce a flat table
a b c
-----
1 3 7
1 4 7
2 5 8
2 5 9

We provide a free tutorial on kdb joins that demonstrates all of them here: http://www.timestored.com/kdb-guides/qsql-inner-left-joins

Since we want every row..based on @Connors solution

Nice solution Connor. I modified yours to shorten/simplify the code:

q)bungroup:{ungroup {$[0=count x; (),first x; x]}''[x]}
q)bungroup x lj `a xgroup y
a b   c
--------
1 3   7
1 4   7
2 5   8
2 5   9
3 6
4 100 10

FYI: Connors is quicker and uses less memory for this particular case.

edited Sep 24 '16 at 20:24

answered Sep 22 '16 at 07:40

Ryan Hamilton

2,601
16
17

Does it meet the first requirement (We get at least one row in the result for every row in the left table)? I guess the OP wants `3 6 0N` to be included it the result. – Igor Korkhov Sep 22 '16 at 10:50
2

Thanks for the help! Right, we need at least one row in the result for every row in the left table. The ungroup / xgroup trick always results in an inner join, even when a kdb "lj" operation is used. This is because the null rows produced by the left join on the grouped table contain zero length lists for the columns from the unmatched table. When these are ungrouped, they produce zero rows instead of a single row containing null entries. Would it make sense to have a version of ungroup that produces a single row with null entires when the list valued columns are of zero length? Tx! – JSLover Sep 22 '16 at 21:28
So, now I find my self doing something that I believe is probably really bad form. I'm avoiding using strings. I'm doing that because strings, as I am using them, break the approach proposed here. For example, run the solution again with a new value of x with a string column it the solution fails. x:([];a:1 1 2 3; b:3 4 5 6; z: ("foo"; "bar"; "baz"; "bang")) – JSLover Nov 14 '16 at 11:14

score 1 · Accepted Answer · edited Dec 02 '17 at 02:36

1

Building on @Ryan's answer.

k)nungroup:{$[#x:0!x;(,/){$[#t:+:x;t;enlist *:'[x];t]}'[x]]}  
q)nungroup:{$[count x:0!x;(,/){$[count t:flip x;t;enlist first'[x]]}'[x];x]}  

q)nungroup x lj ` \`a xgroup y

a b c 
----- 
1 3 7 
1 4 7 
2 5 8 
2 5 9 
3 6

edited Dec 02 '17 at 02:36

Thomas Smyth - Treliant

4,993
6
25
36

answered Sep 22 '16 at 23:53

Connor Gervin

935
5
16

Wow, thanks! Connor, I don't read k yet. **Can you explain what you did?** Thanks again! – JSLover Sep 23 '16 at 02:33
This is the q implementation. `q)nungroup:{$[count x:0!x;,/[flip each x],first each ,/[x];x]}` – Connor Gervin Sep 23 '16 at 15:02
@Connor Gervin: your solution gives wrong results for e.g. `x:([]a:1 1 2 3 4; b:3 4 5 6 100)` – Igor Korkhov Sep 23 '16 at 15:02
good spot @Igor Korkhov, this should work. `q)nungroup:{$[count x:0!x;(,/){$[0=count t:flip x;enlist first'[x];t]}'[x];x]}` – Connor Gervin Sep 23 '16 at 15:16
**Seems to work.** When I get a chance, I may post benchmarks for this way to join. – JSLover Sep 23 '16 at 21:02
1

Nice solution, I like the use of first. Modified yours to simpler though slightly slower version above. – Ryan Hamilton Sep 24 '16 at 20:27

How do I perform an efficient left join in kdb?

2 Answers2