4

I want to run a conventional SQL-style left join in KDB+ / Q.

  • We get at least one row in the result for every row in the left table.
  • If there is more than one match in the right table, I get a row for each of those, not just for the first match

Test Data

  x:([];a:1 1 2 3; b:3 4 5 6)

  y:([]; a:1 2 2 4; c:7 8 9 10)

The best version I can think of goes like this:

This appends a left join which provides only the first match to an inner join which provides all matches and then removes the duplicates:

  distinct ej[`a; x; y] , x lj `a xkey y

Can anyone supply me with one that is faster and/or better in some other way? I'd really like to avoid the use of distinct, for example.

Thomas Smyth - Treliant
  • 4,993
  • 6
  • 25
  • 36
JSLover
  • 165
  • 1
  • 10

2 Answers2

1
q)`a xgroup y  // group the lookup table by keys
a| c
-| ---
1| ,7
2| 8 9
4| ,10
q)x lj `a xgroup y  // join all combinations
a b c
------------
1 3 ,7
1 4 ,7
2 5 8 9
3 6 `long$()
q)ungroup x lj `a xgroup y  // unroll using ungroup to produce a flat table
a b c
-----
1 3 7
1 4 7
2 5 8
2 5 9

We provide a free tutorial on kdb joins that demonstrates all of them here: http://www.timestored.com/kdb-guides/qsql-inner-left-joins

Since we want every row..based on @Connors solution

Nice solution Connor. I modified yours to shorten/simplify the code:

q)bungroup:{ungroup {$[0=count x; (),first x; x]}''[x]}
q)bungroup x lj `a xgroup y
a b   c
--------
1 3   7
1 4   7
2 5   8
2 5   9
3 6
4 100 10

FYI: Connors is quicker and uses less memory for this particular case.

Ryan Hamilton
  • 2,601
  • 16
  • 17
  • Does it meet the first requirement (We get at least one row in the result for every row in the left table)? I guess the OP wants `3 6 0N` to be included it the result. – Igor Korkhov Sep 22 '16 at 10:50
  • 2
    Thanks for the help! Right, we need at least one row in the result for every row in the left table. The ungroup / xgroup trick always results in an inner join, even when a kdb "lj" operation is used. This is because the null rows produced by the left join on the grouped table contain zero length lists for the columns from the unmatched table. When these are ungrouped, they produce zero rows instead of a single row containing null entries. Would it make sense to have a version of ungroup that produces a single row with null entires when the list valued columns are of zero length? Tx! – JSLover Sep 22 '16 at 21:28
  • So, now I find my self doing something that I believe is probably really bad form. I'm avoiding using strings. I'm doing that because strings, as I am using them, break the approach proposed here. For example, run the solution again with a new value of x with a string column it the solution fails. x:([];a:1 1 2 3; b:3 4 5 6; z: ("foo"; "bar"; "baz"; "bang")) – JSLover Nov 14 '16 at 11:14
1

Building on @Ryan's answer.

k)nungroup:{$[#x:0!x;(,/){$[#t:+:x;t;enlist *:'[x];t]}'[x]]}  
q)nungroup:{$[count x:0!x;(,/){$[count t:flip x;t;enlist first'[x]]}'[x];x]}  

q)nungroup x lj ` \`a xgroup y

a b c 
----- 
1 3 7 
1 4 7 
2 5 8 
2 5 9 
3 6
Thomas Smyth - Treliant
  • 4,993
  • 6
  • 25
  • 36
Connor Gervin
  • 935
  • 5
  • 16