0

I have two datasets, named data1 and data2. data1 look like:

 id1     id2  exc1  exc2  exc3  exc4
 "aa2"   "12ac"   45     54   53     65 
 "bb"     "23"   23     33   23     12

data2 looks like:

kid1   id2   sf1   sf2  sf3   sf4  exc1 exc2
"aa2" "ads2"  55    6    55   66    45   54

The kid1 column and id1 column have the same entries, just the number of rows in both dataset is different. There are some rows missing in data1 which I have to pick from data2. To do this I have to combine id1 and id2 in data1 and kid1 and id2 in data2 and create a new column called link. In excel I did this like "aa2 | 12ac" and similarly for data2. Further I have to search which entries of link are present in data2 but not in data1 and I have to add them in data1.

The condition for adding new rows in data1 from data2 is that: if columns name are same in data1 and data2, then use the data from data2 otherwise the column name which are not present in data2 but are present in data1, copy the data from row where kid1 = id1.

I have done the entire thing in excel and wanted to replicate the entire thing in kdb to make the process faster. It'd be great if someone can help me on this. Any leads on this is appreciated. Thanks

Thomas Smyth - Treliant
  • 4,993
  • 6
  • 25
  • 36
  • 1
    It is a lot more helpful when posting sample data to include a command to create the table in KDB, e.g. ``data1:flip `id1`id2`exc1`exc2`exc3`exc4!(("aa2";"bb");("12ac";"23");45 23i;54 33i;53 23i;65 12i); data2:data2:flip `kid1`id2`sf1`sf2`sf3`sf4`exc1`exc2!enlist each ("aa2";"ads2";55i;6i;55i;66i;45i;54i)`` – Jonathon McMurray Jan 22 '18 at 10:40

2 Answers2

3

Here's one method to do what I think you're looking for. First, setting up tables:

t:([]id1:("aa2";"bb");id2:("12ac";"23");exc1:45 23; exc2:54 33;exc3:53 23;exc4:65 12)
q:([]kid:enlist "aa2";id2:enlist "ads2";sf1:(),55;sf2:(),6;sf3:(),55;sf4:(),66;exc1:(),45;exc2:(),54)

Then, using the sv keyword to join the id1/id2 and kid/id2 columns together into the link column, as you do above, and key the tables on this new column

rt:`link xkey update link:`$"|"sv/:flip(id1;id2),id1:`$id1,id2:`$id2 from t
rq:`link xkey update link:`$"|"sv/:flip(kid;id2),kid:`$kid,id2:`$id2 from q

Then use uj to join the tables together, which will automatically overwrite values in the first table with values from the second table if the records match, or keep the old values if not:

q)rt uj rq
link     | id1   id2    exc1 exc2 exc3 exc4 kid   sf1 sf2 sf3 sf4
---------| ------------------------------------------------------
aa2|12ac | "aa2" "12ac" 45   54   53   65   ""
bb|23    | "bb"  "23"   23   33   23   12   ""
aa2|ads2 | ""    "ads2" 45   54             "aa2" 55  6   55  66

Hope that helps.

Ryan McCarron
  • 889
  • 4
  • 10
  • Thanks for this. I'm getting a `type error when I try to use uj. Also I need only the columns in rt and not those of rj. Is there a specific command for that? –  Jan 22 '18 at 11:12
  • Are both your tables keyed on the same column? If not this will throw an error, and a 'type will be thrown if both aren't keyed. If you want to just take the columns from rt, a quick way is to unkey and use take, `#`, like `cols[rt]#0!rt uj rq` – Ryan McCarron Jan 22 '18 at 11:16
  • One additional thing, if the record don't match, I needed the data in the new row which is inserted from rq to come from rt where kid1 = id1. how do I ensure this condition ? –  Jan 22 '18 at 11:47
  • If you want to fill in missing columns with the matching rt row information, you can use a very similar approach: `cols[rt]#0!(1!update `$id1, `$id2 from t)^`id1 xkey 0!rt uj rq` which uses fill instead of uj, which will fill in null values rather than overwriting. – Ryan McCarron Jan 22 '18 at 13:47
  • can you please explain the logic behind this line? It throws an error when I try –  Jan 23 '18 at 05:46
  • ```cols[rt]#0!(1!update `$id1, `$id2 from t)^id1 xkey 0!rt uj rq``` - I think there was a formatting issue. The problem was removing the backtick being used to cast to sym – Ryan McCarron Jan 23 '18 at 10:34
0

Are you looking for something like this?

q)(2!data1) uj `id1`id2 xkey update id1:kid1 from data2
id1   id2   | exc1 exc2 exc3 exc4 kid1  sf1 sf2 sf3 sf4
------------| -----------------------------------------
"aa2" "12ac"| 45   54   53   65   ""
"bb"  "23"  | 23   33   23   12   ""
"aa2" "ads2"| 45   54             "aa2" 55  6   55  66

The uj with two keyed tables will return a table with the union of the keys

In order to fall back to updating exc2 and exc3 from data1 if they're blank, using only id1 as a key, you can try something like this:

q)t:(2!data1) uj `id1`id2 xkey update id1:kid1 from data2   //same as before
q)(t lj 1!select id1,exc2,exc3 from data1)^t                //lj these fields on, use fill to only update null fields
id1   id2   | exc1 exc2 exc3 exc4 kid1  sf1 sf2 sf3 sf4
------------| -----------------------------------------
"aa2" "12ac"| 45   54   53   65   ""
"bb"  "23"  | 23   33   23   12   ""
"aa2" "ads2"| 45   54   53        "aa2" 55  6   55  66
q)cols[data1]#0!(t lj 1!select id1,exc2,exc3 from data1)^t  //use Ryan's suggestion for getting the cols you desire
id1   id2    exc1 exc2 exc3 exc4
--------------------------------
"aa2" "12ac" 45   54   53   65
"bb"  "23"   23   33   23   12
"aa2" "ads2" 45   54   53
Jonathon McMurray
  • 2,881
  • 1
  • 10
  • 22
  • Thanks for this. I also needed to incorporate this condition that exc2 and exc3 shouldn't be blank. Their values should be picked from that row in data1 where kid1 = id1. Can you please help me with this –  Jan 22 '18 at 11:59
  • Sure, I've updated my answer with a suggestion for doing this – Jonathon McMurray Jan 22 '18 at 12:40
  • the problem is, there are too many columns in my dataset. Is there a way to generalize this? –  Jan 22 '18 at 13:17