2

I have a large table whose entries are

KEY_A,KEY_B,VAL

where KEY_A and KEY_B are finite sets of keys. For arguments sake, we'll have 4 different KEY_B values and 4 different KEY_A values. And example table:

KEY_A    KEY_B      KEY_C  
_____    _____    _________

1        1          0.45054
1        2         0.083821
1        3          0.22898
1        4          0.91334
2        1          0.15238
2        2          0.82582
2        3          0.53834
2        4          0.99613
3        1         0.078176
3        2          0.44268
3        3          0.10665
3        4           0.9619
4        1        0.0046342
4        2          0.77491
4        3           0.8173
4        4          0.86869
4        5          1

I want to elegantly flatten the table into

KEY_A     KEY_B_1     KEY_B_2     KEY_B_3    KEY_B_4    KEY_B_5
_____    _________    ________    _______    _______    _______

1          0.45054    0.083821    0.22898    0.91334    -1
2          0.15238     0.82582    0.53834    0.99613    -1
3         0.078176     0.44268    0.10665     0.9619    -1
4        0.0046342     0.77491     0.8173    0.86869    1

I'd like to be able to handle missing B values (set them to a default like -1), but I think if I get an elegant way to do this to start then such things will fall into place.

The actual table has millions of records, so I do want to use a vectorized call.

The line I've got (which doesn't handle int invalid 5) is:

cell2mat(arrayfun(@(x)[x,testtable{testtable.KEY_A==x,3}'],unique(testtable{:,1}),'UniformOutput',false))

But

  1. it doesn't output a different table
  2. If there are missing keys in the table, it doesn't handle that

I would think that this isn't that uncommon of an activity...has anyone done something like this before?

IdeaHat
  • 7,641
  • 1
  • 22
  • 53

1 Answers1

1

If the input table is T, then you could try this for the given case -

KEY_B_ =-1.*ones(max(T.KEY_A),max(T.KEY_B))
KEY_B_(sub2ind(size(KEY_B_),T.KEY_A,T.KEY_B)) = T.KEY_C
T1 = array2table(KEY_B_)

Output for the edited input -

T1 = 
     KEY_B_1     KEY_B_2     KEY_B_3    KEY_B_4    KEY_B_5
    _________    ________    _______    _______    _______
      0.45054    0.083821    0.22898    0.91334    -1     
      0.15238     0.82582    0.53834    0.99613    -1     
     0.078176     0.44268    0.10665     0.9619    -1     
    0.0046342     0.77491     0.8173    0.86869     1     

Edit by MadScienceDreams: This answer lead me to write the following function, which will smash together pretty much any table based on the input keys. Enjoy!

function [ OT ] = flatten_table( T,primary_keys,secondary_keys,value_key,default_value )
%UNTITLED Summary of this function goes here
%   Detailed explanation goes here
if nargin < 5
    default_value = {NaN};
end

if ~iscell(default_value)
    default_value={default_value};
end

if ~iscell(primary_keys)
    primary_keys={primary_keys};
end

if ~iscell(secondary_keys)
    secondary_keys={secondary_keys};
end

if ~iscell(value_key)
    value_key={value_key};
end
primary_key_values = unique(T(:,primary_keys));
num_primary = size(primary_key_values,1);
[primary_key_map,primary_key_map] = ismember(T(:,primary_keys),primary_key_values);

secondary_key_values = unique(T(:,secondary_keys));
num_secondary = size(secondary_key_values,1);
[secondary_key_map,secondary_key_map] = ismember(T(:,secondary_keys),secondary_key_values);

%out =-1.*ones(max(T.KEY_A),max(T.KEY_B))

try
    values = num2cell(T{:,value_key},2);
catch
    values = num2cell(table2cell(T(:,value_key)),2);
end

if (~iscell(values))
    values=num2cell(values);
end
OT=repmat(default_value,num_primary,num_secondary);
OT(sub2ind(size(OT),primary_key_map,secondary_key_map)) = values;
label_array = num2cell(cellfun(@(x,y)[x '_' mat2str(y)],...
  repmat (secondary_keys,size(secondary_key_values,1),1),...
  table2cell(secondary_key_values),'UniformOutput',false),1);
label_array = strcat(label_array{:});
OT = [primary_key_values,cell2table(OT,'VariableNames',label_array)];

end
IdeaHat
  • 7,641
  • 1
  • 22
  • 53
Divakar
  • 218,885
  • 19
  • 262
  • 358
  • This is pretty good, it does output a table. A little tweaking can get it to output the correct table, with lables and the primary key. However, I think the devil seems to be in the "default" value for missing keys. I've edited the question to reflect that, but how would you modify your method to deal with that? (I think I could add the default entries before the reshape? – IdeaHat Sep 25 '14 at 15:21
  • Nice change....I'll be back in a sec with a generic function that isn't reliant on the keys being indexable values (simple to do) and if you plop that in I'll set you as the answer. – IdeaHat Sep 25 '14 at 15:23
  • @MadScienceDreams If you care to have the exact output as the desired one mentioned in the question, add this at the end - `KEY_A = unique(T.KEY_A); T1 = [array2table(KEY_A) T1]`. – Divakar Sep 25 '14 at 15:34
  • @MadScienceDreams Nice to see the code being wrapped into a function! Hope this is working out well for you and thanks for the edits. – Divakar Sep 25 '14 at 18:42