I have two big matrices. Matrix A which is [4144514 x 3] and matrix B which is [51962 x 17].
The first three columns of A and B have the identifiers. On matrix B the 3 columns make a unique identifier, but this might be repeated on A.
I want to merge the two matrices, in such a away that the resulting matrix A, which should be [4144514 x 20], i.e. I am merging the 17 columns of B with matrix A given the criteria on the first three columns of each matrix.
This is the loop I am doing:
for i=1:size(B,1)
aux = sum(A(:,1)==B(i,1) & A(:,2)==B(i,2) & A(:,3) == B(i,3));
A(A(:,1)==B(i,1) & A(:,2)==B(i,2) & A(:,3) == B(i,3),4:20) = repmat(B(i,:),aux,1);
end
The variable aux
tells me how many lines in A
match the criteria from B
.
The second line of the loop, creates a matrix with the columns of B
repeated for aux
number of times, and puts them on the correct place in A
. However this is extremely inefficient.
Let me give you a toy example below with smaller matrices. I will have A to be [108 x 3] and B to be [27 x 17].
What I am looking to do is the following:
A = [100, 1, 2000 ;100, 1, 2000 ;100, 1, 2000 ;100, 1, 2000 ;100, 2, 2000 ;100, 2, 2000 ;100, 2, 2000 ;100, 2, 2000 ;100, 3, 2000 ;100, 3, 2000 ;100, 3, 2000 ;100, 3, 2000 ;100, 1, 2001 ;100, 1, 2001 ;100, 1, 2001 ;100, 1, 2001 ;100, 2, 2001 ;100, 2, 2001 ;100, 2, 2001 ;100, 2, 2001 ;100, 3, 2001 ;100, 3, 2001 ;100, 3, 2001 ;100, 3, 2001 ;100, 1, 2002 ;100, 1, 2002 ;100, 1, 2002 ;100, 1, 2002 ;100, 2, 2002 ;100, 2, 2002 ;100, 2, 2002 ;100, 2, 2002 ;100, 3, 2002 ;100, 3, 2002 ;100, 3, 2002 ;100, 3, 2002 ;101, 1, 2000 ;101, 1, 2000 ;101, 1, 2000 ;101, 1, 2000 ;101, 2, 2000 ;101, 2, 2000 ;101, 2, 2000 ;101, 2, 2000 ;101, 3, 2000 ;101, 3, 2000 ;101, 3, 2000 ;101, 3, 2000 ;101, 1, 2001 ;101, 1, 2001 ;101, 1, 2001 ;101, 1, 2001 ;101, 2, 2001 ;101, 2, 2001 ;101, 2, 2001 ;101, 2, 2001 ;101, 3, 2001 ;101, 3, 2001 ;101, 3, 2001 ;101, 3, 2001 ;101, 1, 2002 ;101, 1, 2002 ;101, 1, 2002 ;101, 1, 2002 ;101, 2, 2002 ;101, 2, 2002 ;101, 2, 2002 ;101, 2, 2002 ;101, 3, 2002 ;101, 3, 2002 ;101, 3, 2002 ;101, 3, 2002 ;103, 1, 2000 ;103, 1, 2000 ;103, 1, 2000 ;103, 1, 2000 ;103, 2, 2000 ;103, 2, 2000 ;103, 2, 2000 ;103, 2, 2000 ;103, 3, 2000 ;103, 3, 2000 ;103, 3, 2000 ;103, 3, 2000 ;103, 1, 2001 ;103, 1, 2001 ;103, 1, 2001 ;103, 1, 2001 ;103, 2, 2001 ;103, 2, 2001 ;103, 2, 2001 ;103, 2, 2001 ;103, 3, 2001 ;103, 3, 2001 ;103, 3, 2001 ;103, 3, 2001 ;103, 1, 2002 ;103, 1, 2002 ;103, 1, 2002 ;103, 1, 2002 ;103, 2, 2002 ;103, 2, 2002 ;103, 2, 2002 ;103, 2, 2002 ;103, 3, 2002 ;103, 3, 2002 ;103, 3, 2002 ;103, 3, 2002];
B = [100, 1, 2000, 8, 7, 9, 10, 1, 2, 9, 2, 1, 3, 3, 3, 9, 7; 100, 2, 2000, 8, 2, 7, 2, 7, 5, 5, 9, 2, 7, 1, 2, 6, 4; 100, 3, 2000, 8, 8, 7, 3, 2, 8, 1, 10, 9, 8, 6, 1, 5, 7; 100, 1, 2001, 10, 10, 1, 7, 2, 5, 5, 8, 6, 5, 3, 6, 6, 4; 100, 2, 2001, 6, 7, 3, 1, 5, 3, 9, 9, 3, 8, 1, 6, 4, 4; 100, 3, 2001, 1, 5, 7, 1, 5, 4, 2, 10, 5, 4, 6, 5, 1, 10; 100, 1, 2002, 7, 4, 6, 4, 7, 8, 3, 7, 7, 8, 2, 1, 6, 2; 100, 2, 2002, 8, 7, 8, 10, 2, 10, 8, 4, 7, 5, 10, 4, 4, 2; 100, 3, 2002, 4, 9, 4, 10, 2, 4, 2, 1, 4, 10, 9, 2, 6, 9; 101, 1, 2000, 5, 9, 5, 3, 10, 1, 4, 2, 10, 2, 6, 8, 5, 4; 101, 2, 2000, 6, 1, 8, 10, 10, 7, 4, 6, 5, 2, 8, 3, 2, 1; 101, 3, 2000, 7, 7, 5, 6, 2, 8, 8, 8, 1, 6, 1, 1, 9, 7; 101, 1, 2001, 8, 4, 5, 5, 8, 7, 2, 2, 9, 8, 1, 4, 2, 1; 101, 2, 2001, 3, 7, 10, 4, 9, 9, 1, 1, 10, 7, 6, 5, 10, 9; 101, 3, 2001, 10, 8, 2, 4, 6, 1, 6, 4, 8, 10, 7, 9, 4, 7; 101, 1, 2002, 6, 9, 3, 2, 10, 8, 5, 2, 6, 9, 1, 3, 6, 6; 101, 2, 2002, 8, 3, 8, 4, 4, 3, 4, 2, 4, 7, 2, 10, 3, 3; 101, 3, 2002, 10, 2, 10, 10, 6, 5, 3, 5, 10, 1, 3, 4, 8, 5; 103, 1, 2000, 8, 3, 9, 9, 4, 3, 3, 9, 7, 7, 6, 5, 2, 6; 103, 2, 2000, 6, 7, 5, 5, 7, 10, 5, 3, 5, 4, 7, 8, 9, 7; 103, 3, 2000, 3, 4, 9, 10, 3, 10, 7, 2, 10, 3, 3, 3, 6, 6; 103, 1, 2001, 7, 7, 1, 7, 10, 7, 10, 7, 9, 8, 4, 7, 6, 2; 103, 2, 2001, 5, 5, 4, 3, 7, 7, 6, 5, 2, 5, 5, 6, 6, 5; 103, 3, 2001, 6, 3, 9, 9, 2, 10, 10, 10, 10, 7, 10, 9, 9, 8; 103, 1, 2002, 5, 10, 2, 8, 6, 5, 7, 6, 4, 3, 6, 8, 7, 4; 103, 2, 2002, 10, 7, 6, 3, 10, 4, 5, 5, 1, 3, 1, 9, 1, 5; 103, 3, 2002, 2, 1, 5, 5, 2, 8, 6, 2, 6, 6, 10, 1, 4, 9];
for i=1:size(B,1)
aux = sum(A(:,1)==B(i,1) & A(:,2)==B(i,2) & A(:,3) == B(i,3));
A(A(:,1)==B(i,1) & A(:,2)==B(i,2) & A(:,3) == B(i,3),4:20) = repmat(B(i,:),aux,1);
end
With this smaller example the code runs very fast. But as soon as the matrices get as big as the ones I have, it takes ages. Is there any faster way to do this?