Finding removed elements in a vector in matlab

Question

I have a process that iteratively and randomly prunes a huge vector of integers, and I want to find which elements are removed between iterations. The vector has a lot of repetitions, and ismember() and setdiff() didn't help much.

As an illustration if X = [1,10,8,5,10,3,5,2]:

step 0: X = 1,10,8,5,10,3,5,2
step 1: X = 1,10,8,10,3,5,2 (5 is removed)
step 2: X = 1,10,8,3,2 (10 and 5 are removed)
step 3: X = 10,8,3,2 (1 is removed)
step 4: X = 2 (10, 8 and 3 are removed)
step 5: X = [] (2 is finally removed)

I aim to find the elements removed at each step (i.e. 5, then 10 and 5, and so on). I could probably build an overly complicated solution using hist(X, unique(X)) between steps, but I assume there is a much more elegant (and cheaper!) solution in MATLAB.

How large is `X` and how many unique elements does it have, typically? — Luis Mendo, Mar 12 '19 at 18:28
Wouldn't it be simpler to have this process (a function I presume) also return the removed element? — Cris Luengo, Mar 12 '19 at 18:36
@LuisMendo X typically contains hundreds and at most thousands of elements, and yes, the values are always positive. — Grasshoper, Mar 12 '19 at 21:59
@CrisLuengo Sure it would. The issue is that the process generating the X values takes a huge amount of time to compute, and I am processing the results right now. — Grasshoper, Mar 12 '19 at 21:59
@Grasshoper I don't understand your last comment about the process generating the X values. Can you clarify? — beaker, Mar 12 '19 at 22:01
And if `X` only contains a few thousand elements, wouldn't `find` be quick enough? — beaker, Mar 12 '19 at 22:17
@beaker The process that generates the X values is the pruning of the elements of a huge structure according to some strategy and constraints. I'm not sure how to use find() in the given context. Do you suggest using a loop and counting the frequency of each element? That would be building a histogram. — Grasshoper, Mar 13 '19 at 08:09
@Grasshoper No, I wasn't suggesting a histogram approach. Given two arrays `X` and `Y`, find the first element where `X ~= Y`. This is the first removed element. Now apply to the remaining arrays *after* the mismatch. The solution is *O(kn)*, where `k` is the number of removed elements and `n` is the length of `X`. However that's probably slower and more coding than the `histc(unique)` solution suggested [here](https://stackoverflow.com/questions/51829635/finding-multiset-difference-between-two-arrays). — beaker, Mar 13 '19 at 16:26

score 3 · Answer 1 · answered Mar 13 '19 at 06:47

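My idea is to recover the input from the output. Subtract the zero-padded output from the input: the first nonzero difference marks the index of a removed element. Re-insert that element into the output at that index and repeat until all removed elements are found. This assumes the remaining elements keep their original order.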

```
% Input.
X = [1, 10, 8, 5, 10, 3, 5, 2];

% Remove indices for the given example.
y = { [4], [4 6], [1], [1 2 3], [1] };

% Simulate removing.
for k = 1:numel(y)

  % Remove elements.
  temp = X;
  temp(y{k}) = [];

  % Determine number of removed elements.
  nRemoved = numel(X) - numel(temp);

  % Find removed elements by recovering input from output.
  recover = temp;
  removed = zeros(1, nRemoved);
  for l = 1:nRemoved
    tempdiff = X - [recover zeros(1, nRemoved - l + 1)];
    idx = find(tempdiff, 1);
    removed(l) = X(idx);
    recover = [recover(1:idx-1) X(idx) recover(idx:end)];
  end

  % Simple, stupid output.
  disp('Input:');
  disp(X);
  disp('');
  disp('Output:');
  disp(temp);
  disp('');
  disp('Removed elements:');
  disp(removed);
  disp('');
  disp('------------------------------');

  % Reset input.
  X = temp;

end
```

Output for the given example:

```
Input:
    1   10    8    5   10    3    5    2

Output:
    1   10    8   10    3    5    2

Removed elements:
 5

------------------------------
Input:
    1   10    8   10    3    5    2

Output:
    1   10    8    3    2

Removed elements:
   10    5

------------------------------
Input:
    1   10    8    3    2

Output:
   10    8    3    2

Removed elements:
 1

------------------------------
Input:
   10    8    3    2

Output:
 2

Removed elements:
   10    8    3

------------------------------
Input:
 2

Output:
[](1x0)

Removed elements:
 2

------------------------------
```
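Since the remaining elements keep their order, the same recovery could probably also be done in a single pass with two indices instead of rebuilding `recover` each time. The following is only a sketch under that assumption; the names `ix`, `iy` and `r` are mine, and the data is step 1 to step 2 of the question.

```matlab
% Single-pass sketch: assumes Y is X with some elements deleted
% and the order of the remaining elements unchanged.
X = [1, 10, 8, 10, 3, 5, 2];   % vector before pruning
Y = [1, 10, 8, 3, 2];          % vector after pruning

removed = zeros(1, numel(X) - numel(Y));  % preallocate the result
iy = 1;                                   % current position in Y
r = 0;                                    % removed elements found so far
for ix = 1:numel(X)
  if iy <= numel(Y) && X(ix) == Y(iy)
    iy = iy + 1;          % X(ix) survived; advance in Y
  else
    r = r + 1;            % no match at this position: X(ix) was removed
    removed(r) = X(ix);
  end
end
disp(removed);
```

For repeated values, this reports the first of several equal neighbours as the removed one, which does not matter if only the values are needed.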

Is that an appropriate solution, or am I missing some (obvious) inefficiencies?

Thank you. The solution seems correct, but the one provided by @Luis-mendo seems more efficient. — Grasshoper, Mar 14 '19 at 08:48

Luis Mendo · Accepted Answer · 2019-03-13T09:31:52.477

This approach is memory-intensive. It computes an intermediate matrix of size NxM where N is the number of elements of X and M is the number of unique elements of X, using implicit expansion. This may be feasible or not depending on your typical N and M.
```
X = [1,10,8,5,10,3,5,2];
Y = [8,10,2,1]; % removed 10, 5, 5, 3. Order in Y is arbitrary
u = unique(X(:).');
removed = repelem(u, sum(X(:)==u,1)-sum(Y(:)==u,1));
```
gives
```
removed =
     3     5     5    10
```
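To make the one-liner easier to follow, here is the same computation split into its intermediate steps. The variable names `countsX`, `countsY` and `nRemovedPerValue` are only illustrative, and the sketch assumes R2016b or later for implicit expansion.

```matlab
X = [1,10,8,5,10,3,5,2];
Y = [8,10,2,1];
u = unique(X(:).');                     % distinct values, sorted: [1 2 3 5 8 10]
countsX = sum(X(:) == u, 1);            % occurrences of each value in X: [1 1 1 2 1 2]
countsY = sum(Y(:) == u, 1);            % occurrences of each value in Y: [1 1 0 0 1 1]
nRemovedPerValue = countsX - countsY;   % how often each value was removed: [0 0 1 2 0 1]
removed = repelem(u, nRemovedPerValue); % repeat each value that many times: [3 5 5 10]
```

The comparison `X(:) == u` is the N-by-M intermediate matrix mentioned above (here 8-by-6).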
For Matlab versions before R2016b, you need bsxfun instead of implicit expansion:
```
removed = repelem(u, sum(bsxfun(@eq,X(:),u),1)-sum(bsxfun(@eq,Y(:),u),1));
```
If the values in X are always positive integers, a more efficient approach can be used, employing sparse to compute the number of times each element appears:
```
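X = [1,10,8,5,10,3,5,2];
Y = [8,10,2,1]; % removed 10, 5, 5, 3. Order in Y is arbitrary
n = max(X); % give both count vectors the same length, even if max(Y) < max(X)
% full() converts the sparse count difference to a dense vector for repelem
removed = repelem(1:n, full(sparse(1,X,1,1,n) - sparse(1,Y,1,1,n)));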
```
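Either counting approach can be applied between consecutive snapshots of `X`. As a rough sketch for the example in the question, assuming the snapshots are stored in a cell array (`steps` and `removedPerStep` are hypothetical names), the implicit-expansion version could be looped like this:

```matlab
% Snapshots of X after each pruning step (values from the question).
% `steps` and `removedPerStep` are hypothetical names used for illustration.
steps = { [1,10,8,5,10,3,5,2], [1,10,8,10,3,5,2], [1,10,8,3,2], ...
          [10,8,3,2], [2], [] };
removedPerStep = cell(1, numel(steps) - 1);
for k = 1:numel(steps) - 1
  X = steps{k};
  Y = steps{k+1};
  u = unique(X(:).');
  % Count difference per distinct value (implicit expansion, R2016b or later).
  removedPerStep{k} = repelem(u, sum(X(:) == u, 1) - sum(Y(:) == u, 1));
end
celldisp(removedPerStep);
```

Note that the removed values in each step come out sorted by value, not in their original order in `X`.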

This works fine and is neat! Thank you! — Grasshoper, Mar 14 '19 at 08:57
