Matlab: How to remove cell elements which have other sets as subsets

Question

I have a cell array containing numeric arrays:

C = {[1,2,3,4], [3,4], [2], [4,5,6], [4,5], [7]}

I want to output:

D = {[3,4], [2], [4,5], [7]}

The sets in D are exactly the sets in C that do not contain any other set of C as a subset.

Please see the following link for a similar question. Although the solution there is elegant, I have not (yet) been able to modify the code to fit my particular question.

I would appreciate any help with a solution.

Thank you!

You really shouldn't link to your deleted pseudo-answer. Link to the actual question, or the proper answer you're referring to. — Mad Physicist, May 06 '20 at 00:13
Take time to attempt to adapt the elegant solution to your needs. Play with the rows and columns. Let me know if you run into problems at that point and I'll help. — Mad Physicist, May 06 '20 at 00:15

rahnema1 · Accepted Answer · 2020-05-09T00:19:02.707

As in the linked post, you can form the matrix s whose entry s(i,j) is the number of elements shared by sets i and j. Set i is a subset of set j exactly when s(i,j) equals the number of elements in set i. The result would be:

C = {[1,2,3,4], [3,4], [2], [4,5,6], [4,5], [7]};
n = cellfun(@numel,C);      % find length of each element.
v = repelem(1:numel(C),n);  % generate indices for rows of the binary matrix
[~,~,u] = unique([C{:}]);   % generate indices for columns of the binary matrix
b = accumarray([v(:),u(:)],ones(size(v)),[],@max,[],true); % generate the binary matrix
s = b * b.';                % multiply by its transpose
s(1:size(s,1)+1:end) = 0;   % set diagonal elements to 0 (we do not need self-similarity)
result = C(~any(n(:) == s)); % drop set j if some set i has s(i,j) == n(i), i.e. C{i} is a subset of C{j}

However, the matrix may be very large, so it is better to use a loop to avoid memory problems:

idx = false(1, numel(C));
for k = 1:numel(C)
    % keep C{k} only if no set C{j} lies entirely inside it,
    % i.e. no s(k,j) equals numel(C{j})
    idx(k) = ~any(n == full(s(k, :)));
end
result = C(idx);

Or follow a vectorized approach:

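[r, c, v] = find(s);             % nonzero intersection counts and their positions
idx = sub2ind(size(s), r, c);
s(idx) = v.' == n(r);            % 1 where set r is a subset of set c, 0 otherwise
result = C(~any(s));             % keep sets (columns) that contain no other set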
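
To see why comparing s against n identifies subsets, the intersection counts can also be built directly with intersect for the small example. This is a minimal sketch for checking the idea, not the sparse construction used above; isSubset is an illustrative name, each array is assumed to hold distinct values, and the n(:) == s comparison relies on implicit expansion (R2016b or later).

```matlab
C = {[1,2,3,4], [3,4], [2], [4,5,6], [4,5], [7]};
n = cellfun(@numel, C);                 % size of each set

% Dense matrix of pairwise intersection sizes (diagonal left at 0)
s = zeros(numel(C));
for i = 1:numel(C)
    for j = 1:numel(C)
        if i ~= j
            s(i,j) = numel(intersect(C{i}, C{j}));
        end
    end
end

% isSubset(i,j) is true when all n(i) elements of C{i} also occur in C{j}
isSubset = (n(:) == s);

% A column with any true entry is a set that contains another set
result = C(~any(isSubset, 1));
```

For this input the only true entries of isSubset are (2,1), (3,1) and (5,4), so sets 1 and 4 are dropped. The double loop is quadratic in the number of sets and is meant only for small inputs.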

Thank you! I will give this a shot this evening. Looks like there may be an issue with "s(idx) = v==(r);". What did you mean with this line? — dsmalenb, May 06 '20 at 10:52
@dsmalenb Sorry, I didn't notice your comment. I updated the answer. `v` should be a row vector. — rahnema1, May 09 '20 at 00:17

score 1 · Answer 2 · answered May 05 '20 at 23:31

You can do this by comparing each element with every other element: if any other element is a subset of the current one, remove the current (larger) element. The subset test uses all(ismember(...)), which works regardless of the order or position of the values. Here is simple code that does what you are looking for:

C = {[1,2,3,4], [3,4], [2], [4,5,6], [4,5], [7]};
% Initialize D with a copy of C
D = C;

% Compare each element i with every other element j
for i = 1:numel(C)
    for j = 1:numel(C)
        % Check whether every value of C{j} also occurs in C{i}
        if i ~= j && all(ismember(C{j}, C{i}))
            % Make unwanted elements empty
            D{i} = [];
            break
        end
    end
end

% Remove empty elements
D(cellfun(@isempty,D)) = [];
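
A pairwise subset test can also be written with cellfun instead of an inner loop. The following is a minimal sketch, assuming each array holds distinct values and none is empty; hasSubset and others are illustrative names. The input is reordered so that a subset appears before its superset, to show that the result does not depend on order.

```matlab
C = {[3,4], [1,2,3,4], [2], [4,5,6], [4,5], [7]};  % [3,4] comes before [1,2,3,4]
N = numel(C);
hasSubset = false(1, N);

for i = 1:N
    others = C([1:i-1, i+1:N]);                    % every set except C{i}
    % true if at least one other set lies entirely inside C{i}
    hasSubset(i) = any(cellfun(@(x) all(ismember(x, C{i})), others));
end

D = C(~hasSubset);                                 % keep sets with no subset among the others
```

Indexing with a logical mask avoids emptying and then deleting cells, so a set that is legitimately kept is never confused with a removed one.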
