6

I'm new to MATLAB and I'm struggling to understand the difference between array-wise and element-wise operations. I'm working with a large dataset, and I've found that the simplest methods aren't always the fastest. I have a very large cell array of strings, like in this simplified example:

% A vertical array of same-length strings
CellArrayOfStrings = {'aaa123'; 'bbb123'; 'ccc123'; 'ddd123'};

I'm trying to extract an array of substrings, for example:

'a1'
'b1'
'c1'
'd1'

I'm happy enough with an element-wise reference like this:

% Simple element-wise substring operation
MySubString = CellArrayOfStrings{2}(3:4);  % Expected result is 'b1'

But I can't work out the notation to reference them all in one go, like this:

% Desired result is 'a1','b1','c1','d1'
MyArrayOfSubStrings = CellArrayOfStrings{:}(3:4); % Incorrect notation!

I know that MATLAB can perform very fast array-wise operations, such as strcat, so I was hoping for a technique that works at a similar speed:

% An array-wise operation which works quickly
tic
speedTest = strcat(CellArrayOfStrings,'hello');
toc   % About 2 seconds on my machine with >500K array elements

Every for loop I have tried, and every function that iterates behind the scenes, runs too slowly on my dataset. Is there an array-wise notation that would do this? Could somebody correct my understanding of element-wise and array-wise operations? Many thanks!

fodfish

4 Answers

5

I can't work out the notation to reference them all in one go, like this:

MyArrayOfSubStrings = CellArrayOfStrings{:}(3:4); % Incorrect notation!

This is because curly braces ({}) return a comma-separated list, which is equivalent to writing the contents of these cells in the following way:

c{1}, c{2}, and so on....

When the subscript index refers to only one element, MATLAB's syntax allows you to use parentheses (()) after the curly braces to extract a sub-array (a substring in your case). However, this syntax is prohibited when the comma-separated list contains multiple items.
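A minimal sketch of that distinction, using the question's sample data. The variable names (c, one, joined, stacked, sub) are made up for illustration:

```matlab
c = {'aaa123'; 'bbb123'; 'ccc123'; 'ddd123'};

% One cell selected: brace indexing yields a single char array,
% so a parenthesis index may follow it directly.
one = c{2}(3:4);

% Several cells selected: c{:} expands to the list c{1}, c{2}, c{3}, c{4}.
% Such a list can be passed on as separate arguments or gathered in brackets.
joined  = [c{:}];          % horizontal concatenation, 1-by-24 char
stacked = vertcat(c{:});   % one string per row, 4-by-6 char

% Once the data is an ordinary char matrix, column indexing applies to every row.
sub = stacked(:, 3:4);
```

The list has to be collected into a single array before any further indexing can be applied to it, which is what the alternatives below do in different ways.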

So what are the alternatives?

  1. Use a for loop:

    MyArrayOfSubStrings = char(zeros(numel(CellArrayOfStrings), 2));
    for k = 1:size(MyArrayOfSubStrings, 1)
        MyArrayOfSubStrings(k, :) = CellArrayOfStrings{k}(3:4);
    end
    
  2. Use cellfun (a slight variant of Dang Khoa's answer):

    MyArrayOfSubStrings = cellfun(@(x){x(3:4)}, CellArrayOfStrings);
    MyArrayOfSubStrings = vertcat(MyArrayOfSubStrings{:});
    
  3. If your original cell array contains strings of a fixed length, you can follow Dan's suggestion: convert the cell array into an array of strings (a matrix of characters) and extract the desired columns:

    MyArrayOfSubStrings = vertcat(CellArrayOfStrings{:});
    MyArrayOfSubStrings = MyArrayOfSubStrings(:, 3:4);
    
  4. Employ more complicated methods, such as regular expressions:

    MyArrayOfSubStrings = regexprep(CellArrayOfStrings, '^..(..).*', '$1');
    MyArrayOfSubStrings = vertcat(MyArrayOfSubStrings{:});
    

There are plenty of solutions to choose from; just pick the one that suits you best :) I think that with MATLAB's JIT acceleration, a simple loop would be sufficient in most cases.

Also note that in all my suggestions the result is converted into an array of strings (a character matrix). This is just for the sake of the example; you can of course keep the substrings in a cell array if you prefer.
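Since speed depends on the dataset and the MATLAB version, it may be worth timing the options before choosing. Below is a rough sketch, assuming synthetic data built with repmat; the size n and the variable names are placeholders rather than anything from the question:

```matlab
% Synthetic data: n copies of a six-character string (an assumption for illustration).
n = 1e5;
C = repmat({'aaa123'}, n, 1);

% Option 1: preallocated char matrix filled in a for loop.
tic
viaLoop = repmat(' ', n, 2);
for k = 1:n
    viaLoop(k, :) = C{k}(3:4);
end
tLoop = toc;

% Option 2: cellfun, then stack the two-character results.
tic
tmp = cellfun(@(x) x(3:4), C, 'UniformOutput', false);
viaCellfun = vertcat(tmp{:});
tCellfun = toc;

% Option 3: stack into a char matrix first, then index the columns.
tic
M = vertcat(C{:});
viaMatrix = M(:, 3:4);
tMatrix = toc;

% All three should agree before the timings are compared.
assert(isequal(viaLoop, viaCellfun) && isequal(viaLoop, viaMatrix));
fprintf('loop %.3fs, cellfun %.3fs, char matrix %.3fs\n', tLoop, tCellfun, tMatrix);
```

Single tic/toc measurements are noisy, so running the script a few times gives a more trustworthy picture.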

Eitan T
  • Thank you for your comprehensive reply, which both answered my question and helped my understanding. In the end I chose option 3, which seemed the best fit for my dataset and function: I found that using a for loop inside my function (option 1) was about 4x slower than calling the function with cellfun (option 2). I chose option 3 because I didn't want to have to explain cellfun to the other guys who will be using this :). Thanks also to Dan and Mohsen, who provided similar answers. – fodfish Oct 14 '13 at 07:44
  • Cool. So, following (1), direct access to a row, column, and substring within a 2-D cell array produced by CellArray=textscan(fid,format), where col is a text column, would be CellArray{col}{row}(3:4). – Dave X May 21 '15 at 18:34
4

cellfun operates on every element of a cell array, so you could do something like this:

>> CellArrayOfStrings = {'aaa123'; 'bbb123'; 'ccc123'; 'ddd123'};
>> MyArrayofSubstrings = cellfun(@(str) str(3:4), CellArrayOfStrings, 'UniformOutput', false)
MyArrayofSubstrings = 
    'a1'
    'b1'
    'c1'
    'd1'

If you want a matrix of strings instead of a cell array whose elements are the strings, use char on MyArrayofSubstrings. Note that char pads shorter strings with trailing spaces, so the rows line up without padding only when every string has the same length.

Dang Khoa
1

You can do this:

C = {'aaa123'; 'bbb123'; 'ccc123'; 'ddd123'}
t = reshape([C{:}], 6, [])'
t(:, 3:4)

But this only works if your strings are all of equal length, I'm afraid.
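To see why the equal-length requirement matters, the one-liner can be unpacked step by step. This is only a sketch; the intermediate names (flat, cols, sub) are invented for illustration:

```matlab
C = {'aaa123'; 'bbb123'; 'ccc123'; 'ddd123'};

flat = [C{:}];                 % 1-by-24 char: all strings joined end to end
cols = reshape(flat, 6, []);   % 6-by-4: reshape fills column by column, one string per column
t    = cols';                  % 4-by-6: transpose so that each string becomes a row
sub  = t(:, 3:4);              % characters 3 and 4 of every string
```

The hard-coded 6 is the common string length; with strings of mixed lengths the characters would no longer line up into whole columns.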

Dan
1

You can use char to convert them to a character array, do the indexing, and convert the result back to a cell array:

A = char(CellArrayOfStrings);
B = cellstr(A(:,3:4));

Note that if the strings are of different lengths, char pads them with spaces at the end to create the array. Therefore, if you index a column that lies beyond the end of one of the shorter strings, you may get space characters back.
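A small sketch of that padding behaviour, assuming two made-up strings of unequal length (S, raw, and longEnough are illustrative names only):

```matlab
S = {'ab'; 'abcd123'};   % hypothetical strings of unequal length

A   = char(S);           % 2-by-7 char; the first row is 'ab' followed by five spaces
raw = A(:, 3:4);         % first row holds two padding spaces, second row holds 'cd'
B   = cellstr(raw);      % cellstr strips trailing spaces, so B{1} comes back empty

% The original lengths tell which rows actually reach column 4.
longEnough = cellfun(@numel, S) >= 4;
```

Checking the lengths up front, as in the last line, is one way to tell genuine substrings apart from padding.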

Mohsen Nosratinia