
I have a series of lines that I read from a file (over 2700) of this type:

A = '1; 23245675; -234567; 123456; ...; 0'

A is a string with ; as the delimiter for data.

To split the string I used the strsplit function first, but it was too slow to execute. Then I used regexp like this:

regexp(A,';','split')

Is there an even faster function than regexp?

Banghua Zhao
Sara Savio
  • I don't get it: in your example, is `A` a string or a cell array? And if it is a cell array, what are the strings you split? Those within `A` do not contain `;` at all. BTW, if your data is structured, consider `readtable` or `dlmread` as faster options to read it in a formatted way. See [here](https://stackoverflow.com/a/53486754/2627163). – EBH Nov 28 '18 at 21:01
  • A is a string. I have to split a string in a more efficient way, if that is possible. – Sara Savio Nov 28 '18 at 22:44
  • Have you considered `textscan`? [it is way faster than all the options you mention](https://stackoverflow.com/a/53534356/2627163) – EBH Nov 29 '18 at 10:38

2 Answers


Being a built-in function [1], textscan is probably the fastest option:

result = textscan(A,'%f','Delimiter',';'); % A is a char vector, so no {1} indexing
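
Note that textscan does not return the split text pieces but a cell array holding the parsed numbers. As a minimal sketch, assuming the goal is to end up with numeric values (the question does not say so explicitly), the following compares the textscan output with a regexp split followed by a conversion. The sample line is the one from the question with the elided "..." values left out, and the variable names `values` and `values2` are only illustrative:

```matlab
% Sample line from the question (the elided "..." part is omitted).
A = '1; 23245675; -234567; 123456; 0';

% textscan returns a 1-by-1 cell array; the numbers are inside it.
result = textscan(A, '%f', 'Delimiter', ';');
values = result{1};                  % double column vector

% The split-based route returns text pieces that still need converting.
pieces  = regexp(A, ';', 'split');   % 1-by-5 cell array of char vectors
values2 = str2double(strtrim(pieces)).';   % trim blanks, convert, make a column

% Both routes should give the same numbers.
same = isequal(values, values2);
```

If the text pieces themselves are needed rather than numbers, textscan with '%f' is not a drop-in replacement for regexp or split.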

Here is a little benchmark to show that:

A = repmat('1; 23245675; -234567; 123456; 0',1,100000); % a long string
regexp_time = timeit(@ () regexp(A,';','split'))
strsplit_time = timeit(@ () strsplit(A,';'))
split_time = timeit(@ () split(A,';'))
textscan_time = timeit(@ () textscan(A,'%f','Delimiter',';'))

The result:

regexp_time =
      0.33054
strsplit_time =
      0.45939
split_time =
      0.24722
textscan_time =
     0.057712

textscan is the fastest, and is ~4.3 times faster than the next method (split).

It is the fastest option regardless of the length of the string to split (note the log scale of the x-axis):

[Figure: benchmark of string splitting]
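
The script behind the figure is not shown, so the following is only a rough sketch of how such a length sweep could be set up. The repetition counts, the choice to time just regexp and textscan, and the plot styling are all my own assumptions:

```matlab
% Hypothetical length sweep; the repetition counts are illustrative only.
base = '1; 23245675; -234567; 123456; 0';
reps = 10.^(0:4);                    % number of copies of the base string
t = zeros(numel(reps), 2);           % columns: regexp, textscan

for k = 1:numel(reps)
    A = repmat(base, 1, reps(k));    % string of increasing length
    t(k,1) = timeit(@() regexp(A, ';', 'split'));
    t(k,2) = timeit(@() textscan(A, '%f', 'Delimiter', ';'));
end

% Log-scaled x-axis, as in the figure described above.
semilogx(reps * numel(base), t, '-o')
xlabel('String length (characters)')
ylabel('Time (s)')
legend('regexp', 'textscan', 'Location', 'northwest')
```

For the shortest strings timeit may warn that the function is too fast to measure accurately, so those points should be read with caution.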


[1] "A built-in function is part of the MATLAB executable. MATLAB does not implement these functions in the MATLAB language. Although most built-in functions have a .m file associated with them, this file only supplies documentation for the function." (from the documentation)

EBH

The possible split functions I can think of are regexp, strsplit, and split.

I compared their performance on a large string. The result shows that split is slightly faster than regexp, while strsplit is around 2 times slower than regexp.

Here is how I compared them:

First, create a large string A (around 16 million data elements) based on the string in your question.

A = '1; 23245675; -234567; 123456; 0';
for ii=1:22
    A = strcat(A,A);
end

Option 1: regexp

tic
regexp(A,';','split');
toc

Elapsed time is 12.548295 seconds.

Option 2: strsplit

tic
strsplit(A,';');
toc

Elapsed time is 23.347392 seconds.

Option 3: split

tic
split(A,';');
toc

Elapsed time is 9.678433 seconds.

So split might speed things up a little, but the gain is not significant.

Banghua Zhao