I tried to run both the situations you mentioned (in a script and within a function), and in both cases, I noticed that MATLAB uses multiple cores (4 in my case too), and so I cannot reproduce this behavior.
However, I believe one of the major reasons that MATLAB is slower than ITK for a sphere is that a 3D sphere is not decomposed into smaller simpler shapes.
SE = strel('disk', 12);
sum(SE.Neighborhood(:))
is 697, which is the number of 'on' pixels in the 3D sphere.
The cube on the other hand is decomposed, meaning:
SE = strel('cube', 25);
% The decompose method replaces one cube with three 3D lines,
% applied repeatedly over the volume.
seq = SE.decompose()
sum(seq(1).Neighborhood)
sum(seq(2).Neighborhood)
sum(seq(3).Neighborhood)
This reduces the number of comparisons to be made from 25 cubed to 3 times 25, which is probably what ITK does to optimize this.
I would be curious to know if you use a cube, like 'se = strel('cube',25);' instead. I'm fairly certain it is much faster. If that is true, you could investigate decomposing the sphere and using it instead, for example:
Vaz, M. S., Kiraly, A. P., & Mersereau, R. M. (2007). Multi-level decomposition of Euclidean spheres. In Proc. Int. Symp. Math. Morphology (ISMM) (pp. 461-472).