The real reason for scaling features in SVM is that this classifier is not invariant to affine transformations of the feature space. In other words, if you multiply one feature by 1000, the solution given by the SVM will be completely different. It has nearly nothing to do with the underlying optimization techniques (although they are also affected by scale problems, they should still converge to the global optimum).
Consider an example: you have men and women, encoded by their sex and height (two features). Let us assume a very simple case with the following data:
0 -> man
1 -> woman
╔═════╦════════╗
║ sex ║ height ║
╠═════╬════════╣
║ 1 ║ 150 ║
╠═════╬════════╣
║ 1 ║ 160 ║
╠═════╬════════╣
║ 1 ║ 170 ║
╠═════╬════════╣
║ 0 ║ 180 ║
╠═════╬════════╣
║ 0 ║ 190 ║
╠═════╬════════╣
║ 0 ║ 200 ║
╚═════╩════════╝
And let us do something silly: train it to predict the sex of the person, so we are trying to learn f(x, y) = x, where x is the sex feature (the height should be ignored entirely).
It is easy to see that for such data the largest margin classifier will "cut" the plane horizontally somewhere around height "175", so once we get a new sample "1 178" (a woman of 178 cm) we get the classification that she is a man.
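A minimal sketch of the unscaled case with scikit-learn (the linear kernel and default C below are assumptions; the exact coefficients depend on them, but the qualitative result does not):

```python
from sklearn.svm import SVC

# Raw, unscaled training data: [sex, height in cm]; the label is the sex itself.
X = [[1, 150], [1, 160], [1, 170], [0, 180], [0, 190], [0, 200]]
y = [1, 1, 1, 0, 0, 0]  # 1 = woman, 0 = man

clf = SVC(kernel="linear").fit(X, y)

# A woman of 178 cm: the boundary sits roughly at height 175 and the height
# feature dominates, so she lands on the "man" side of the hyperplane.
print(clf.predict([[1, 178]]))  # -> [0], i.e. "man" (wrong)
```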
However, if we scale everything down to [0, 1], we get something like
╔═════╦════════╗
║ sex ║ height ║
╠═════╬════════╣
║ 1 ║ 0.0 ║
╠═════╬════════╣
║ 1 ║ 0.2 ║
╠═════╬════════╣
║ 1 ║ 0.4 ║
╠═════╬════════╣
║ 0 ║ 0.6 ║
╠═════╬════════╣
║ 0 ║ 0.8 ║
╠═════╬════════╣
║ 0 ║ 1.0 ║
╚═════╩════════╝
and now the largest margin classifier "cuts" the plane nearly vertically (as expected), so given the new sample "1 178", which also scales to roughly "1 0.56", we get that she is a woman (correct!).
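The same sketch after min-max scaling both features to [0, 1] (scikit-learn's MinMaxScaler is used here purely for illustration; any sensible scaler makes the same point):

```python
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X = [[1, 150], [1, 160], [1, 170], [0, 180], [0, 190], [0, 200]]
y = [1, 1, 1, 0, 0, 0]  # 1 = woman, 0 = man

# Sex already spans 0..1, so it is left alone; height is mapped from 150..200 to 0..1.
scaler = MinMaxScaler().fit(X)
clf = SVC(kernel="linear").fit(scaler.transform(X), y)

# The same woman of 178 cm becomes roughly [1, 0.56] after scaling, and the
# nearly vertical boundary now classifies her by sex rather than by height.
print(clf.predict(scaler.transform([[1, 178]])))  # -> [1], i.e. "woman" (correct)
```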
So, in general, scaling ensures that a feature does not become the main predictor simply because its numeric values happen to be large.
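In practice this is why scaling is usually baked into the model itself; one common pattern (sketched here with scikit-learn's StandardScaler, although MinMaxScaler works just as well) is a pipeline, so that the scaling fitted on the training data is re-applied automatically at prediction time:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X = [[1, 150], [1, 160], [1, 170], [0, 180], [0, 190], [0, 200]]
y = [1, 1, 1, 0, 0, 0]  # 1 = woman, 0 = man

# The scaler is fitted on the training data inside the pipeline and then
# applied consistently to anything passed to predict().
model = make_pipeline(StandardScaler(), SVC(kernel="linear")).fit(X, y)
print(model.predict([[1, 178]]))  # -> [1], i.e. "woman"
```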