I'm trying to understand how to do segmentation on a set of time series (daily stock prices, temperatures, etc.) and came across a book that explains the SWAB (sliding window and bottom-up) segmentation algorithm, but I don't quite understand it. The segmentation is part of a sonification algorithm. The following text is from "Multimedia Data Mining and Analytics: Disruptive Innovation":
The SWAB segmentation algorithm takes four parameters: the input file (time series data), the output file (segmented data), the maximal error, and an indication of nominal attributes. After running a number of experiments on time series of different sizes with different values for the number of segments, we chose the default number of segments as follows: 25–50% of the time series size for series with fewer than 100 observations, 20–35% for series with 100–200 observations, and 15–25% for series with more than 200 observations. If the user does not want to use the default value for any reason, he can pass his own number of segments as a parameter to the algorithm.

Starting with the default values for the minimum and maximum error, we run the segmentation algorithm for the first time and get the minimum number of segments for the given time series (the higher the maximum error, the fewer segments are found). We then decrease the maximum error (and so increase the number of segments found), trying to narrow the upper and lower bounds on the error by dividing the base by powers of 2, as in binary search. Each time, after running the segmentation algorithm with the current maximal error, we test whether this value gives a better approximation of the optimal number of segments, and is therefore a better upper or lower bound on the optimal maximum error; if so, we move the appropriate bound to this value. In the beginning, only the upper bound is affected. However, once we have found a lower bound that produces more segments than the optimum, we continue looking for the optimal number of segments in smaller steps: the next maximum error is the mean of the current upper and lower bounds.

In our experience with many different time series databases, the optimal maximal error is usually found within 3–4 iterations, though the convergence rate depends on the input time series database itself. If the algorithm has not converged within 20 iterations, we stop searching and proceed to the next sonification steps using the segments found at the 20th iteration.
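Here is my current reading of that procedure as a rough Python sketch. To be clear, everything below is my own guess: the function names are made up, I've substituted a plain bottom-up pass for the full SWAB (sliding window + bottom-up) inner loop, and the halving/bisection schedule is just how I interpret the paragraph above.

```python
import numpy as np

def segment_cost(values, start, end):
    """Sum of squared residuals of a least-squares line over values[start:end]."""
    x = np.arange(start, end)
    y = values[start:end]
    if len(y) < 3:
        return 0.0  # a line fits 1-2 points exactly
    coeffs = np.polyfit(x, y, 1)
    return float(np.sum((y - np.polyval(coeffs, x)) ** 2))

def bottom_up_segment(values, max_error):
    """Plain bottom-up segmentation (my stand-in for SWAB): start from
    2-point segments and repeatedly merge the cheapest adjacent pair
    while the merged cost stays within max_error.
    Returns boundaries b; segment i is values[b[i]:b[i+1]]."""
    b = list(range(0, len(values), 2)) + [len(values)]
    while len(b) > 2:
        costs = [segment_cost(values, b[i], b[i + 2]) for i in range(len(b) - 2)]
        best = int(np.argmin(costs))
        if costs[best] > max_error:
            break
        del b[best + 1]
    return b

def search_max_error(values, target_segments, base_error, max_iters=20):
    """Tune max_error so the segment count approaches target_segments.
    Assumes base_error is large enough that the first run gives *fewer*
    segments than the target, which is how I read the book: halve the
    error (powers of 2) until some run yields *more* segments than the
    target, then bisect between the two bounds; give up after max_iters
    iterations and keep the last segmentation found."""
    lo, hi = None, base_error          # lo -> too many segments, hi -> too few
    boundaries = bottom_up_segment(values, hi)
    for i in range(1, max_iters + 1):
        err = base_error / 2 ** i if lo is None else (lo + hi) / 2
        boundaries = bottom_up_segment(values, err)
        n = len(boundaries) - 1
        if n == target_segments:
            break
        if n < target_segments:
            hi = err                   # still too few segments: error too large
        else:
            lo = err                   # too many segments: error too small
    return boundaries
```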
So, for example, if I have a time series with 150 observations, which falls in the 20–35% band, the default target would be somewhere between 0.20 × 150 = 30 and 0.35 × 150 = 52 segments. What are the exact steps I need to take to segment the data?
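Putting numbers to that, my best guess (using the hypothetical helpers from my sketch above, with synthetic data since I can't share mine) would be:

```python
import numpy as np

np.random.seed(0)
series = np.cumsum(np.random.randn(150))     # stand-in for 150 daily prices

target = int(0.25 * len(series))             # 37, inside the 20-35% band
base = segment_cost(series, 0, len(series))  # error of one line over everything
boundaries = search_max_error(series, target, base_error=base)
print(len(boundaries) - 1, "segments:", boundaries)
```

Is that roughly the intended procedure, or am I misreading the bound-narrowing step?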
Any help at all is appreciated, thanks.