
I am working on code that analyzes a periodic, noisy signal. In my example the signal comes from a CSV file containing about 382 periods of the signal in roughly 25000 bins, which is loaded into my class. I split the data into subsets representing one peak each and apply a GaussianCurveFitter to each of these data sets. In specific cases - I guess when the data given to the fitter is poor and very much non-Gaussian - the curve fitter simply goes into a deadlock or infinite loop and never comes back to life. I do not understand why.

Here is the code:

package de.gsi.sdbe.playground.wgeithner.BpmFitTest.main;

import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Scanner;
import org.apache.commons.math3.fitting.GaussianCurveFitter;
import org.apache.commons.math3.fitting.WeightedObservedPoints;
import org.apache.commons.math3.stat.descriptive.DescriptiveStatistics;
import de.gsi.chart.data.spi.DefaultDataSet;

public class TemplateAppModel {

final private WeightedObservedPoints _fitDataSet = new WeightedObservedPoints(); // datasets for curve fitting
private final ArrayList<Double>      _rawDataY   = new ArrayList<>();

public DefaultDataSet loadCsvFile() {

    Scanner fileScanner = null;
    try {
        fileScanner = new Scanner(new File("src/main/resources/YR06DX1HSignal.csv"));
    } catch (final FileNotFoundException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }

    final DefaultDataSet dataSet = new DefaultDataSet("Signal"); // dataset for visualization

    while (fileScanner.hasNextLine()) {
        final String lineText = fileScanner.nextLine();
        final String[] xyItems = lineText.split(",");
        try {
            dataSet.add(Double.valueOf(xyItems[0]), Double.valueOf(xyItems[1]));
            _rawDataY.add(Double.valueOf(xyItems[1]));
        } catch (final NumberFormatException ex) {
            // skip lines that cannot be parsed as numbers (e.g. header or empty lines)
        }

    }

    fileScanner.close();

    return dataSet;
}

public ArrayList<Double> doBoxedPeakFitting(final Double binLengthIn, final Double rfFrequencyIn) {

    final ArrayList<Double> fitSummary = new ArrayList<>();
    final Double binLength = 8e-9; // bin length in seconds (8 ns per bin)
    final Double rfFrequency = 1907440.2; // in Hz

    final Double dataSetTimeLength = _rawDataY.size() * binLength;
    final int peakCount = (int) Math.round(dataSetTimeLength * rfFrequency);

    // move a box over the data set and perform a fit for each boxed peak
    final int peakBoxSize = _rawDataY.size() / peakCount;

    for (int peakBoxCounter = 0; peakBoxCounter < peakCount; peakBoxCounter++) {

        final int maxRawIndex = peakBoxCounter * peakBoxSize + peakBoxSize;

        final List<Double> fitWindow = _rawDataY.subList(peakBoxCounter * peakBoxSize, maxRawIndex);

        final Double minimum = Collections.min(fitWindow);

        System.out.println("Minimum: " + minimum);

        final List<Double> fitWindowNormalized = new ArrayList<>();
        // add an offset to data to get everything into positive
        for (final Double item : fitWindow) {
            fitWindowNormalized.add(item - minimum);
        }

        // perform the actual fit
        final double[] fitResult = doFit(fitWindowNormalized, peakBoxCounter);

        if (fitResult.length == 3) {
            fitSummary.add(fitResult[2]);
        }
    }

    return fitSummary;
}

private double[] doFit(final List<Double> dataSet, final int fitIndex) {

    final WeightedObservedPoints fitData = new WeightedObservedPoints();
    double xValue = 0.0;

    // map the input to a format Apache GaussianCurveFitter expects
    for (final double yValue : dataSet) {
        fitData.add(xValue, yValue);
        xValue++;
    }

    GaussianCurveFitter theFitter = GaussianCurveFitter.create();
    theFitter.withMaxIterations(1);

    try {
        final double[] fitResult = theFitter.fit(fitData.toList());
        theFitter = null;
        return fitResult;
    } catch (final Exception ex) {
        System.out.println(fitIndex);
        ex.printStackTrace();

        for (final Double value : dataSet) {
            System.out.println(value);
        }

    }

    return new double[1];
}

public static double getMinValue(final double[] numbers) {
    double minValue = numbers[0];
    for (int i = 1; i < numbers.length; i++) {
        if (numbers[i] < minValue) {
            minValue = numbers[i];
        }
    }
    return minValue;
}

public static void main(final String[] args) {
    final TemplateAppModel model = new TemplateAppModel();
    model.loadCsvFile();
    final ArrayList<Double> result = model.doBoxedPeakFitting(null, null);

    final DescriptiveStatistics stats = new DescriptiveStatistics();
    result.stream().forEach(item -> {
        stats.addValue(item.doubleValue());
    });

    System.out.println("Mean: " + stats.getMean());

}

}

Is this a bug in the Apache Commons Math library, or am I doing something wrong? If needed I can supply the CSV data file too...

WolfiG
  • The CSV could really help, assuming that it's then possible to simply copy+paste the code, run it in debug mode, and press "pause" at some point to see where it's stuck. (You could also do that, but ... maybe there's more behind that...) – Marco13 Apr 11 '19 at 16:37
  • Hi Marco, do you have a hint where to put the CSV? – WolfiG Apr 11 '19 at 17:49
  • Uploaded the data file to Google Drive: https://drive.google.com/file/d/1xojpk6Jbg8SysRtYywIhCeiCOzSiy8be/view?usp=sharing – WolfiG Apr 12 '19 at 12:00
  • It's not entirely clear at which "level" the question should be answered. First of all: The line `theFitter.withMaxIterations(1);` does not make sense - was this a "debugging/workaround attempt"? The method returns a **new** fitter, so it should likely be `theFitter = theFitter.withMaxIterations(1000);` or so (a corrected call is sketched below this thread). Then, the fitting will throw an exception, because it does not converge in 1000 iterations. Answering the question of *why* it does not converge may not be "programming related" in the strictest sense, but maybe one could figure it out here nevertheless... – Marco13 Apr 12 '19 at 15:09
  • Hi Marco, your comment is right, and your assumption concerning debugging too. I simply forgot to remove that line before posting my question. Concerning the question as such: I am not clear whether the deadlock/infinite loop is a bug in the Apache framework or whether I am doing something wrong, since I didn't find an error in my code, which runs properly for about 60 loops but never comes out of the fitting routine in loop 61 (based on the CSV file linked above). – WolfiG Apr 14 '19 at 15:53
  • Again, in order to make a more profound statement here, one might have to examine the fitter implementation more closely. But I have created an image https://i.stack.imgur.com/KWiFd.png that to some extent supports a wild guess: The left shows the cases where it worked. The right shows the cases that failed. I assume that these curve shapes simply do not have a "valid" representation as a gaussian curve in the sense of the fitter (note that they all start high and then decrease). You might consider a more generic fitter than the gaussian one (i.e. one for a `ParametricUnivariateFunction`; a sketch of that is appended below this thread) – Marco13 Apr 14 '19 at 17:33
  • Thanks Marco13. This supports my assumption. In the meantime I found a fitter (custom, from a colleague) that does the job. Furthermore I adjusted the box window sliding across the data so that the peak/Gaussian is always inside the window (one way to do that is sketched below as well). – WolfiG Apr 15 '19 at 15:06
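
Follow-up sketch on the `withMaxIterations` point: `GaussianCurveFitter.withMaxIterations(...)` returns a new fitter instance, so the returned instance has to be used (reassigned or chained). With a cap in place, a fit that does not converge surfaces as an exception (a `TooManyIterationsException` in Commons Math 3) instead of appearing to hang. This is a minimal sketch against the API used in the question, not the original code; the class name `CappedGaussianFit` is made up for the example.

import java.util.List;
import org.apache.commons.math3.exception.TooManyIterationsException;
import org.apache.commons.math3.fitting.GaussianCurveFitter;
import org.apache.commons.math3.fitting.WeightedObservedPoint;

public final class CappedGaussianFit {

    // Fit a Gaussian with a hard iteration cap; returns an empty array if the fit fails.
    public static double[] fitWithCap(final List<WeightedObservedPoint> points) {
        // withMaxIterations(...) returns a NEW fitter; reassign or chain it.
        final GaussianCurveFitter fitter = GaussianCurveFitter.create().withMaxIterations(1000);
        try {
            return fitter.fit(points); // {norm, mean, sigma} on success
        } catch (final TooManyIterationsException ex) {
            // Did not converge within 1000 iterations: report and let the caller skip this window.
            System.out.println("Fit did not converge: " + ex.getMessage());
            return new double[0];
        }
    }

    private CappedGaussianFit() {
    }
}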
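
On the `ParametricUnivariateFunction` suggestion: a hedged sketch of the more generic route, assuming Commons Math 3.4+ where `SimpleCurveFitter` is available. The Gaussian-plus-baseline model, its parameter order {norm, mean, sigma, offset} and the class names below are my own illustration, not part of the original code; the point is that the model and the initial guess are supplied explicitly instead of relying on `GaussianCurveFitter`'s built-in guessing.

import java.util.List;
import org.apache.commons.math3.analysis.ParametricUnivariateFunction;
import org.apache.commons.math3.fitting.SimpleCurveFitter;
import org.apache.commons.math3.fitting.WeightedObservedPoint;

public final class OffsetGaussianFit {

    // Gaussian with a constant baseline: f(x) = offset + norm * exp(-(x - mean)^2 / (2 * sigma^2)).
    // Parameter order: {norm, mean, sigma, offset}.
    static final class OffsetGaussian implements ParametricUnivariateFunction {

        @Override
        public double value(final double x, final double... p) {
            final double diff = x - p[1];
            return p[3] + p[0] * Math.exp(-diff * diff / (2 * p[2] * p[2]));
        }

        @Override
        public double[] gradient(final double x, final double... p) {
            final double diff = x - p[1];
            final double exp = Math.exp(-diff * diff / (2 * p[2] * p[2]));
            return new double[] {
                exp,                                              // d f / d norm
                p[0] * exp * diff / (p[2] * p[2]),                // d f / d mean
                p[0] * exp * diff * diff / (p[2] * p[2] * p[2]),  // d f / d sigma
                1.0                                               // d f / d offset
            };
        }
    }

    // Fit the offset Gaussian with an explicit, data-derived initial guess
    // {norm, mean, sigma, offset}.
    public static double[] fit(final List<WeightedObservedPoint> points, final double[] initialGuess) {
        final SimpleCurveFitter fitter = SimpleCurveFitter
                .create(new OffsetGaussian(), initialGuess)
                .withMaxIterations(1000);
        return fitter.fit(points); // best-fit {norm, mean, sigma, offset}
    }

    private OffsetGaussianFit() {
    }
}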
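
And a small sketch of the window adjustment mentioned in the last comment, i.e. re-centring each fit box on the strongest sample so the peak never straddles a box boundary. The helper name `centeredWindow` and the exact heuristic are illustrative, not the original implementation.

import java.util.ArrayList;
import java.util.List;

public final class PeakWindows {

    // Return a window of roughly 'boxSize' samples centred on the maximum found in the
    // nominal box starting at 'boxStart', clamped to the bounds of the raw data.
    public static List<Double> centeredWindow(final List<Double> rawDataY,
                                              final int boxStart, final int boxSize) {
        final int boxEnd = Math.min(boxStart + boxSize, rawDataY.size());

        // Locate the maximum inside the nominal box.
        int peakIndex = boxStart;
        for (int i = boxStart + 1; i < boxEnd; i++) {
            if (rawDataY.get(i) > rawDataY.get(peakIndex)) {
                peakIndex = i;
            }
        }

        // Re-centre the window on that maximum.
        final int from = Math.max(0, peakIndex - boxSize / 2);
        final int to = Math.min(rawDataY.size(), from + boxSize);
        return new ArrayList<>(rawDataY.subList(from, to));
    }

    private PeakWindows() {
    }
}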
