In Encog 3.x, how do you normalize data, use it for training, and denormalize results?
I can't find good documentation on this, and a simple example covering all three steps would go a long way toward reducing Encog's learning curve. I haven't figured it all out yet, but here is what I have found so far.
(1) *How does Encog 3.0 Normalize?*
This code works for writing a new normalized CSV file. It is not clear, though, how to turn the output of AnalystNormalizeCSV into an MLDataSet that can actually be used for training.
    // let the wizard analyze the source CSV (true = file has headers)
    EncogAnalyst analyst = new EncogAnalyst();
    AnalystWizard wizard = new AnalystWizard(analyst);
    wizard.wizard(sourceFile, true, AnalystFileFormat.DECPNT_COMMA);
    // normalize using the analyst's statistics and write a new CSV
    final AnalystNormalizeCSV norm = new AnalystNormalizeCSV();
    norm.analyze(sourceFile, true, CSVFormat.ENGLISH, analyst);
    norm.setOutputFormat(CSVFormat.ENGLISH);
    norm.setProduceOutputHeaders(true);
    norm.normalize(targetFile);
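One possible way to bridge that gap, as a rough sketch and not necessarily how Encog intends it: read the normalized CSV back and split each row into input and ideal columns. The class name `NormalizedCsvSplitter`, the header names, and the sample values are made up for illustration, and the sketch assumes the ideal columns come last in each row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: split rows of an already-normalized CSV into input and ideal arrays. */
final class NormalizedCsvSplitter {

    /** Holds the two arrays a supervised training set needs. */
    static final class Split {
        final double[][] input;
        final double[][] ideal;

        Split(double[][] input, double[][] ideal) {
            this.input = input;
            this.ideal = ideal;
        }
    }

    /**
     * @param lines      CSV lines, comma-separated with decimal points
     * @param hasHeader  true if the first line holds column names and is skipped
     * @param inputCount number of leading columns treated as inputs; the rest are ideals
     */
    static Split split(List<String> lines, boolean hasHeader, int inputCount) {
        List<double[]> inputs = new ArrayList<>();
        List<double[]> ideals = new ArrayList<>();
        for (int i = hasHeader ? 1 : 0; i < lines.size(); i++) {
            String line = lines.get(i).trim();
            if (line.isEmpty()) {
                continue; // ignore blank lines
            }
            String[] cells = line.split(",");
            if (cells.length < inputCount) {
                throw new IllegalArgumentException("Too few columns on line " + (i + 1));
            }
            double[] in = new double[inputCount];
            double[] out = new double[cells.length - inputCount];
            for (int c = 0; c < cells.length; c++) {
                double v = Double.parseDouble(cells[c].trim());
                if (c < inputCount) {
                    in[c] = v;
                } else {
                    out[c - inputCount] = v;
                }
            }
            inputs.add(in);
            ideals.add(out);
        }
        return new Split(inputs.toArray(new double[0][]), ideals.toArray(new double[0][]));
    }

    public static void main(String[] args) {
        // Illustrative rows only, not real output from AnalystNormalizeCSV.
        List<String> lines = Arrays.asList(
            "sepal_l,sepal_w,petal_l,petal_w,setosa,versicolor,virginica",
            "0.22,0.63,0.07,0.04,1,0,0",
            "0.75,0.50,0.63,0.54,0,1,0");
        Split s = split(lines, true, 4);
        System.out.println(s.input.length + " rows, "
            + s.input[0].length + " inputs, " + s.ideal[0].length + " ideals");
    }
}
```

As far as I can tell, `BasicMLDataSet` has a constructor that takes two `double[][]` arrays (input and ideal), so arrays in this shape should be usable for training from there.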
(2) *How do I normalize a CSV file with Encog (Java)*
This code is, again, fine for producing normalized CSV output, but it is unclear how to take the normalized data and actually apply it. There is a way to set the target to an MLData, but it assumes all columns are inputs and leaves no room for ideals. Furthermore, both of these options are difficult to use when the file has headers or unused columns.
    try {
        File rawFile = new File(MYDIR, "iris.csv");
        // download Iris data from UCI
        if (rawFile.exists()) {
            System.out.println("Data already downloaded to: " + rawFile.getPath());
        } else {
            System.out.println("Downloading iris data to: " + rawFile.getPath());
            BotUtil.downloadPage(new URL("http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"), rawFile);
        }
        // define the format of the iris data
        DataNormalization norm = new DataNormalization();
        InputField inputSepalLength, inputSepalWidth, inputPetalLength, inputPetalWidth;
        InputFieldCSVText inputClass;
        norm.addInputField(inputSepalLength = new InputFieldCSV(true, rawFile, 0));
        norm.addInputField(inputSepalWidth = new InputFieldCSV(true, rawFile, 1));
        norm.addInputField(inputPetalLength = new InputFieldCSV(true, rawFile, 2));
        norm.addInputField(inputPetalWidth = new InputFieldCSV(true, rawFile, 3));
        norm.addInputField(inputClass = new InputFieldCSVText(true, rawFile, 4));
        inputClass.addMapping("Iris-setosa");
        inputClass.addMapping("Iris-versicolor");
        inputClass.addMapping("Iris-virginica");
        // define how we should normalize
        norm.addOutputField(new OutputFieldRangeMapped(inputSepalLength, 0, 1));
        norm.addOutputField(new OutputFieldRangeMapped(inputSepalWidth, 0, 1));
        norm.addOutputField(new OutputFieldRangeMapped(inputPetalLength, 0, 1));
        norm.addOutputField(new OutputFieldRangeMapped(inputPetalWidth, 0, 1));
        norm.addOutputField(new OutputOneOf(inputClass, 1, 0));
        // define where the output should go
        File outputFile = new File(MYDIR, "iris_normalized.csv");
        norm.setCSVFormat(CSVFormat.ENGLISH);
        norm.setTarget(new NormalizationStorageCSV(CSVFormat.ENGLISH, outputFile));
        // process
        norm.setReport(new ConsoleStatusReportable());
        norm.process();
        // report the normalized file, not the raw input
        System.out.println("Output written to: " + outputFile.getPath());
    } catch (Exception ex) {
        ex.printStackTrace();
    }
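For reference, here is my understanding of what the two kinds of output field in that snippet compute, written as plain Java with no Encog dependency. This is a sketch under assumptions: I am taking the `0, 1` arguments of the range-mapped fields as the target low and high, and the `1, 0` arguments of the one-of field as the "true" and "false" values. `NormalizeSketch` and the column range used in `main` are hypothetical.

```java
import java.util.Arrays;

/** Sketch of the two mappings: linear range mapping and one-of-n encoding. */
final class NormalizeSketch {

    /** Linearly maps value from [dataLow, dataHigh] onto [normLow, normHigh]. */
    static double rangeMap(double value, double dataLow, double dataHigh,
                           double normLow, double normHigh) {
        return (value - dataLow) / (dataHigh - dataLow) * (normHigh - normLow) + normLow;
    }

    /** Encodes a class index as a vector with `high` at that index and `low` elsewhere. */
    static double[] oneOfN(int classIndex, int classCount, double high, double low) {
        double[] out = new double[classCount];
        Arrays.fill(out, low);
        out[classIndex] = high;
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical column whose observed minimum is 4.0 and maximum is 8.0.
        System.out.println(rangeMap(5.0, 4.0, 8.0, 0.0, 1.0)); // prints 0.25
        // Second of three classes, encoded with high = 1 and low = 0.
        System.out.println(Arrays.toString(oneOfN(1, 3, 1.0, 0.0))); // prints [0.0, 1.0, 0.0]
    }
}
```

The point is that the range mapping only works if the minimum and maximum of each column are known, and those same two numbers are what would be needed later to reverse it.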
(3) *Denormalizing*
I'm at a total loss as to how to take all of this and denormalize the results using each column's own minimum and maximum.
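To make concrete what I am after, this is the arithmetic I would expect denormalization to perform, again as plain Java and not as Encog's actual API. It assumes the data range of each column was saved at normalization time and that class outputs were one-of-n encoded; `DenormalizeSketch` and the numbers in `main` are hypothetical.

```java
/** Sketch: undo a linear range mapping and decode a one-of-n output vector. */
final class DenormalizeSketch {

    /** Inverse of a linear map from [dataLow, dataHigh] onto [normLow, normHigh]. */
    static double denormalize(double normalized, double dataLow, double dataHigh,
                              double normLow, double normHigh) {
        return (normalized - normLow) / (normHigh - normLow) * (dataHigh - dataLow) + dataLow;
    }

    /** Returns the class whose output is largest (the first one wins on ties). */
    static String decodeOneOfN(double[] outputs, String[] classes) {
        int best = 0;
        for (int i = 1; i < outputs.length; i++) {
            if (outputs[i] > outputs[best]) {
                best = i;
            }
        }
        return classes[best];
    }

    public static void main(String[] args) {
        // Hypothetical column range 4.0..8.0 that was normalized to 0..1.
        System.out.println(denormalize(0.25, 4.0, 8.0, 0.0, 1.0)); // prints 5.0
        String[] classes = {"Iris-setosa", "Iris-versicolor", "Iris-virginica"};
        // Hypothetical network output for one row.
        System.out.println(decodeOneOfN(new double[] {0.1, 0.7, 0.2}, classes)); // prints Iris-versicolor
    }
}
```

What I can't work out is where Encog keeps those per-column ranges and which call applies them in reverse.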