
I have a dataset made up of around 50 csv files that each contain 2000-ish lines of 101 float values. The last value, the 101st, is the final value and the first 100 are a set of values that lead up to the final value.

I would like to create a dl4j project that predicts (I hope that is the correct terminology) a value from 100 input values. The idea is that dl4j would learn from all of the datasets and, when given a new set of 100 values, predict the final value.

Can anyone help me understand if this is feasible with dl4j and how I may go about doing it? For example, I'm not sure what type of network to use for this type of dataset and goal.

Can anyone suggest any sample code that does something similar?

MBU

This answer assumes some baseline knowledge of the framework. If you are unsure, please take a look at our quickstart and examples (https://deeplearning4j.konduit.ai/) and at the example repository at https://github.com/deeplearning4j/deeplearning4j-examples. Feel free to ask questions about those as well.

What you're looking for is a configuration similar to:

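    import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
    import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
    import org.deeplearning4j.nn.conf.layers.DenseLayer;
    import org.deeplearning4j.nn.conf.layers.OutputLayer;
    import org.deeplearning4j.nn.weights.WeightInit;
    import org.nd4j.linalg.activations.Activation;
    import org.nd4j.linalg.learning.config.Nesterovs;
    import org.nd4j.linalg.lossfunctions.LossFunctions;

    public static MultiLayerConfiguration buildConfig(long seed, double learningRate,
                                                      int numInputs, int numOutputs) {
        final int numHiddenNodes = 50;
        return new NeuralNetConfiguration.Builder()
                .seed(seed)
                .weightInit(WeightInit.XAVIER)
                .updater(new Nesterovs(learningRate, 0.9))   // SGD with Nesterov momentum 0.9
                .list()
                .layer(0, new DenseLayer.Builder().nIn(numInputs).nOut(numHiddenNodes)
                        .activation(Activation.TANH).build())
                .layer(1, new DenseLayer.Builder().nIn(numHiddenNodes).nOut(numHiddenNodes)
                        .activation(Activation.TANH).build())
                // Linear (identity) output with MSE loss: a common setup for regression
                .layer(2, new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
                        .activation(Activation.IDENTITY)
                        .nIn(numHiddenNodes).nOut(numOutputs).build())
                .build();
    }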

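This gives you a neural network with an MSE (mean squared error) loss function and an identity (linear) output activation. For your data, numInputs would be 100 and numOutputs would be 1. This is a good starting point for regression (no guarantees of accuracy; that is very much problem dependent).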
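To make the loss concrete: with an identity output, the network's prediction is simply its raw output value, and MSE is the mean of the squared differences between predictions and targets. A minimal plain-Java sketch for illustration only; MseExample and mse are hypothetical names, not DL4J API, and DL4J's internal implementation (for example, how it averages over a minibatch) may differ:

```java
/** Hypothetical helper illustrating the mean squared error (MSE) loss. */
final class MseExample {
    private MseExample() {}

    /** Mean of (prediction - target)^2 over all examples. */
    static double mse(double[] predictions, double[] targets) {
        if (predictions.length != targets.length || predictions.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < predictions.length; i++) {
            double diff = predictions[i] - targets[i];
            sum += diff * diff;
        }
        return sum / predictions.length;
    }

    public static void main(String[] args) {
        // Errors are 0 and -2, so MSE = (0 + 4) / 2 = 2.0
        System.out.println(mse(new double[] {1.0, 2.0}, new double[] {1.0, 4.0}));
        // prints 2.0
    }
}
```

Squaring the error penalizes large misses more than small ones, which is why MSE pairs naturally with an unbounded (identity) output for predicting a continuous value.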

Next, set up a DataSetIterator that is configured for regression.

You can prepare your input data using something like:

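    import java.io.File;
    import org.datavec.api.records.reader.RecordReader;
    import org.datavec.api.records.reader.impl.csv.CSVRecordReader;
    import org.datavec.api.split.FileSplit;
    import org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator;

    int numLinesToSkip = 0;          // no header row
    String fileDelimiter = ",";
    RecordReader rr = new CSVRecordReader(numLinesToSkip, fileDelimiter);
    String csvPath = "/path/to/my/file.csv";   // a directory also works: FileSplit lists the files in it
    rr.initialize(new FileSplit(new File(csvPath)));

    int batchSize = 4;
    RecordReaderDataSetIterator iter = new RecordReaderDataSetIterator.Builder(rr, batchSize)
            .regression(3)   // column index (0-based) of the label
            .build();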

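The regression(3) call tells the dataset iterator to treat the value at column index 3 (0-based) as the label and the remaining columns as features. You will need to change this to suit your problem: with 100 input values followed by the target value, the label is at index 100, i.e. regression(100).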
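Conceptually, a single-column regression label means each CSV row is split into the label (the value at the given column) and the features (all remaining columns, in order). The plain-Java sketch below illustrates that split, roughly mirroring what the iterator does for you; RowSplit and split are hypothetical names, not DL4J API:

```java
import java.util.Arrays;

/** Hypothetical illustration of splitting one CSV row into features and a regression label. */
final class RowSplit {
    final double[] features;
    final double label;

    private RowSplit(double[] features, double label) {
        this.features = features;
        this.label = label;
    }

    /** Parses a comma-separated row; the value at labelIndex (0-based) becomes the label. */
    static RowSplit split(String csvLine, int labelIndex) {
        String[] parts = csvLine.split(",");
        if (labelIndex < 0 || labelIndex >= parts.length) {
            throw new IllegalArgumentException("label index out of range: " + labelIndex);
        }
        double[] features = new double[parts.length - 1];
        int f = 0;
        for (int i = 0; i < parts.length; i++) {
            if (i != labelIndex) {
                features[f++] = Double.parseDouble(parts[i].trim());
            }
        }
        return new RowSplit(features, Double.parseDouble(parts[labelIndex].trim()));
    }

    public static void main(String[] args) {
        // Generic example: label at column index 3 of a 5-column row
        RowSplit r = split("0.1,0.2,0.3,9.5,0.4", 3);
        System.out.println(Arrays.toString(r.features) + " -> " + r.label);
        // prints [0.1, 0.2, 0.3, 0.4] -> 9.5
    }
}
```

For a 101-column row from the question, split(line, 100) yields 100 features and the final value as the label.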

Depending on your data, you might need to normalize it. In that case, also apply a normalizer:

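    import org.nd4j.linalg.dataset.api.preprocessor.NormalizerStandardize;

    NormalizerStandardize std = new NormalizerStandardize();
    std.fit(iter);               // collect mean and standard deviation from the data
    iter.setPreProcessor(std);   // normalize every batch the iterator returns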

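This fits a zero-mean, unit-variance normalizer to your data. By default NormalizerStandardize normalizes the features only; labels are left unchanged unless you enable label normalization (see its fitLabel option).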
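For intuition, zero-mean, unit-variance standardization replaces each value x in a column with (x - mean) / std, using statistics computed from the training data. The same statistics must then be reused on any new set of 100 values before asking the network for a prediction. A plain-Java sketch of that fit/transform split; Standardizer, fit and transform are hypothetical names, and this sketch uses the population standard deviation, which may not match DL4J's exact convention:

```java
/** Hypothetical per-column z-score standardizer (fit on training rows, then transform). */
final class Standardizer {
    final double[] mean;
    final double[] std;

    private Standardizer(double[] mean, double[] std) {
        this.mean = mean;
        this.std = std;
    }

    /** Computes per-column mean and population standard deviation. */
    static Standardizer fit(double[][] rows) {
        int cols = rows[0].length;
        double[] mean = new double[cols];
        double[] std = new double[cols];
        for (double[] row : rows) {
            for (int c = 0; c < cols; c++) mean[c] += row[c];
        }
        for (int c = 0; c < cols; c++) mean[c] /= rows.length;
        for (double[] row : rows) {
            for (int c = 0; c < cols; c++) {
                double d = row[c] - mean[c];
                std[c] += d * d;
            }
        }
        for (int c = 0; c < cols; c++) {
            std[c] = Math.sqrt(std[c] / rows.length);
            if (std[c] == 0.0) std[c] = 1.0;  // constant column: avoid division by zero
        }
        return new Standardizer(mean, std);
    }

    /** Applies (x - mean) / std column by column, e.g. to a new input row. */
    double[] transform(double[] row) {
        double[] out = new double[row.length];
        for (int c = 0; c < row.length; c++) out[c] = (row[c] - mean[c]) / std[c];
        return out;
    }

    public static void main(String[] args) {
        double[][] train = {{1.0, 10.0}, {2.0, 20.0}, {3.0, 30.0}};
        Standardizer s = fit(train);
        // Column means are 2.0 and 20.0, so the middle row maps to zeros
        System.out.println(java.util.Arrays.toString(s.transform(train[1])));
        // prints [0.0, 0.0]
    }
}
```

Keeping fit and transform separate means the statistics come only from the training data, and a new input is scaled exactly the way the network saw its training inputs.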

Adam Gibson