
I am trying to initialise two GradientCollectors for a card game AI. I am currently experiencing the following error:

java.lang.IllegalStateException: Autograd Recording is already set to True. Please create autograd using try with resource 
    at ai.djl.mxnet.engine.MxGradientCollector.<init>(MxGradientCollector.java:31)
    at ai.djl.mxnet.engine.MxEngine.newGradientCollector(MxEngine.java:144)
    at ai.djl.training.Trainer.newGradientCollector(Trainer.java:145)

The code that is causing this error (along with extra code for reference) is:

NDManager manager = NDManager.newBaseManager(Device.cpu());

NDArray inputArray = manager.create(new float[52]);

int numEpochs = Integer.getInteger("MAX_EPOCH", 10);

double[] epochCount = new double[numEpochs/5];

for(int i = 0; i < epochCount.length; i++) {
    epochCount[i] = (i + 1);
}

// Learning rate trackers (fixed learning rate)
Tracker selectionLrt = Tracker.fixed(0.5f);
Tracker destinationLrt = Tracker.fixed(0.5f);

// Stochastic gradient descent
Optimizer selectionSgd = Optimizer.sgd().setLearningRateTracker(selectionLrt).build();
Optimizer destinationSgd = Optimizer.sgd().setLearningRateTracker(destinationLrt).build();

// Loss function
Loss selectionLoss = Loss.softmaxCrossEntropyLoss();
Loss destinationLoss = Loss.softmaxCrossEntropyLoss();

DefaultTrainingConfig selectionConfig = new DefaultTrainingConfig(selectionLoss)
    .optOptimizer(selectionSgd) // Optimizer
    .optDevices(Engine.getInstance().getDevices(0)) // single CPU
    .addEvaluator(new Accuracy()) // Model Accuracy
    .addTrainingListeners(TrainingListener.Defaults.logging()); // Logging

DefaultTrainingConfig destinationConfig = new DefaultTrainingConfig(destinationLoss)
    .optOptimizer(destinationSgd) // Optimizer
    .optDevices(Engine.getInstance().getDevices(0)) // single CPU
    .addEvaluator(new Accuracy()) // Model Accuracy
    .addTrainingListeners(TrainingListener.Defaults.logging()); // Logging

try (Model selectionANN = Engine.getInstance().newModel("selectionANN", Device.cpu());
    Model destinationANN = Engine.getInstance().newModel("destinationANN", Device.cpu())) {

    selectionANN.setBlock(getBlock(true));
    destinationANN.setBlock(getBlock(false));


    try (Trainer selectionTrainer = selectionANN.newTrainer(selectionConfig);
        Trainer destinationTrainer = destinationANN.newTrainer(destinationConfig);
        GradientCollector selectionCollector = selectionTrainer.newGradientCollector();
        GradientCollector destinationCollector = destinationTrainer.newGradientCollector()) {

        // ... training code (omitted) ...
    }
}

1 Answer


This is because the DJL gradient collector is currently global, so multiple gradient collectors cannot coexist. This is documented here: https://github.com/deepjavalibrary/djl/pull/2111.

Is it possible to use a single global gradient collector in your case? Usually, a gradient collector is only used to invoke backward(), so sharing one global collector may be enough. If not, you can open an enhancement issue in the DJL GitHub repo.
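As an illustration, here is a minimal sketch of sharing one collector across both models, assuming the trainers from your snippet are already initialised and that selectionLabels and destinationLabels are label NDArrays you supply (they are not in your original code):

try (GradientCollector collector = selectionTrainer.newGradientCollector()) {
    // Forward pass and loss for the first model
    NDList selectionPred = selectionTrainer.forward(new NDList(inputArray));
    NDArray selectionLossValue =
            selectionTrainer.getLoss().evaluate(new NDList(selectionLabels), selectionPred);
    collector.backward(selectionLossValue);

    // The same (global) collector also records the second model's graph
    NDList destinationPred = destinationTrainer.forward(new NDList(inputArray));
    NDArray destinationLossValue =
            destinationTrainer.getLoss().evaluate(new NDList(destinationLabels), destinationPred);
    collector.backward(destinationLossValue);
}
// Apply the accumulated gradients after the collector is closed
selectionTrainer.step();
destinationTrainer.step();

Because the recording state is global in the MXNet engine, one try-with-resources block can cover backward() calls for both models, and only one collector ever exists at a time.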

  • Are you trying to train them in sequence or in parallel? It is more of a limitation on the engine side that the training modes are global. The engines are also largely designed to train a single model at a time with all resources, so sequence is the recommended approach (and also the approach that won't get this error) – Kx13739240386 Dec 20 '22 at 20:09
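For reference, a sketch of that sequential pattern, where the second collector is only created after the first one has been closed (again assuming placeholder selectionLabels/destinationLabels arrays):

// First model: open a collector, run backward, close it, then step
try (GradientCollector gc = selectionTrainer.newGradientCollector()) {
    NDList pred = selectionTrainer.forward(new NDList(inputArray));
    gc.backward(selectionTrainer.getLoss().evaluate(new NDList(selectionLabels), pred));
} // autograd recording is switched off here
selectionTrainer.step();

// Second model: same pattern, started only after the first collector is closed
try (GradientCollector gc = destinationTrainer.newGradientCollector()) {
    NDList pred = destinationTrainer.forward(new NDList(inputArray));
    gc.backward(destinationTrainer.getLoss().evaluate(new NDList(destinationLabels), pred));
}
destinationTrainer.step();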