I am trying to reproduce the results of the “Continual Learning Through Synaptic Intelligence” paper [1]. I implemented the algorithm as best as I could understand it after going through the paper many times. I also looked at the official implementation on GitHub, which is in TensorFlow 1.0, but could not understand much of it since I am not very familiar with TensorFlow. I do get some results, but they are not as good as those in the paper. I wanted to ask if anyone can help me find out where I am going wrong. Before going into coding details, I want to discuss the pseudocode, so that I can understand what is wrong with my implementation.
Here is, roughly, the pseudocode of what I have implemented. Please help me.
lambda_reg = 1  ## "lambda" in the paper; renamed because lambda is a reserved word in Python
xi = 1e-3
total_tasks = 5
model = NN(total_tasks)
## multiheaded linear model: [784 (input) --> 256 --> 256 --> 2 (output)], with 5 separate heads
## the output layer is a 2-neuron head (one separate head per task, 5 tasks in total)
## the output is a vector of size 2 (for 2 classes); a sketch of the model is attached after the pseudocode
prev_theta = model.theta(copy=True)  # updated at the end of every task
## model.theta() returns the list of shared parameters (i.e. layer1 and layer2, excluding the output heads)
## copy=True returns a detached copy of the parameters,
## so it doesn't affect the original params connected to the computational graph
omega_total = zeros_like(prev_theta)  ## capital Omega in the paper (per-parameter regularization strength)
omega = zeros_like(prev_theta)  ## small omega in the paper (per-parameter contribution to the loss)
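
## how I read the update rules in the paper:
##   per step:  omega_k <- omega_k - g_k * (change in theta_k)   (running path integral over the task)
##   per task:  Omega_k <- Omega_k + omega_k / (Delta_k**2 + xi), where Delta_k is the total change of theta_k over the task
##   surrogate loss:  lambda * sum_k Omega_k * (theta_k - theta_k_at_end_of_previous_task)**2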
for task_num in range(total_tasks):
    optimizer = ADAM()  # created before every task (or reset)
    prev_theta_step = model.theta(copy=True)  # updated at the end of every step
    ## training for the task starts
    for epoch in range(10):
        for step in range(steps_per_epoch):
            X, Y = train_dataset[task_num].sample()
            ## X is a flattened image of size 784
            ## Y is a one-hot binary vector of size 2 ([0,1] or [1,0])
            Y_pred = model(X, task_num)  # the model is multiheaded; task_num selects the head
            loss = CROSS_ENTROPY(Y_pred, Y)
            if task_num > 0:  ## the regularization (surrogate) loss starts from the second task
                theta = model.theta()
                ## copy is not True here, so it returns the params still connected to the computational graph
                reg_loss = torch.sum(omega_total * torch.square(theta - prev_theta))
                loss = loss + lambda_reg * reg_loss
            optimizer.zero_grad()
            loss.backward()
            theta = model.theta(copy=True)
            grads = model.theta_grads()  ## grads of the shared parameters only
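            ## accumulate small omega as the running path integral:
            ## (negative) gradient times the parameter displacement since the previous recorded step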
            omega = omega - grads * (theta - prev_theta_step)
            prev_theta_step = theta
            optimizer.step()
    ## training for the task is complete; update the importance parameters
    theta = model.theta(copy=True)
    omega_total += relu(omega / ((theta - prev_theta) ** 2 + xi))
    prev_theta = theta
    omega = zeros_like(theta)
    ## evaluation code
    ...
    ## evaluation done
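
In case it helps, this is roughly how my model and the theta() helpers look in PyTorch (a simplified sketch, not my exact code; the layer names are just for illustration):

import torch
import torch.nn as nn

class NN(nn.Module):
    def __init__(self, total_tasks):
        super().__init__()
        self.layer1 = nn.Linear(784, 256)
        self.layer2 = nn.Linear(256, 256)
        # one 2-neuron output head per task
        self.heads = nn.ModuleList([nn.Linear(256, 2) for _ in range(total_tasks)])

    def forward(self, x, task_num):
        h = torch.relu(self.layer1(x))
        h = torch.relu(self.layer2(h))
        return self.heads[task_num](h)  # task_num selects the output head

    def theta(self, copy=False):
        # shared parameters only (the two hidden layers, heads excluded)
        params = list(self.layer1.parameters()) + list(self.layer2.parameters())
        if copy:
            # detached clones, not connected to the computational graph
            return [p.detach().clone() for p in params]
        return params

    def theta_grads(self):
        # gradients of the shared parameters (assumes backward() has been called)
        return [p.grad.detach().clone() for p in self.theta()]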
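
And this is approximately how one training step and the end-of-task update look when the parameters are handled as lists of tensors (again a simplified sketch that follows the pseudocode above, not my exact code; converting the one-hot Y to class indices for cross_entropy is just how I feed the targets):

import torch
import torch.nn.functional as F

def train_step(model, optimizer, X, Y, task_num,
               omega, omega_total, prev_theta, prev_theta_step, lambda_reg=1.0):
    # one optimization step with the SI bookkeeping from the pseudocode;
    # omega, omega_total, prev_theta, prev_theta_step are lists of tensors
    # with the same shapes as model.theta()
    Y_pred = model(X, task_num)
    loss = F.cross_entropy(Y_pred, Y.argmax(dim=1))  # Y is one-hot, so convert to class indices
    if task_num > 0:
        # quadratic surrogate loss on the shared parameters
        for p, p_old, Om in zip(model.theta(), prev_theta, omega_total):
            loss = loss + lambda_reg * torch.sum(Om * (p - p_old) ** 2)
    optimizer.zero_grad()
    loss.backward()
    theta = model.theta(copy=True)
    grads = model.theta_grads()
    for k in range(len(omega)):
        # running path integral: -grad * displacement since the last recorded step
        omega[k] -= grads[k] * (theta[k] - prev_theta_step[k])
        prev_theta_step[k] = theta[k]
    optimizer.step()

def end_of_task_update(model, omega, omega_total, prev_theta, xi=1e-3):
    # importance update at the end of each task
    theta = model.theta(copy=True)
    for k in range(len(omega)):
        omega_total[k] += torch.relu(omega[k] / ((theta[k] - prev_theta[k]) ** 2 + xi))
        prev_theta[k] = theta[k]
        omega[k] = torch.zeros_like(omega[k])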
I am also attaching the results I got. In the results, ‘one’ (blue) is the run without the regularization loss (lambda = 0) and ‘two’ (green) is the run with the regularization loss (lambda = 1).
Thank you for reading this far. Kindly help me out.