I solved it by taking the output of the layer that comes just before each multi-head attention layer and passing that output through the attention layer again, this time with return_attention_scores = True:
from tensorflow.keras import Model

# One sub-model per attention layer ('encoded_0' ... 'encoded_5'); each returns the
# output of the layer placed just before that attention layer in model.layers.
# getLayerIndexByName is my own helper that returns a layer's index in model.layers.
atten_maps_hooks = [
    Model(inputs = model.input,
          outputs = model.layers[getLayerIndexByName(model, 'encoded_' + str(i)) - 1].output)
    for i in range(6)
]
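# input, shape and enc_atten_maps_hwhw (a list) are defined earlier in my code.
for i in range(len(atten_maps_hooks)):
    # Activations that feed the i-th attention layer
    temp = atten_maps_hooks[i].predict(input)
    # Self-attention: the same tensor is used as query and value
    mha, scores = model.get_layer('encoded_' + str(i))(temp, temp, return_attention_scores = True)
    # [0] keeps the first sample of the batch
    enc_atten_maps_hwhw.append(scores.numpy()[0].reshape(shape + shape))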
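
The snippet above relies on getLayerIndexByName, which is not shown. A minimal sketch of what such a helper might look like is below; the body is an assumption, based only on the fact that a Keras model exposes a `layers` list whose items have a `name` attribute. The `toy_model` object is a hypothetical stand-in used so the example runs without TensorFlow.

```python
from types import SimpleNamespace


def getLayerIndexByName(model, layer_name):
    """Return the position of the layer called `layer_name` in `model.layers`."""
    for index, layer in enumerate(model.layers):
        if layer.name == layer_name:
            return index
    raise ValueError(f"No layer named {layer_name!r} in the model")


# Hypothetical stand-in with the same `.layers` / `.name` attributes as a Keras model
toy_model = SimpleNamespace(
    layers=[SimpleNamespace(name=n) for n in ("input", "norm_0", "encoded_0")]
)

# Layer that sits just before 'encoded_0' in the layer list
prev_layer = toy_model.layers[getLayerIndexByName(toy_model, "encoded_0") - 1]
print(prev_layer.name)
```

Note that `index - 1` picks the previous entry in `model.layers`, which matches the real input of the attention layer only when the layers were added in that order. For a model with branches, it may be safer to check that the attention layer's input is really that layer's output.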