
I have followed the tutorial available at: https://www.tensorflow.org/quantum/tutorials/mnist. I have modified this tutorial to the simplest example I could think of: an input set in which x increases linearly from 0 to 1 and y = x < 0.3. I then use a PQC consisting of a single Rx gate parameterized by one symbol, and a readout using a Z gate.

When retrieving the optimized symbol and adjusting it manually, I can easily find a value that provides 100% accuracy, but when I let the Adam optimizer run, it converges to either always predict 1 or always predict -1. Does anybody spot what I am doing wrong? (And I apologize for not being able to break the code down into a smaller example.)

import tensorflow as tf
import tensorflow_quantum as tfq

import cirq
import sympy
import numpy as np

# used to embed classical data in quantum circuits
def convert_to_circuit_cont(image):
    """Encode truncated classical image into quantum datapoint."""
    values = image.flatten()
    qubits = cirq.GridQubit.rect(4, 1)
    circuit = cirq.Circuit()
    for i, value in enumerate(values):
        if value:
            circuit.append(cirq.rx(value).on(qubits[i]))
    return circuit

# define classical dataset
length = 1000
np.random.seed(42)

# create a linearly increasing set for x from 0 to 1 in 1/length steps
x_train_sorted = np.asarray([[x/length] for x in range(0,length)], dtype=np.float32)
# p is used to shuffle x and y similarly
p = np.random.permutation(len(x_train_sorted))
x_train = x_train_sorted[p]
# y = x < 0.3 in {-1, 1} for Hinge loss
y_train_sorted = np.asarray([1 if (x/length)<0.30 else -1 for x in range(0,length)])
y_train = y_train_sorted[p]
# test == train for this example
x_test = x_train_sorted[:]
y_test = y_train_sorted[:]

# convert classical data into quantum circuits
x_train_circ = [convert_to_circuit_cont(x) for x in x_train]
x_test_circ = [convert_to_circuit_cont(x) for x in x_test]
x_train_tfcirc = tfq.convert_to_tensor(x_train_circ)
x_test_tfcirc = tfq.convert_to_tensor(x_test_circ)

# define the PQC circuit, consisting of 1 qubit with 1 gate (Rx) and 1 parameter
def create_quantum_model():
    data_qubits = cirq.GridQubit.rect(1, 1)  
    circuit = cirq.Circuit()
    a = sympy.Symbol("a")
    circuit.append(cirq.rx(a).on(data_qubits[0]))
    return circuit, cirq.Z(data_qubits[0])
model_circuit, model_readout = create_quantum_model()

# Build the Keras model.
model = tf.keras.Sequential([
    # The input is the data-circuit, encoded as a tf.string
    tf.keras.layers.Input(shape=(), dtype=tf.string),
    # The PQC layer returns the expected value of the readout gate, range [-1,1].
    tfq.layers.PQC(model_circuit, model_readout),
])

# used for logging progress during optimization
def hinge_accuracy(y_true, y_pred):
    y_true = tf.squeeze(y_true) > 0.0
    y_pred = tf.squeeze(y_pred) > 0.0
    result = tf.cast(y_true == y_pred, tf.float32)
    return tf.reduce_mean(result)

# compile the model with Hinge loss and Adam, as done in the example. Have tried with various learning_rates
model.compile(
    loss = tf.keras.losses.Hinge(),
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.1),
    metrics=[hinge_accuracy])

EPOCHS = 20
BATCH_SIZE = 32
NUM_EXAMPLES = 1000

# fit the model
qnn_history = model.fit(
      x_train_tfcirc, y_train, 
      batch_size=BATCH_SIZE,
      epochs=EPOCHS,
      verbose=1,
      validation_data=(x_test_tfcirc, y_test),
      use_multiprocessing=False)

results = model.predict(x_test_tfcirc)
results_mapped = [-1 if x<=0 else 1 for x in results[:,0]]
print(np.sum(np.equal(results_mapped, y_test)))

After 20 epochs of optimization, I get the following:

1000/1000 [==============================] - 0s 410us/sample - loss: 0.5589 - hinge_accuracy: 0.6982 - val_loss: 0.5530 - val_hinge_accuracy: 0.7070

This results in 700 out of 1000 samples predicted correctly. Looking at the mapped results, this is because all samples are predicted as -1. Looking at the raw results, they decrease from -0.5484014 to -0.99996257.
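That range is consistent with the model itself: since the full circuit is just Rx(x) followed by Rx(a) on |0⟩, the readout expectation is ⟨Z⟩ = cos(x + a). A quick sanity check with the approximate converged weight (about 2.16, see Update 1 below):

import numpy as np

a = 2.16  # approximate converged weight, taken from the log in Update 1
x = np.linspace(0.0, 1.0, 5)
# <Z> of Rx(x) followed by Rx(a) on |0> is cos(x + a)
print(np.cos(x + a))  # runs from roughly -0.56 down to about -1.0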

When retrieving the weight with w = model.layers[0].get_weights(), subtracting 0.8, and setting it again with model.layers[0].set_weights(w), I get 920/1000 correct. Fine-tuning this process allows me to achieve 1000/1000.
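In code, that manual adjustment looks roughly like this (a sketch; the offset of 0.8 is the value mentioned above):

w = model.layers[0].get_weights()
w[0] = w[0] - 0.8  # shift the single Rx parameter
model.layers[0].set_weights(w)
results = model.predict(x_test_tfcirc)
results_mapped = [-1 if x <= 0 else 1 for x in results[:, 0]]
print(np.sum(np.equal(results_mapped, y_test)))  # 920/1000 before fine-tuning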

Update 1: I have also printed the value of the weight over the epochs:

4.916246, 4.242602, 3.3765688, 2.6855211, 2.3405066, 2.206207, 2.1734586, 2.1656137, 2.1510274, 2.1634471, 2.1683235, 2.188944, 2.1510284, 2.1591303, 2.1632445, 2.1542525, 2.1677444, 2.1702878, 2.163104, 2.1635907
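For reference, one way to capture this trajectory is with a standard Keras LambdaCallback (a sketch; the list name weight_log is arbitrary):

weight_log = []
log_weights = tf.keras.callbacks.LambdaCallback(
    on_epoch_end=lambda epoch, logs: weight_log.append(
        float(model.layers[0].get_weights()[0][0])))
# then pass callbacks=[log_weights] to model.fit(...) to populate weight_log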

I set the weight to 1.36, a value that gives 908/1000 (as opposed to 700/1000). The optimizer moves away from it:

1.7992111, 2.0727847, 2.1370323, 2.15711, 2.1686404, 2.1603785, 2.183334, 2.1563332, 2.156857, 2.169908, 2.1658351, 2.170673, 2.1575692, 2.1505954, 2.1561477, 2.1754034, 2.1545155, 2.1635509, 2.1464484, 2.1707492

One thing that I noticed is that the hinge accuracy was 0.75 with the weight at 1.36, which is higher than the 0.7 at 2.17. If that is the case, either I am in an unlucky part of the optimization landscape where the minimum of the loss does not correspond to the maximum of the accuracy, or the loss value is determined incorrectly. This is what I will be investigating next.


1 Answer


The minimum of the hinge loss function for this example does not correspond with the maximum of the number of correctly classified examples. Please see the plot of both with regard to the value of the parameter, below. Given that the optimizer works towards the minimum of the loss, not the maximum of the number of correctly classified examples, the code (and framework/optimizer) do what they are supposed to do. Alternatively, one could use a different loss function to try to find a better fit, for example a binarized L1 loss. Such a function would have the same global optimum, but would likely have a very flat landscape.

[Plot: loss values with regard to the weight value]
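The data behind this plot can be reproduced with a small sweep, using the analytic form of the model above (⟨Z⟩ = cos(x + a) for input x and weight a), a sketch:

import numpy as np

x = np.linspace(0.0, 1.0, 1000, endpoint=False)
y = np.where(x < 0.3, 1.0, -1.0)  # labels in {-1, 1}, as in the question

for a in np.linspace(0.0, 2 * np.pi, 13):
    pred = np.cos(x + a)  # <Z> of Rx(x) followed by Rx(a) on |0>
    hinge = np.mean(np.maximum(0.0, 1.0 - y * pred))
    accuracy = np.mean((pred > 0) == (y > 0))
    print(f"a={a:.2f}  hinge={hinge:.3f}  accuracy={accuracy:.3f}")

And one possible reading of the suggested binarized L1 loss, as a Keras-compatible sketch (note that it is piecewise constant, so its gradient is zero almost everywhere, which is what makes its landscape flat):

def binarized_l1(y_true, y_pred):
    # 0 for a correctly signed prediction, 1 for a wrong one
    y_true = tf.cast(y_true, tf.float32)
    return tf.reduce_mean(tf.abs(tf.sign(y_pred) - y_true)) / 2.0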
