
I am writing a simple function that simulates N paths of a Geometric Brownian Motion (GBM) with M discretization steps (M+1 time points if you include the starting point).

My function returns the values of the GBM at the last time step (but I simulate the entire path because I will need it later).

I want the derivative of the output vector (array) w.r.t. the inputs spot and vol using automatic differentiation (autograd, algorithmic differentiation, AAD, etc.), i.e. a Jacobian J of size N x 2.
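
For reference, this is the scheme the code below implements: each path uses the exact GBM solution

S_t = spot * exp((drift - 0.5 * vol^2) * t + vol * W_t),

where W is a Brownian motion built by cumulating sqrt(dt)-scaled standard normal increments, and the function returns the terminal values S_T for all N paths.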

I've tried to implement the function as similarly as possible in both PyTorch and TensorFlow to compare the performance of the two, as seen below:

PyTorch:

import torch
from torch.autograd.functional import jacobian


def sim_gbm(N, t, spot, drift, vol, seed=None):
    M = int(t.shape[0])  # number of time points on the grid (M + 1 in the notation above)
    dt = torch.diff(t)

    # standard normal increments, one column per path
    Z = torch.normal(mean=0.0, std=1.0, size=(M - 1, N), generator=torch.manual_seed(seed))

    # Brownian motion: cumulative sum of sqrt(dt)-scaled increments, starting at 0
    W = torch.concatenate([
        torch.zeros(size=(1, N)), torch.sqrt(dt)[:, None] * Z
    ]).cumsum(dim=0)

    # exact GBM solution evaluated on the whole time grid
    S = spot * torch.exp(((drift - 0.5 * vol ** 2) * t)[:, None] + vol * W)
    return S[-1]  # terminal values only, shape (N,)


def gbm_wrapper(spot, vol):
    # jacobian() only differentiates w.r.t. the wrapper's arguments (spot, vol);
    # N, t, drift and seed are picked up from the enclosing module scope
    return sim_gbm(N, t, spot, drift, vol, seed)


if __name__ == '__main__':
    from datetime import datetime
    seed = 1234
    N = 10000
    M = 52
    t0 = 0.0
    T = 1.0
    spot = torch.tensor(100.0, requires_grad=True)
    drift = torch.tensor(0.03)
    vol = torch.tensor(0.2, requires_grad=True)
    t = torch.linspace(t0, T, M + 1)

    start = datetime.now()

    J = jacobian(func=gbm_wrapper, inputs=(spot, vol))
    stop = datetime.now()
    print(stop - start)
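
As a side note, because the simulation uses the exact GBM solution, the spot column of the Jacobian can be checked analytically: dS_T/dspot = S_T/spot for every path. A quick check of the PyTorch result (just a sketch using the wrapper above, not part of the timing):

S = gbm_wrapper(spot, vol)                        # same seed, so the same paths as in the jacobian call
print(torch.allclose(J[0], (S / spot).detach()))  # J[0] is the dS/dspot column, shape (N,)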

TensorFlow:

import tensorflow as tf

def sim_gbm(N, t, spot, drift, vol, seed=None):
    M = tf.constant(t.shape[0])  # number of time points on the grid
    dt = tf.math.subtract(t[1:], t[:-1])

    # standard normal increments, one column per path
    Z = tf.random.normal(mean=0.0, stddev=1.0, shape=(M - 1, N), seed=seed)

    # Brownian motion: cumulative sum of sqrt(dt)-scaled increments, starting at 0
    W = tf.cumsum(tf.concat([
        tf.zeros(shape=(1, N)), tf.sqrt(dt)[:, None] * Z
    ], axis=0), axis=0)

    # exact GBM solution evaluated on the whole time grid
    S = spot * tf.exp(((drift - 0.5 * vol ** 2) * t)[:, None] + vol * W)
    return S[-1]  # terminal values only, shape (N,)


if __name__ == '__main__':
    from datetime import datetime
    seed = tf.constant(1234)
    N = tf.constant(10000)
    M = 52
    t0 = tf.constant(0.0)
    T = tf.constant(1.0)
    spot = tf.constant(100.0, dtype=tf.float32)
    drift = tf.constant(0.03, dtype=tf.float32)
    vol = tf.constant(0.2, dtype=tf.float32)
    t = tf.linspace(t0, T, M + 1)

    start = datetime.now()
    with tf.GradientTape() as tape:
        tape.watch([spot, vol])
        S = sim_gbm(N, t, spot, drift, vol, seed)

    J = tape.jacobian(S, [spot, vol])

    stop = datetime.now()
    print(stop - start)

For small N (say, less than or equal to 1,000) the two perform very similarly, both finishing consistently in around 1 second on my computer. However, looking at my task manager I can see that the Python process uses around 140 MB of memory with the PyTorch implementation, while TensorFlow uses around 340 MB - this is not an issue for me at these levels.

Now, as I increase N to a larger number such as 10,000, the performance is very different. PyTorch manages to finish in around 1 minute and still uses about 140 MB of memory. TensorFlow, on the other hand, runs for a couple of minutes and then throws an error, having consumed all of my memory (16 GB).
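
(The memory numbers above are read from the task manager; to reproduce them more precisely one could sample the process's resident memory with psutil - a measurement helper only, not part of the simulation code:)

import os, psutil
proc = psutil.Process(os.getpid())
print(proc.memory_info().rss / 1024 ** 2)  # resident set size in MB, sampled before/after the jacobian call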

Why is this happening? I would expect both frameworks to be capable of handling much larger computational graphs than the one used here, given that they routinely train neural networks with several million parameters (weights and biases).
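
One variant I have not benchmarked (the experimental_use_pfor flag is documented on tf.GradientTape.jacobian; when disabled, the Jacobian is built with a while_loop instead of the parallel-for machinery) would be:

J = tape.jacobian(S, [spot, vol], experimental_use_pfor=False)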

  • pointer: look at the jacobian method of both libs; they may compute the same way, but the tensor/variable management in memory might be the cause of the OOM. – Innat Aug 22 '23 at 19:46

0 Answers