
I need to `jit` the train step, but when I do I get this error:

import jax_resnet
import jax
import jax.numpy as jnp
from flax import linen as nn
import tensorflow_datasets as tfds
from flax.training import train_state
import optax
import numpy as np
from functools import partial
from flax.core.frozen_dict import unfreeze

def get_data():
    ds_builder = tfds.builder('cifar10')
    ds_builder.download_and_prepare()
    train_ds = tfds.as_numpy(ds_builder.as_dataset(split='train', batch_size=-1))
    test_ds = tfds.as_numpy(ds_builder.as_dataset(split='test', batch_size=-1))
    train_ds['image'] = jnp.float32(train_ds['image']) / 255.
    test_ds['image'] = jnp.float32(test_ds['image']) / 255.
    return train_ds, test_ds

class CNN(nn.Module):
  """A simple CNN model."""

  @nn.compact
  def __call__(self, x):
    x = nn.Conv(features=32, kernel_size=(3, 3))(x)
    x = nn.relu(x)
    x = nn.avg_pool(x, window_shape=(2, 2), strides=(2, 2))
    x = nn.Conv(features=64, kernel_size=(3, 3))(x)
    x = nn.relu(x)
    x = nn.avg_pool(x, window_shape=(2, 2), strides=(2, 2))
    x = x.reshape((x.shape[0], -1))  # flatten
    x = nn.Dense(features=256)(x)
    x = nn.relu(x)
    x = nn.Dense(features=10)(x)
    return x

def get_model(no_params=False):
    model = CNN()  # jax_resnet.ResNet50(n_classes=10)
    if no_params:
        return model
    else:
        key = jax.random.PRNGKey(0)
        params = model.init(key, jnp.ones((1,32,32,3)))
        return params, model

def get_loss(*, logits, labels):
    labels_one_hot = jax.nn.one_hot(labels, num_classes=10)
    return optax.softmax_cross_entropy(logits=logits, labels=labels_one_hot).mean()

def get_opt(params):
    opt = optax.sgd(learning_rate=0.001)
    opt_state = opt.init(params)
    return opt, opt_state

def compute_metrics(*, logits, labels):
    loss = get_loss(logits=logits, labels=labels)
    accuracy = jnp.mean(jnp.argmax(logits, -1) == labels)
    metrics = {'loss': loss, 'accuracy': accuracy,}
    return metrics


def gradient_accum(grads, temp_grads):
    # Flatten both pytrees into (leaves, treedef) pairs.
    flat_grads, grads_treedef = jax.tree_util.tree_flatten(grads)
    flat_temp_grads, temp_treedef = jax.tree_util.tree_flatten(temp_grads)
    # Add the gradients held back from previous steps to the new ones.
    pre_grads = jax.tree_util.tree_map(lambda x, y: x + y, flat_temp_grads, flat_grads)
    # Keep only entries above the threshold; zero out the rest.
    grads = [jnp.where(jax.lax.gt(x, jnp.float32(0.00001)), x, jnp.float32(0.0))
             for x in pre_grads]
    # Whatever was zeroed out is held back in temp_grads for the next step.
    temp_grads = jax.tree_util.tree_unflatten(
        temp_treedef, jax.tree_util.tree_map(lambda x, y: x - y, pre_grads, grads))
    grads = jax.tree_util.tree_unflatten(grads_treedef, grads)
    return grads, temp_grads

#@partial(jax.jit, static_argnums = (2,))
@jax.jit
def train_step(params, opt_state, temp_grads, batch):
    def forward(params):
        # forward runs inside the already-jitted train_step, so it needs no jit of its own
        model = get_model(no_params=True)
        logits = model.apply(params, batch['image'])
        loss = get_loss(logits=logits, labels=batch["label"])
        return loss, logits
    
    grad_fn = jax.value_and_grad(forward, has_aux=True)
    (_, logits), grads = grad_fn(params)
    grads, temp_grads = gradient_accum(grads, temp_grads)

    # `optimizer` is presumably the optax transform returned by get_opt, defined globally
    updates, opt_state = optimizer.update(grads, opt_state, params)
    params = optax.apply_updates(params, updates)
    
    metrics = compute_metrics(logits=logits, labels=batch['label'])
    return params, opt_state, temp_grads, metrics


def train_epoch(params, opt_state, train_ds, temp_grads, batch_size, epoch, rng):
    train_ds_size = len(train_ds['image'])
    steps_per_epoch = train_ds_size // batch_size
    permed_data = jax.random.permutation(rng, train_ds_size)
    permed_data = permed_data[:steps_per_epoch * batch_size]
    permed_data = permed_data.reshape((steps_per_epoch, batch_size))

    batch_metrics = []

    for batch in permed_data:
        batch = {k: v[batch, ...] for k, v in train_ds.items()}
        #print(jax.make_jaxpr(train_step)(state,batch,temp_grads))
        params, opt_state, temp_grads, metrics = train_step(params, opt_state, temp_grads, batch)
        batch_metrics.append(metrics)
    
    batch_metrics_np = jax.device_get(batch_metrics)
    
    return params, opt_state, temp_grads


TypeError                                 Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/jax/_src/api.py in _valid_jaxtype(arg)
   2918   try:
-> 2919     xla.abstractify(arg)  # faster than core.get_aval
   2920   except TypeError:

20 frames
TypeError: Value '[b'train_19009' b'train_31365' b'train_05158' b'train_31760'
 b'train_21509' b'train_22978' b'train_15361' b'train_19925'
 b'train_03894' b'train_18952' b'train_45240' b'train_07968'
 b'train_21667' b'train_08037' b'train_07961' b'train_45250'
 b'train_26699' b'train_27887' b'train_41832' b'train_14143'
 b'train_49745' b'train_21843' b'train_18343' b'train_34463'
 b'train_17154' b'train_06764' b'train_46962' b'train_39989'
 b'train_17994' b'train_30312' b'train_25505' b'train_26194']' with dtype object is not a valid JAX array type. Only arrays of numeric types are supported by JAX.

During handling of the above exception, another exception occurred:

UnfilteredStackTrace                      Traceback (most recent call last)
UnfilteredStackTrace: AssertionError: [b'train_19009' b'train_31365' b'train_05158' b'train_31760'
 b'train_21509' b'train_22978' b'train_15361' b'train_19925'
 b'train_03894' b'train_18952' b'train_45240' b'train_07968'
 b'train_21667' b'train_08037' b'train_07961' b'train_45250'
 b'train_26699' b'train_27887' b'train_41832' b'train_14143'
 b'train_49745' b'train_21843' b'train_18343' b'train_34463'
 b'train_17154' b'train_06764' b'train_46962' b'train_39989'
 b'train_17994' b'train_30312' b'train_25505' b'train_26194']

The stack trace below excludes JAX-internal frames.
The preceding is the original exception that occurred, unmodified.

--------------------

The above exception was the direct cause of the following exception:

AssertionError                            Traceback (most recent call last)
<ipython-input-11-9f1f832feba7> in train_epoch(params, opt_state, train_ds, temp_grads, batch_size, epoch, rng)
     92         batch = {k: v[batch, ...] for k, v in train_ds.items()}
     93         #print(jax.make_jaxpr(train_step)(state,batch,temp_grads))
---> 94         params, opt_state, temp_grads, metrics = train_step(params, opt_state, temp_grads, batch)
     95         batch_metrics.append(metrics)
     96 

AssertionError: [b'train_19009' b'train_31365' b'train_05158' b'train_31760'
 b'train_21509' b'train_22978' b'train_15361' b'train_19925'
 b'train_03894' b'train_18952' b'train_45240' b'train_07968'
 b'train_21667' b'train_08037' b'train_07961' b'train_45250'
 b'train_26699' b'train_27887' b'train_41832' b'train_14143'
 b'train_49745' b'train_21843' b'train_18343' b'train_34463'
 b'train_17154' b'train_06764' b'train_46962' b'train_39989'
 b'train_17994' b'train_30312' b'train_25505' b'train_26194']

I have no idea what is wrong and can't find anything related to what is happening. It has something to do with temp_grads, which are supposed to be the same shape as grads but all 0.0 at the start.

(The temp_grads are for a temporary implementation of gradient accumulation, a process in which you locally store gradients that are too small in magnitude to be significant and sum them with the gradients from the next steps until they reach a specific value. It is used to save bandwidth when communicating between devices.)
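For reference, here is a minimal, self-contained sketch of the accumulation scheme I am describing, with a hypothetical threshold of 1e-5 (a simplified stand-in, not my actual training code):

import jax
import jax.numpy as jnp

THRESHOLD = 1e-5  # hypothetical cutoff below which gradients are held back

def accumulate(grads, temp_grads):
    # Add the gradients held back from previous steps to the new ones.
    summed = jax.tree_util.tree_map(lambda g, t: g + t, grads, temp_grads)
    # Apply only the entries whose magnitude exceeds the threshold...
    applied = jax.tree_util.tree_map(
        lambda s: jnp.where(jnp.abs(s) > THRESHOLD, s, 0.0), summed)
    # ...and hold back the remainder for the next step.
    held_back = jax.tree_util.tree_map(lambda s, a: s - a, summed, applied)
    return applied, held_back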

1 Answer


The message is a bit obscured, but it is there:

TypeError: Value ... with dtype object is not a valid JAX array type. Only arrays of numeric types are supported by JAX.

JAX does not support string arrays, and it appears you're passing a string array to a JAX function. You'll have to find a different approach to use.
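Judging from the traceback, the string array is the `id` field that the TFDS CIFAR-10 dataset includes alongside `image` and `label`. Assuming that is the source, a minimal sketch of a fix would be to filter the batch down to the numeric fields before it reaches the jitted function, e.g. in `train_epoch`:

for perm in permed_data:
    # keep only numeric fields; 'id' is a string array that JAX cannot ingest
    batch = {k: v[perm, ...] for k, v in train_ds.items()
             if k in ('image', 'label')}
    params, opt_state, temp_grads, metrics = train_step(
        params, opt_state, temp_grads, batch)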

jakevdp
  • The problem is that none of the values I pass in are string arrays – Christopher Rae Sep 27 '22 at 08:23
  • In that case, a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) would help. Your question defines some classes and functions, but doesn't show how they are called, so all I can do is guess what code you're executing based on the traceback. – jakevdp Sep 27 '22 at 12:06
  • But your statement is incorrect; I ran `get_data()`, and it returns a dictionary with an `'id'` entry that is a string array. You won't be able to use this data with JAX. – jakevdp Sep 27 '22 at 12:09
  • @jakevdp What should be the recommended course of action here? Will converting `id` to int do? Or is the only way to drop the id column altogether? – user541396 Oct 23 '22 at 17:49
  • You could either convert incompatible columns to compatible ones, or drop them entirely. – jakevdp Oct 24 '22 at 12:53
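For example, a sketch of both options, assuming the `id` strings follow the b'train_NNNNN' pattern shown in the traceback:

import numpy as np

train_ds, test_ds = get_data()

# Option 1: drop the incompatible column entirely.
train_ds.pop('id', None)

# Option 2: convert it to a numeric column instead.
# train_ds['id'] = np.array([int(s.split(b'_')[1]) for s in train_ds['id']])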