I see there are at least 6 other similar issues. The main questions here:

  1. Which dataset am I getting this on? I'm working with 4, and the stack trace obscures any details of what it's looking at.
  2. Why would my dataset be of rank 3 as opposed to 2?

I'm using TensorFlow Recommenders. The stack trace of the error is too obscure for me to tell which dataset it's complaining about. I'm guessing it's my training event dataset, which looks like this:

@@@ dataset: cached_train_event_ds in create_model - 2
<_OptionsDataset shapes: {item_id: (None,), user_id: (None,)}, types: {item_id: tf.string, user_id: tf.string}>
@@@ record type: <class 'tensorflow.python.data.ops.dataset_ops.TakeDataset'>
@@@ x type: <class 'dict'>
{'item_id': array([b'music:12274071', b'music:12501193', b'music:7864297', ...,
       b'music:11953766', b'music:10805147', b'music:11953766'],
      dtype=object),
 'user_id': array([b'artist:15523352', b'artist:12930551', b'artist:31057444', ...,
       b'artist:32581820', b'artist:36023938', b'artist:30037204'],
      dtype=object)}

The way its shapes are described leads me to believe it's rank 2, not rank 3.

What are some of the possible reasons for the rank to be counted as 3?

The stack trace is below. It describes the input shapes as [?,0], [?,?,?], []. Might the last [] be the issue? How might it come about?

CODE. A complete reproducible scenario is available here: https://github.com/dgoldenberg-audiomack/tf_issue_237. It has the code, the input data, and a detailed README on how to run the scenario.

Traceback (most recent call last):
  File "/mnt/tmp/spark-23c1419e-4a5c-4ec7-a86f-1f6f23be73d3/recsys_tfrs_songs.py", line 90, in <module>
    main(sys.argv)
  File "/mnt/tmp/spark-23c1419e-4a5c-4ec7-a86f-1f6f23be73d3/recsys_tfrs_songs.py", line 61, in main
    model_maker.train_and_evaluate(model, NUM_TRAIN_EPOCHS)
  File "/mnt/tmp/spark-23c1419e-4a5c-4ec7-a86f-1f6f23be73d3/recsys-deps.zip/recommender_system/recsys_tf/recsys_tfrs_model.py", line 151, in train_and_evaluate
  File "/home/hadoop/.local/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 1152, in fit
    tmp_logs = self.train_function(iterator)
  File "/home/hadoop/.local/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 867, in __call__
    result = self._call(*args, **kwds)
  File "/home/hadoop/.local/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 911, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "/home/hadoop/.local/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 749, in _initialize
    *args, **kwds))
  File "/home/hadoop/.local/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3045, in _get_concrete_function_internal_garbage_collected
    graph_function, _ = self._maybe_define_function(args, kwargs)
  File "/home/hadoop/.local/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3439, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/home/hadoop/.local/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3284, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/home/hadoop/.local/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 998, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/home/hadoop/.local/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 657, in wrapped_fn
    out = weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/home/hadoop/.local/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 985, in wrapper
    raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:

    /home/hadoop/.local/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py:847 train_function  *
        return step_function(self, iterator)
    /home/hadoop/.local/lib/python3.7/site-packages/tensorflow_recommenders/tasks/retrieval.py:157 call  *
        update_op = self._factorized_metrics.update_state(query_embeddings,
    /home/hadoop/.local/lib/python3.7/site-packages/tensorflow_recommenders/metrics/factorized_top_k.py:83 update_state  *
        top_k_predictions, _ = self._candidates(query_embeddings, k=self._k)
    /home/hadoop/.local/lib/python3.7/site-packages/tensorflow_recommenders/layers/factorized_top_k.py:224 top_k  *
        joined_scores = tf.concat([state_scores, x_scores], axis=1)
    /home/hadoop/.local/lib/python3.7/site-packages/tensorflow/python/util/dispatch.py:206 wrapper  **
        return target(*args, **kwargs)
    /home/hadoop/.local/lib/python3.7/site-packages/tensorflow/python/ops/array_ops.py:1768 concat
        return gen_array_ops.concat_v2(values=values, axis=axis, name=name)
    /home/hadoop/.local/lib/python3.7/site-packages/tensorflow/python/ops/gen_array_ops.py:1208 concat_v2
        "ConcatV2", values=values, axis=axis, name=name)
    /home/hadoop/.local/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:750 _apply_op_helper
        attrs=attr_protos, op_def=op_def)
    /home/hadoop/.local/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py:600 _create_op_internal
        compute_device)
    /home/hadoop/.local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:3554 _create_op_internal
        op_def=op_def)
    /home/hadoop/.local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:2031 __init__
        control_input_ops, op_def)
    /home/hadoop/.local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:1872 _create_c_op
        raise ValueError(str(e))

    ValueError: Shape must be rank 2 but is rank 3 for '{{node concat}} = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32](args_0, args_2, concat/axis)' with input shapes: [?,0], [?,?,?], [].
  • Code. Where is the code that made these errors? Please see how to create a [minimal, reproducible example](https://stackoverflow.com/help/minimal-reproducible-example). – o-90 Feb 26 '21 at 16:22
  • Possible duplicate of: https://stackoverflow.com/questions/42621652/tensorflow-valueerror-shape-must-be-rank-2-but-is-rank-3?rq=1 or https://stackoverflow.com/questions/49100680/shape-must-be-rank-2-but-is-rank-3-for-matmul-46-op-matmul-with-input-sha?rq=1 – rootlikegroot Feb 26 '21 at 16:25
  • @gobrewers I can supply some code but making it reproducible is going to be hard. I'm loading the datasets from a datalake. I can try but the main idea here is to know, from tf folks, what might be the possible causes of the error. You can see the shapes of my dataset. So what might be reasons why it's treated as rank 3 rather than rank 2? is it possible the data is ragged? some other reason? that's what I'm looking for. – Dmitry Goldenberg Feb 26 '21 at 16:56
  • @rootlikegroot Yes, it's possibly a duplicate of the 2nd link. However, all it says is, maybe you need to reshape your dataset. I just want to know, why might I have to? The error is way too cryptic and could definitely use some improvement in usability. The first link is not relevant as I'm not using bidirectional RNN. – Dmitry Goldenberg Feb 26 '21 at 16:58
  • I've added a minimal reproducible example here: https://github.com/dgoldenberg-audiomack/tf_issue_237. Includes the code, the input data, and a detailed README on how to run the example. – Dmitry Goldenberg Feb 27 '21 at 14:37

1 Answer

The problem was that the item dataset was batched twice: once when loaded with make_csv_dataset, and again later with items_ds.batch(128).map(item_model).
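A minimal sketch of how the double batching produces the extra dimension. It assumes a toy in-memory dataset in place of the make_csv_dataset output and a one-hot stand-in for item_model; both are hypothetical and not taken from the original code.

```python
import tensorflow as tf

# Hypothetical stand-in for the item data. In the real pipeline the data came
# from make_csv_dataset, which already returns batches.
items = tf.data.Dataset.from_tensor_slices(
    {"item_id": [b"music:1", b"music:2", b"music:3", b"music:4"]}
)

def embed(ds):
    """Hypothetical stand-in for item_model: adds one trailing feature axis."""
    return ds.map(
        lambda x: tf.one_hot(
            tf.strings.to_hash_bucket_fast(x["item_id"], 8), 8
        )
    )

batched_once = items.batch(2)             # item_id shape: (None,)
batched_twice = batched_once.batch(128)   # item_id shape: (None, None)

# The static shapes are visible without running any training step.
once_rank = embed(batched_once).element_spec.shape.rank
twice_rank = embed(batched_twice).element_spec.shape.rank

print("batched once: ", embed(batched_once).element_spec.shape, "rank", once_rank)
print("batched twice:", embed(batched_twice).element_spec.shape, "rank", twice_rank)
```

With a single batch dimension the candidate embeddings are rank 2; with the second batch() call they become rank 3, which matches the [?,?,?] input in the error. Dropping the extra batch() call (or calling unbatch() first) removes the extra dimension.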

The error message is confusing (to multiple folks, looks like) and folks keep getting tripped up by it.

First, it's hard to tell which dataset the error is being generated for. Second, the message itself is cryptic: if it at least named the offending shape dimension, that would give the user an immediate clue as to what's going on, rather than just [?,0], [?,?,?], []. (I couldn't set a breakpoint there in the IDE, either.) This really should be an enhancement request for the part of TF that produces the exception, to make it clearer and more user-friendly.
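In the meantime, the shape list can be read by hand: the three bracketed entries line up with the three inputs of the ConcatV2 node (args_0, args_2, concat/axis), so the trailing [] is only the scalar axis argument, and the rank-3 input is the second one. A small sketch of that reading, using a hypothetical helper (input_ranks is not part of TF):

```python
import re
from typing import List

def input_ranks(message: str) -> List[int]:
    """Return the rank of each shape listed after 'input shapes:'."""
    shapes_part = message.split("input shapes:", 1)[1]
    ranks = []
    for dims in re.findall(r"\[([^\]]*)\]", shapes_part):
        dims = dims.strip()
        # An empty bracket pair is a scalar, i.e. rank 0.
        ranks.append(0 if not dims else len(dims.split(",")))
    return ranks

message = (
    "Shape must be rank 2 but is rank 3 for '{{node concat}} = "
    "ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32](args_0, args_2, concat/axis)' "
    "with input shapes: [?,0], [?,?,?], []."
)

# One rank per node input, in the order the inputs are listed.
print(input_ranks(message))
```

The rank-0 entry is the axis, so it is not the problem; the entry with rank 3 is the tensor to trace back to its dataset.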

  • In my case, when mapping the item model, I was mistakenly adding an extra dimension. Taking care of that solved the issue, thanks! – Daniel Mejia May 02 '21 at 16:18