Why is the shape of the image tensor (?, ?, ?)?

Question

Here is my code,

img_path = tf.read_file(testqueue[0])
my_img = tf.image.decode_jpeg(img_path)
sess.run(my_img)
print my_img.get_shape()

Which results in,

(?, ?, ?)

Why did I get this result?

Try `sess.run(my_img).shape` or the equivalent. I believe the `my_img` is just a symbol until you run it with `sess.run`. The output should have a shape. — , Jan 31 '18 at 23:01
Thanks! That's right. And how to create tensor to represent the shape of my_img before sess.run? — Zhenxi Li, Jan 31 '18 at 23:38
Why do you want the shape before `sess.run`? Usually, you just leave it as a symbol so the rest of your algorithm can automatically adjust itself to the new shape of future images. — , Jan 31 '18 at 23:40
Yes, tf.shape() works. Thank you guys! I want the shape because I want to build the graph before I run the session, and some operations will depend on the shape value. — Zhenxi Li, Jan 31 '18 at 23:47

Patwie · Answer 1 · 2018-02-01T10:18:47.117

Here is an answer with some details about what happens behind the scenes.

static information

tensor_name.shape returns the shape information that is available at graph-construction time, before anything runs. It relies only on the properties recorded for the tensor in the graph.

tf.decode_jpeg is registered here. While the graph is being built, TensorFlow runs shape propagation using an InferenceContext. Given the shape properties known for its input tensors, each operation provides hints about what its output tensors will look like.

For example, an "rgb2gray" operation would just copy the shape of the input tensor (say [b', h', w', c']) and set the output to [b', h', w', 1]. It does not need to know the exact values of b', h', w', as it can simply copy them from the input.
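The shape-copying idea can be sketched in plain Python, without TensorFlow. This is only a toy model: `rgb2gray_shape` is a hypothetical helper invented for illustration, and `None` stands in for an unknown dimension (what TensorFlow prints as `?`).

```python
from typing import Optional, Tuple

# Toy model of a static shape: None marks a dimension that is
# unknown while the graph is being built.
Shape = Tuple[Optional[int], ...]


def rgb2gray_shape(input_shape: Shape) -> Shape:
    """Hypothetical shape function: copy b, h, w and set channels to 1."""
    if len(input_shape) != 4:
        raise ValueError("expected a rank-4 shape [b, h, w, c]")
    # The leading dimensions are copied verbatim, whether known or not.
    return input_shape[:3] + (1,)


print(rgb2gray_shape((None, None, None, 3)))  # b, h, w all unknown
print(rgb2gray_shape((8, None, None, 3)))     # only the batch size known
```

The function never looks at pixel data; it only rewrites the shape metadata, which is why it can run before any image exists.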

Looking at the shape function of tf.decode_jpeg, this operation clearly takes a channels attribute into account:

// read the attribute "channels" from "tf.image.decode_jpeg(..., channels)"
TF_RETURN_IF_ERROR(c->GetAttr("channels", &channels));
// ....
// set the shape information that "my_img.get_shape()" will report
c->set_output(0, c->MakeShape({InferenceContext::kUnknownDim,
                                 InferenceContext::kUnknownDim, channels_dim}));

The first two dimensions are set to InferenceContext::kUnknownDim because the operation only knows that there is a height and a width, while their specific values can vary. For the channel axis it makes a best guess: if you specify the attribute, as in tf.decode_jpeg(..., channels=3), it can and will set the last dimension to that value.

In your case this results in the shape (?, ?, ?), because channels defaults to 0 and the channels == 0 branch is taken, which leaves the last dimension unknown as well.
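To make the branch concrete, here is a minimal re-statement of that logic in plain Python. It is not TensorFlow code: `decode_jpeg_static_shape` and `format_shape` are hypothetical names, and `UNKNOWN` is an assumed stand-in for `InferenceContext::kUnknownDim`.

```python
from typing import Optional, Tuple

UNKNOWN = None  # assumed stand-in for InferenceContext::kUnknownDim


def decode_jpeg_static_shape(channels: int = 0) -> Tuple[Optional[int], ...]:
    """Toy version of the shape function discussed above (illustration only)."""
    # channels == 0 means "use whatever the file contains",
    # so the channel count cannot be known while building the graph.
    channels_dim = UNKNOWN if channels == 0 else channels
    # Height and width always depend on the file contents.
    return (UNKNOWN, UNKNOWN, channels_dim)


def format_shape(shape: Tuple[Optional[int], ...]) -> str:
    """Render unknown dimensions as '?', as in the question's output."""
    return "(" + ", ".join("?" if d is None else str(d) for d in shape) + ")"


print(format_shape(decode_jpeg_static_shape()))            # (?, ?, ?)
print(format_shape(decode_jpeg_static_shape(channels=3)))  # (?, ?, 3)
```

So passing channels=3 is enough to make the last dimension known statically, while height and width stay open.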

run-time information

On the other hand, tf.shape (defined here) ends up in the kernel here, which inspects the actual tensor content:

// get actual tensor-shape from the value itself
TensorShape shape;
OP_REQUIRES_OK(ctx, shape_op_helpers::GetRegularOrVariantShape(ctx, 0, &shape));
const int rank = shape.dims();
// write the tensor result from "tf.shape(...)"
for (int i = 0; i < rank; ++i) {
  int64 dim_size = shape.dim_size(i);
  // ...
  vec(i) = static_cast<OutType>(dim_size); // the actual size for dim "i"
}

It is like tf.shape is saying to its previous operation:

You can tell me whatever conclusion you came up with some minutes ago. I do not care how clever you were at that point or how much work you put into your guess about the shape. See, I just look at the concrete tensor, which now has content, and I am done.
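The same contrast can be shown with a small plain-Python sketch. It assumes a nested list as a stand-in for a concrete tensor value, and `runtime_shape` is a hypothetical helper that only mimics the idea behind tf.shape, not its implementation.

```python
def runtime_shape(value):
    """Toy analogue of tf.shape: read the dimensions off a concrete value."""
    dims = []
    while isinstance(value, list):
        dims.append(len(value))
        # Descend into the first element; stop on an empty list.
        value = value[0] if value else None
    return dims


# What shape inference reported while the graph was being built.
static_shape = (None, None, None)

# A concrete "decoded image": height 2, width 3, 3 channels.
image = [[[0, 0, 0] for _ in range(3)] for _ in range(2)]

print(static_shape)          # every dimension still unknown
print(runtime_shape(image))  # dimensions taken from the value itself
```

The static shape is whatever was inferred beforehand, while the run-time shape ignores that guess and simply measures the data.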

consequences

This has some important consequences:

tf.shape is a tensor, while tensorname.shape is not.
Some attributes require a plain integer, so there is no way to pass them the tensor returned by tf.shape.
Graph optimization (like XLA) can only rely on the information given in tensorname.shape.
If you know the shape of the image (e.g., a database containing only 128x128x3 images), you should set the shape explicitly, e.g., using tf.reshape(img, [128, 128, 3]).

You might be interested as well in tf.image.extract_jpeg_shape which is implemented here.
