
I am trying to save some arrays as Parquet files. Using a Python debugger, I can reach the point in my code where they are ready for saving. Inside my complicated mess of code they look like this:

ipdb> ak.__version__
'1.2.2'
ipdb> array1
<Array [... 0., 1.]], [[50.], [4., 47.]]]] type='1 * 3 * var * var * float64'>
ipdb> array2
<Array [[False, True, True]] type='1 * 3 * bool'>

If I try to save them, it fails with this error:

ipdb> group = ak.zip({'a': array1, 'b': array2}, depth_limit=1)
ipdb> ak.to_parquet(group, 'test.parquet')
*** ValueError: could not broadcast input array from shape (3) into shape (1)

So I started experimenting in the terminal to recreate the problem and debug it, but I cannot replicate it. Here is what happens:

In [1]: import awkward as ak
In [2]: ak.__version__
'1.2.2'
In [3]: cat = ak.from_iter([[True, False, True]])
In [4]: dog = ak.from_iter([[[], [[50.0], [0.2, 0.1, 0., 0., 0.1]], [[50.0], [21., 0.1, 0.]]]])
In [5]: pets = ak.zip({'dog':dog, 'cat':cat}, depth_limit=1)
In [6]: ak.to_parquet(pets, "test.parquet")
In [7]: # no problems
In [8]: cat
<Array [[True, False, True]] type='1 * var * bool'>

Notice that the type has changed from 1 * 3 * bool to 1 * var * bool. That seems to be the only difference, but I can't work out how to control it.


Having managed to isolate the issue, I found it wasn't what I thought it was. The problem occurs when using np.newaxis to add a new axis to a boolean array and then trying to save it.


import awkward as ak
import numpy as np

dog = ak.from_iter([1, 2, 3])[np.newaxis]
pets = {"dog": dog}
zipped = ak.zip(pets, depth_limit=1)
ak.to_parquet(zipped, "test.parquet")
# works fine

dog = ak.from_iter([True, False, True])[np.newaxis]
pets = {"dog": dog}
zipped = ak.zip(pets, depth_limit=1)
ak.to_parquet(zipped, "test.parquet")

# Gives 
ValueError: could not broadcast input array from shape (3) into shape (1)

I should really know better than to post a question without isolating the problem first. Apologies for wasting your time.

Clumsy cat
  • I know I should reduce my code down to a minimal broken example, please forgive my laziness. If there isn't an obvious answer to this, I will. – Clumsy cat May 07 '21 at 14:33

1 Answer


In ak.zip, depth_limit=1 means that the arrays are not deeply matched ("zipped") together: the only constraint is that len(array1) == len(array2). Is this not satisfied?

In your pets example, len(cat) == 1 and len(dog) == 1. Since you're asking for depth_limit=1, it doesn't matter that len(cat[0]) == len(dog[0]), though in this case it does (they're both 3). Thus, it would be possible to zip these at depth_limit=2, even though that's not what you're asking for.

Since the error message is saying that the mismatching lengths of array1 and array2 are 3 and 1, that should be easy to inspect in the debugger:

array1[0]
array1[1]
array1[2]   # should be the last one

array2[0]   # should be the only one

I hope this sheds some light on your problem!


Looking more closely, I see that you're telling me that you know the lengths of array1 and array2. They're both length 1. There should be no trouble zipping them at depth_limit=1.

You can make your pets example have exactly the right types by calling ak.to_regular on that axis:

>>> cat = ak.to_regular(ak.from_iter([[True, False, True]]), axis=1)
>>> dog = ak.to_regular(ak.from_iter([[[], [[50.0], [0.2, 0.1, 0., 0., 0.1]], [[50.0], [21., 0.1, 0.]]]]), axis=1)
>>> cat
<Array [[True, False, True]] type='1 * 3 * bool'>
>>> dog
<Array [... 0, 0.1]], [[50], [21, 0.1, 0]]]] type='1 * 3 * var * var * float64'>

So the types are exactly 1 * 3 * bool and 1 * 3 * var * var * float64. Zipping works:

>>> pets = ak.zip({'dog':dog, 'cat':cat}, depth_limit=1)
>>> pets
<Array [... 0]]], cat: [True, False, True]}] type='1 * {"dog": var * var * var *...'>
>>> pets.type
1 * {"dog": var * var * var * float64, "cat": var * bool}

Maybe the array1 and array2 you think you're working with are not what you're really working with?

Jim Pivarski
  • Thanks for pointing out the `to_regular` function. With that, the example I have in the terminal does look identical to the example in my code, and it saves just fine. As you say it's entirely possible that I'm not working with the arrays I think I am. I will reduce the code down until I discover my error or create a reproducible example of the issue. – Clumsy cat May 08 '21 at 08:07
  • I'm sorry, I messed up. I updated the question with the actual problem, which I think is a bug? By the way, it can be solved by calling `to_regular` on the boolean array (makes it save fine). – Clumsy cat May 08 '21 at 12:11
  • I ran your new example, but it doesn't raise any exceptions for me. `np.newaxis` just makes a regular axis of length 1, which the previous example had as well. (I also verified that you have the latest version.) – Jim Pivarski May 09 '21 at 11:52
  • Ah, OK. Let's assume for now that I have somehow messed up my system. If I can recreate the issue in a Docker container, I will post it as a GitHub issue. – Clumsy cat May 10 '21 at 08:50
  • Followed up here: https://github.com/scikit-hep/awkward-1.0/issues/859 It was a bug, but an old one that had been fixed in a pre-release, though not a mainline release. – Jim Pivarski May 10 '21 at 14:34