
I have a large Arrow file with 14 million rows. In my app, I select two columns and bin them using the count/binby functionality in Vaex.

df.count(
  binby=axes,
  limits=limits,
  shape=(binnum,)*len(axes),
  delay=True
)

Some of my columns act as masks and contain only 0 or 1. Here's an example:

#   x    y    mask
1   1.5  4.7  0
2   0.3  2.3  1
3   2.6  9.4  1
4   5.0  3.7  0

I want to bin along the x and y axes only those points that have a 1 in the mask column. How do I do this?

afriedman111
  • Would something like `df.count(..., selection="mask==1")` work? – Joco Oct 12 '21 at 14:44
  • No, this is a known bug, documented [here](https://github.com/vaexio/vaex/issues/1592) – afriedman111 Oct 21 '21 at 20:10
  • Ah sure, but that issue contains more details than the original post here, related to the geo methods I suppose. Without the geo stuff, your example as stated above should work, I think. – Joco Oct 28 '21 at 10:43

1 Answer


Assuming that the mask is a column in the same dataframe, passing it as an expression to the `selection` keyword argument should work:

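import vaex

# Built-in example dataset; it includes an integer `id` column.
df = vaex.example()

# `id==0` stands in for the mask condition; with the question's data use "mask==1".
df.count(binby=['x', 'y'], shape=(10,10), selection="id==0")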
SultanOrazbayev