Today I discovered you can filter a PySpark DataFrame via boolean indexing:
In [3]: df.show()
+-----+---+
|name1| v|
+-----+---+
| john|1.0|
| sam|4.0|
| meh|3.0|
+-----+---+
In [6]: df[df['v']>2.0].show()
+-----+---+
|name1| v|
+-----+---+
| sam|4.0|
| meh|3.0|
+-----+---+
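For completeness, here is a minimal sketch to reproduce the example above (the SparkSession setup and DataFrame construction are my assumptions, since they weren't shown; only the data values come from the show() output):

from pyspark.sql import SparkSession

# Assumed setup; in the session above, `spark` and `df` already exist.
spark = SparkSession.builder.appName("boolean-indexing-demo").getOrCreate()

# Recreate the example DataFrame from the show() output.
df = spark.createDataFrame(
    [("john", 1.0), ("sam", 4.0), ("meh", 3.0)],
    ["name1", "v"],
)

# Boolean-indexing style: df[...] with a Column condition.
df[df['v'] > 2.0].show()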
A common way to do this is to use PySpark's filter function (see, e.g., Spark - SELECT WHERE or filtering?). But is the boolean-indexing syntax above documented and officially supported? I like this syntax because it's consistent with Pandas, where filter means something else entirely (it selects rows or columns by label, not by a boolean condition).
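For comparison, here is the same row filter written with the documented filter / where methods (where() is an alias for filter()); all of these should return the same rows as the boolean-indexing form:

# Equivalent, documented ways to express the same row filter.
df.filter(df['v'] > 2.0).show()   # filter with a Column expression
df.where(df['v'] > 2.0).show()    # where() is an alias for filter()
df.filter('v > 2.0').show()       # filter with a SQL expression string
df[df['v'] > 2.0].show()          # the boolean-indexing form in question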