
The example in the section about the 'list context' in the polars-book uses the expression pl.col(""), with an empty string "" as the argument.

# the percentage rank expression
rank_pct = pl.col("").rank(reverse=True) / pl.col("").count()

From the context and the output I can guess what the pl.col("") expression does. But the API documentation does not seem to cover the case of an empty string as the argument to pl.col, and I would like to know its precise meaning in this use case. Any helpful answer is greatly appreciated!

1 Answer


The precise meaning is to act as a 'root' Expression to start a chain of Expressions inside a List context, i.e., inside arr.eval(....). I'll need to take a step back to explain...

'Root' Expressions

In general, only certain types of Expressions are allowed to start (or be the 'root' of) an Expression. These 'root' Expressions work with a particular context (select, filter, with_column, etc.) to identify what data is being addressed.

Some examples of root Expressions are polars.col, polars.apply, polars.map, polars.first, polars.last, polars.all, and polars.any. (There are others.)

Once we declare a "root" Expression, we can then chain other, more-generic Expressions to perform work. For example, polars.col("my_col").sum().over('other_col').alias('name').

The List context

A List context is slightly different from most contexts. In a List context, there is no ambiguity as to what data is being addressed. There is only a list of data. As such, polars.col and polars.first were chosen as "root" Expressions to use within a List context.

Normally, a polars.col root Expression contains information such as a string to denote a column name or a wildcard expression to denote multiple columns. However, this is not needed in a List context. There is only one option - the single list itself.

As such, any string provided to polars.col is ignored in a List context. For example, this variation on the code from the Polars Guide also works:

# Notice that I'm referring to columns that do not exist...
rank_pct = pl.col("foo").rank(reverse=True) / pl.col("bar").count()

Since any string provided to a polars.col Expression will be ignored in a List context, a single empty string "" is often supplied, just to prevent unnecessary clutter.

Edit: New polars.element expression

Polars now has a polars.element expression designed for use in list evaluation contexts. Using polars.element is now considered idiomatic for list contexts, as it avoids the confusion associated with using pl.col("").

  • `@cbilot` : Fantastic answer, thanks a lot! Adding it to the polars-book would help new users to wrap their heads around the concepts behind polars expressions. – John Steinbeck May 05 '22 at 12:51