0

In Polars we can select columns by their type -- see e.g. How to select columns by data type in Polars?

However, I cannot figure out how to select datetime columns. From Polars' tutorial we have this example:

import polars as pl
import datetime at dt

df_trades = pl.DataFrame(
    {
        "time": [
            dt.datetime(2020, 1, 1, 9, 1, 0),
            dt.datetime(2020, 1, 1, 9, 1, 0),
            dt.datetime(2020, 1, 1, 9, 3, 0),
            dt.datetime(2020, 1, 1, 9, 6, 0),
        ],
        "stock": ["A", "B", "B", "C"],
        "trade": [101, 299, 301, 500],
    }
)

The column types are printed like this:

┌─────────────────────┬───────┬───────┐
│ time                ┆ stock ┆ trade │
│ ---                 ┆ ---   ┆ ---   │
│ datetime[μs]        ┆ str   ┆ i64   │

For the stock column we can use

df_trades.select(pl.col(pl.Utf8))

But what is the type for the time column? Using dt.datetime instead of pl.Utf8 gives an error:

TypeError: argument 'name': 'type' object cannot be converted to 'PyString'

robertdj
  • 909
  • 1
  • 5
  • 10

2 Answers2

1

You can select polars columns by datatypes as you did with pl.Utf8, but you have to pass a polars datatype to make that work, not a python datetime.datetime value/object.

Below we select all datetime datatypes that have "us" resolution by passing the pl.Datetime("us") datatype.

import polars as pl
from datetime import datetime

df_trades = pl.DataFrame(
    {
        "time1": [
            datetime(2020, 1, 1, 9, 1, 0),
            datetime(2020, 1, 1, 9, 1, 0),
            datetime(2020, 1, 1, 9, 3, 0),
            datetime(2020, 1, 1, 9, 6, 0),
        ],
        "time2": [
            datetime(2020, 1, 1, 9, 1, 0),
            datetime(2020, 1, 1, 9, 1, 0),
            datetime(2020, 1, 1, 9, 3, 0),
            datetime(2020, 1, 1, 9, 6, 0),
        ],
        "stock": ["A", "B", "B", "C"],
        "trade": [101, 299, 301, 500],
    }
)

df_trades.select(
    pl.col(pl.Datetime("us"))
)
shape: (4, 2)
┌─────────────────────┬─────────────────────┐
│ time1               ┆ time2               │
│ ---                 ┆ ---                 │
│ datetime[μs]        ┆ datetime[μs]        │
╞═════════════════════╪═════════════════════╡
│ 2020-01-01 09:01:00 ┆ 2020-01-01 09:01:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2020-01-01 09:01:00 ┆ 2020-01-01 09:01:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2020-01-01 09:03:00 ┆ 2020-01-01 09:03:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2020-01-01 09:06:00 ┆ 2020-01-01 09:06:00 │
└─────────────────────┴─────────────────────┘

ritchie46
  • 10,405
  • 1
  • 24
  • 43
1

Starting from Polars 0.18.1 you can use polars.selectors.datetime selector to select only datetime columns.

>>> import polars as pl
>>> import polars.selectors as cs
>>> import datetime as dt
>>> 
>>> df_trades = pl.DataFrame(
...     {
...         "time": [
...             dt.datetime(2020, 1, 1, 9, 1, 0),
...             dt.datetime(2020, 1, 1, 9, 1, 0),
...             dt.datetime(2020, 1, 1, 9, 3, 0),
...             dt.datetime(2020, 1, 1, 9, 6, 0),
...         ],
...         "stock": ["A", "B", "B", "C"],
...         "trade": [101, 299, 301, 500],
...     }
... )
>>> 
>>> print(df_trades.select(cs.datetime()))
shape: (4, 1)
┌─────────────────────┐
│ time                │
│ ---                 │
│ datetime[μs]        │
╞═════════════════════╡
│ 2020-01-01 09:01:00 │
│ 2020-01-01 09:01:00 │
│ 2020-01-01 09:03:00 │
│ 2020-01-01 09:06:00 │
└─────────────────────┘
Abdul Niyas P M
  • 18,035
  • 2
  • 25
  • 46