
In Pandas we can use the map function to map a dict to a Series, creating another Series with the mapped values. More generally speaking, I believe it invokes the argument's index operator, i.e. [].

import pandas as pd

dic = { 1: 'a', 2: 'b', 3: 'c' }

pd.Series([1, 2, 3, 4]).map(dic) # returns ["a", "b", "c", NaN]

I haven't found a way to do so directly in Polars, but have found a few alternatives. Would any of these be the recommended way to do so, or is there a better way?

import polars as pl

dic = { 1: 'a', 2: 'b', 3: 'c' }

# Approach 1 - apply
pl.Series([1, 2, 3, 4]).apply(lambda v: dic.get(v, None)) # returns ["a", "b", "c", null]

# Approach 2 - left join
(
    pl.Series([1, 2, 3, 4])
    .alias('key')
    .to_frame()
    .join(
        pl.DataFrame({
            'key': list(dic.keys()),
            'value': list(dic.values()),
        }),
        on='key', how='left',
    )['value']
) # returns ["a", "b", "c", null]

# Approach 3 - to pandas and back
pl.from_pandas(pl.Series([1, 2, 3, 4]).to_pandas().map(dic)) # returns ["a", "b", "c", null]

I saw this answer on mapping a dict of expressions, but since it chains when/then/otherwise it might not work well for huge dicts.
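
For reference, here is a rough sketch of what that chained when/then/otherwise approach would look like, built programmatically from the same dict (the column name 'key' is just an assumption for the example):

import polars as pl

dic = { 1: 'a', 2: 'b', 3: 'c' }

# Build one when/then branch per dict entry; unmatched keys fall through to null.
items = list(dic.items())
expr = pl.when(pl.col('key') == items[0][0]).then(pl.lit(items[0][1]))
for k, v in items[1:]:
    expr = expr.when(pl.col('key') == k).then(pl.lit(v))
expr = expr.otherwise(None)

pl.DataFrame({'key': [1, 2, 3, 4]}).with_columns(expr.alias('value'))
# 'value' column: ["a", "b", "c", null]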

T.H Rice

3 Answers


Update 2023-03-20

Polars has a dedicated map_dict expression. Use this.
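
A minimal example with the question's dict (the Series form; see the next answer for the Expr form):

import polars as pl

dic = { 1: 'a', 2: 'b', 3: 'c' }

# Keys not present in the dict become null by default.
pl.Series([1, 2, 3, 4]).map_dict(dic)  # ["a", "b", "c", null]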

Old answer

Mapping a Python dictionary over a Polars Series should always be considered an anti-pattern. It will be terribly slow, and what you want is semantically equivalent to a join.

Use joins. They are heavily optimized, multithreaded, and don't go through Python.

Example

import polars as pl

dic = { 1: 'a', 2: 'b', 3: 'c' }

mapper = pl.DataFrame({
    "keys": list(dic.keys()),
    "values": list(dic.values())
})

pl.Series([1, 2, 3, 4]).to_frame("keys").join(mapper, on="keys", how="left").to_series(1)
Series: 'values' [str]
[
    "a"
    "b"
    "c"
    null
]

ritchie46
  • you can construct mapper as `mapper=pl.DataFrame([{'keys':x, 'values':y} for x,y in dic.items()])` for a slight performance boost. – Dean MacGregor Dec 13 '22 at 14:54

Since version 0.16.3, Polars has had the Expr.map_dict method, and since 0.16.7 the Series.map_dict method, which can be used as follows:

import polars as pl

mapping_dict = {1: "a", 2: "b", 3: "c"}

# pl.Series.map_dict
pl.Series([1, 2, 3, 4]).map_dict(mapping_dict)

# pl.Expr.map_dict
pl_df = pl.Series(name="to_map_col", values=[1, 2, 3, 4]).to_frame()

pl_df.with_columns(pl.col("to_map_col").map_dict(mapping_dict))

invidia_95

Polars is an awesome tool, but even awesome tools aren't meant for everything, and this is one of those cases. A simple Python list comprehension is going to be faster.

You could just do:

[dic[x] if x in dic.keys() else None for x in [1,2,3,4]]

On my computer, timing that with %%timeit gives 800 ns,

in contrast to

pl.Series([1, 2, 3, 4]).to_frame("keys").join(pl.DataFrame([{'keys':x, 'values':y} for x,y in dic.items()]), on="keys", how="left").to_series(1)

which takes 434 µs.

Notice that the first is measured in nanoseconds whereas the second is in microseconds, so it's really 800 ns vs 434,000 ns.
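
If you ultimately need the result as a Polars Series, the comprehension's output can be handed straight back to Polars (a sketch; dic.get gives the same None-for-missing behaviour as the if/else above):

import polars as pl

dic = { 1: 'a', 2: 'b', 3: 'c' }

# dic.get returns None for missing keys, which Polars stores as null.
mapped = pl.Series([dic.get(x) for x in [1, 2, 3, 4]])  # ["a", "b", "c", null]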

Dean MacGregor