Given a Polars DataFrame, is there a way to replace a particular value by "null"? For example, if there's a sentinel value like "_UNKNOWN"
and I want to make it truly missing in the dataframe instead.
Asked
Active
Viewed 2,388 times
0

astrojuanlu
- 6,744
- 8
- 45
- 105
-
Perhaps it was the lack of a before/after dataframe example. – jqurious Dec 15 '22 at 16:17
3 Answers
5
You can use .when().then().otherwise()
pl.col(pl.Utf8)
is used to select all "string columns".
df = pl.DataFrame({
"A": ["a", "b", "_UNKNOWN", "c"],
"B": ["_UNKNOWN", "d", "e", "f"],
"C": [1, 2, 3, 4]
})
df.with_columns(
pl.when(pl.col(pl.Utf8) == "_UNKNOWN")
.then(None)
.otherwise(pl.col(pl.Utf8)) # keep original value
.keep_name()
)
shape: (4, 3)
┌──────┬──────┬─────┐
│ A | B | C │
│ --- | --- | --- │
│ str | str | i64 │
╞══════╪══════╪═════╡
│ a | null | 1 │
├──────┼──────┼─────┤
│ b | d | 2 │
├──────┼──────┼─────┤
│ null | e | 3 │
├──────┼──────┼─────┤
│ c | f | 4 │
└─//───┴─//───┴─//──┘

jqurious
- 9,953
- 1
- 4
- 14
-
Works, thanks! Would be great to include @Dean's tweak in this answer. – astrojuanlu Dec 19 '22 at 17:23
3
This is really a tweak of @jqurious's answer.
When you do a when
and your condition isn't met then the default is null
so you can just do:
df.with_columns(
pl.when(pl.col(pl.Utf8) != "_UNKNOWN")
.then(pl.col(pl.Utf8)) # keep original value
.keep_name()
)
If you have multiple null conditions say null_strings=["_UNKNOWN", "UNK", "who_knows"]
then you can use a is_in
like this:
df.with_columns(
pl.when(~pl.col(pl.Utf8).is_in(null_strings))
.then(pl.col(pl.Utf8)) # keep original value
.keep_name()
)

Dean MacGregor
- 11,847
- 9
- 34
- 72
1
In addition to previous answers, if you have just imported dataset, you can specify what values will be null
Example data.csv
:
A,B
a,1
b,2
_UNKNOWN,more than 3
c,_UNKNOWN
when importing dataset:
pl.read_csv("data.csv",
null_values="_UNKNOWN",
# infer_schema_length=0, # optional (if you will get parsing error))
df
:
┌──────┬─────────────┐
│ A ┆ B │
│ --- ┆ --- │
│ str ┆ str │
╞══════╪═════════════╡
│ a ┆ 1 │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ b ┆ 2 │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ null ┆ more than 3 │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ c ┆ null │
└──────┴─────────────┘

glebcom
- 1,131
- 5
- 14