0

How can I use Ibis to fill missing values with the mean?

For example, if I have this data:

import pandas as pd
import ibis
from ibis import _

ibis.options.interactive = True

df = pd.DataFrame(data={'fruit': ['apple', 'apple', 'apple', 'orange', 'orange', 'orange'],
                        'variety': ['gala', 'honeycrisp', 'fuji', 'navel', 'valencia', 'cara cara'],
                        'weight': [134 , 158, pd.NA, 142, 96, pd.NA]})

t = ibis.memtable(df)

Using Ibis code:

  • How would I replace the NA values in the weight column with the overall mean of weight?
  • How would I replace the NA values in the weight column with the the mean within each group (apples, oranges)?
ianmcook
  • 537
  • 4
  • 10

1 Answers1

1

In the first case (replacing NULL with overall mean) you can simply pass the mean of the replacement column to fillna and ibis will figure out what you mean:

In [27]: t.mutate(weight=_.weight.fillna(_.weight.mean()))
Out[27]:
┏━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━┓
┃ fruit  ┃ variety    ┃ weight  ┃
┡━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━┩
│ string │ string     │ float64 │
├────────┼────────────┼─────────┤
│ apple  │ gala       │   134.0 │
│ apple  │ honeycrisp │   158.0 │
│ apple  │ fuji       │   132.5 │
│ orange │ navel      │   142.0 │
│ orange │ valencia   │    96.0 │
│ orange │ cara cara  │   132.5 │
└────────┴────────────┴─────────┘

In the second case of replacing the nulls per group, you can use a window function:

In [28]: t.mutate(weight=_.weight.fillna(_.weight.mean().over(group_by="fruit")))
Out[28]:
┏━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━┓
┃ fruit  ┃ variety    ┃ weight  ┃
┡━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━┩
│ string │ string     │ float64 │
├────────┼────────────┼─────────┤
│ orange │ navel      │   142.0 │
│ orange │ valencia   │    96.0 │
│ orange │ cara cara  │   119.0 │
│ apple  │ gala       │   134.0 │
│ apple  │ honeycrisp │   158.0 │
│ apple  │ fuji       │   146.0 │
└────────┴────────────┴─────────┘
Phillip Cloud
  • 24,919
  • 11
  • 68
  • 88