I am extremely happy with the polars expression syntax, so much so that a lot of my feature engineering is expressed in polars expressions.

However, I am now trying to move the feature engineering to JSON or YAML files (for MLOps reasons).

The question is: how could I encode the following as a JSON file?


import polars as pl

configuration = {
    'features': [
        pl.col('col1').fill_null(0).log().le(0.2).alias('feature1'),
        pl.col('col2').fill_null(0).log().le(0.2).alias('feature2'),
        pl.col('col3').fill_null(0).log().le(0.2).alias('feature3'),
    ],
    'filters': [
        pl.col('col4') >= 500_000,
        pl.col('col5').is_in(['A', 'B']),
    ],
}

# This is how I use it - just for context
X = (
    df
    .filter(pl.all(configuration['filters']))
    .select(configuration['features'])
)

Any ideas on how I could serialize (or rewrite) this as JSON so that it can be converted back to Polars expressions?

Note that this question overlaps a lot with "Possible to Stringize a Polars Expression?", but it's not a duplicate.

MYK

1 Answer

As of polars >= 0.18.1 we directly support serializing expressions to JSON and deserializing them back.

import polars as pl

def test_expression_json() -> None:
    # create an expression
    e = pl.col("foo").sum().over("bar")

    # serialize to a JSON string (named json_str to avoid shadowing the json module)
    json_str = e.meta.write_json()

    # deserialize back to an expression
    round_tripped = pl.Expr.from_json(json_str)

    # assert expression equality
    assert round_tripped.meta == e
ritchie46