I want to strip whitespace from the columns of a dataframe based on each column's data type. If a column is a string column, it should be stripped; if it is not, it should be left unchanged. In pandas, this can be done as follows:

df_clean = df_select.copy()
for col in df_select.columns:
    if df_select[col].dtype == 'object':
        df_clean[col] = df_select[col].str.strip()

How can this be executed in polars?

import polars as pl

df = pl.DataFrame(
    {
        "ID": [1, 1, 1, 1],
        "A": ["foo ", "ham", "spam ", "egg"],
        "L": ["A54", " A12", "B84", " C12"],
    }
)
Horseman
  • `pl.col(pl.Utf8).str.strip()` - https://stackoverflow.com/questions/72359181/how-to-select-columns-by-data-type-in-polars – jqurious Feb 09 '23 at 09:51
  • The strip is not the problem. What if I have a datatype other than string? I need an exception like in the example above with 'object'. – Horseman Feb 09 '23 at 09:57
  • `pl.col(pl.Utf8)` selects only string columns – jqurious Feb 09 '23 at 09:58
  • But when I select only the string columns, I need to append them to the original df again, because I also need the non-string columns. – Horseman Feb 09 '23 at 10:04

1 Answer

You don't need a copy; you can call `with_columns` directly on `df_select`. Only the selected string columns are replaced, and all other columns are kept as they are:

import polars as pl

df_select = pl.DataFrame(
    {
        "ID": [1, 1, 1, 1],
        "A": ["foo ", "ham", "spam ", "egg"],
        "L": ["A54", " A12", "B84", " C12"],
    }
)

df_clean = df_select.with_columns(pl.col(pl.Utf8).str.strip())

Output:

shape: (4, 3)
┌─────┬──────┬─────┐
│ ID  ┆ A    ┆ L   │
│ --- ┆ ---  ┆ --- │
│ i64 ┆ str  ┆ str │
╞═════╪══════╪═════╡
│ 1   ┆ foo  ┆ A54 │
│ 1   ┆ ham  ┆ A12 │
│ 1   ┆ spam ┆ B84 │
│ 1   ┆ egg  ┆ C12 │
└─────┴──────┴─────┘
Tranbi