Given this dataframe:

import polars as pl

df = pl.DataFrame({"s": ["pear", None, "papaya", "dragonfruit"]})

I want to remove the last X chars, e.g. remove the last 2 chars from the column. This obviously doesn't do what I want:

df.with_columns(
    pl.col("s").str.slice(2).alias("s_sliced"),
)

I'd like the result to be:

shape: (4, 2)
┌─────────────┬───────────┐
│ s           ┆ s_sliced  │
│ ---         ┆ ---       │
│ str         ┆ str       │
╞═════════════╪═══════════╡
│ pear        ┆ pe        │
│ null        ┆ null      │
│ papaya      ┆ papa      │
│ dragonfruit ┆ dragonfru │
└─────────────┴───────────┘
TylerH
nos

2 Answers


You could use a regex with .str.replace

  • . matches a "single character"
  • $ matches the "end"
  • {N} matches exactly N times

Meaning we could use ..$ or .{2}$

df.with_columns(   
    pl.col("s").str.replace(r"..$", "").alias("s_sliced"),
)
shape: (4, 2)
┌─────────────┬───────────┐
│ s           ┆ s_sliced  │
│ ---         ┆ ---       │
│ str         ┆ str       │
╞═════════════╪═══════════╡
│ pear        ┆ pe        │
│ null        ┆ null      │
│ papaya      ┆ papa      │
│ dragonfruit ┆ dragonfru │
└─────────────┴───────────┘
jqurious

A non-regex way to do this is pretty ugly but here it is...

df.with_columns(
    s_sliced=(pl.col('s').str.explode().implode().over('s')
                .list.take(pl.arange(0,(pl.col('s').str.n_chars()-2))))
            .list.eval(pl.element().str.concat("")).list.get(0))

The first thing to notice is that we're converting the string column into a list of characters using str.explode().implode(). This is because str.slice doesn't take an expression for the length, but pl.arange does. Then we take just the elements we want, which is all of them except the last 2. Lastly, we have to convert our list back into a string.

** There was a really concise answer using a simple regex, but its author deleted it and I'm reluctant to copy it. Maybe the post's author will undelete it.

Dean MacGregor
  • Thanks Dean - after a rethink, I wasn't sure if suggesting regex was the correct approach. – jqurious Jun 16 '23 at 14:52
  • @jqurious yeah, in base Python, as I'm sure you know, you could just do `s[0:-2]`, but the Polars slice isn't as flexible, so regex is certainly the easiest to read. There was [this](https://github.com/pola-rs/polars/issues/7127) issue a while back, which is similar but for lists – Dean MacGregor Jun 16 '23 at 15:13
  • Thanks for the issue link. Yeah, I thought perhaps a `.str.head` / `.str.tail` might exist to mirror the other head/tail functions. – jqurious Jun 16 '23 at 15:30