How to resample value in pandas column?

Question

I know about the resample function for time-series data. I want something similar on a regular column with 3000 rows. I want to keep the length: every row should take the value of the last occurrence in an n-long window.

I know about groupby and the last function as well, but here I am grouping by position (blocks of a fixed length), not by some value.

I want non-overlapping windows, so rolling does not help either.

Example of window of size three:

0         sakshijoshii
1         medpagetoday
2            nickmmark
3      mukeshm07384110
4         DipakBiswas_
5      jaysanchezdorta
6         Terry6969696
7            LizShelby
8            wlharper1
9       BruhOriginalMe

What I want:

0            nickmmark
1            nickmmark
2            nickmmark
3      jaysanchezdorta
4      jaysanchezdorta
5      jaysanchezdorta
6            wlharper1
7            wlharper1
8            wlharper1
9       BruhOriginalMe

Oof, I remember having to do something similar long ago and couldn't find an efficient solution that didn't use an iteration. Is the length of your window a fixed value or does it change? — wtfzambo, Jul 12 '21 at 09:01
Well it is fixed for one operation on the column if that is what you mean. But the value can be variable. — Borut Flis, Jul 12 '21 at 09:22
Might be wrong without a sample input & expected output pair but what about `df.groupby(np.arange(len(df)) // n)["col_name"].transform("last")` — Mustafa Aydın, Jul 12 '21 at 09:27
Glad it works. Trying to explain what it does for future readers: rows can be grouped into consecutive blocks of n by taking the quotients of the positions 0...N-1 after integer division by n. E.g., for positions 0..6 with n = 3, we get 0, 0, 0, 1, 1, 1, 2. Then `transform` with `last` takes the last entry of each group and produces a like-indexed series by repeating it for each group member. — Mustafa Aydın, Jul 12 '21 at 09:40
@MustafaAydın that was clever, should make it an answer imho — wtfzambo, Jul 12 '21 at 09:46
@wtfzambo thanks, (i didn't come up with this, saw in some answer probably so cleverness isn't mine :ğ); added as an answer now. — Mustafa Aydın, Jul 12 '21 at 10:13

score 3 · Accepted Answer · answered Jul 12 '21 at 10:13

You can go for

df.groupby(np.arange(len(df)) // n)[col_name].transform("last")

Rows can be grouped into consecutive blocks of n by taking the quotients of the positions 0...N-1 after integer division by n. E.g., for positions 0..6 with n = 3, we get 0, 0, 0, 1, 1, 1, 2. Then transform with last takes the last entry of each group and produces a like-indexed series by repeating it for each group member.
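To look at the grouping keys on their own, here is a minimal sketch. The name `keys` and the length 7 are only for illustration and are not part of the original answer:

```python
import numpy as np

n = 3  # window size

# Positions 0..6 floor-divided by the window size give one label per block of n rows.
keys = np.arange(7) // n

print(keys.tolist())  # [0, 0, 0, 1, 1, 1, 2]
```

The last block is simply shorter when the length is not a multiple of n.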

For the sample given:

>>> df

             names
0     sakshijoshii
1     medpagetoday
2        nickmmark
3  mukeshm07384110
4     DipakBiswas_
5  jaysanchezdorta
6     Terry6969696
7        LizShelby
8        wlharper1
9   BruhOriginalMe

>>> n = 3
>>> col_name = "names"
>>> df.groupby(np.arange(len(df)) // n)[col_name].transform("last")

0          nickmmark
1          nickmmark
2          nickmmark
3    jaysanchezdorta
4    jaysanchezdorta
5    jaysanchezdorta
6          wlharper1
7          wlharper1
8          wlharper1
9     BruhOriginalMe
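
For anyone who wants to reproduce this from scratch, below is a self-contained sketch that builds the sample frame and runs the groupby approach. It also shows a purely positional alternative that is not from the answer above: the names `last_pos` and `indexed` are hypothetical, and the idea is to compute, for each row, the position of the last row in its window (capped at the final row) and index the values directly.

```python
import numpy as np
import pandas as pd

names = [
    "sakshijoshii", "medpagetoday", "nickmmark",
    "mukeshm07384110", "DipakBiswas_", "jaysanchezdorta",
    "Terry6969696", "LizShelby", "wlharper1",
    "BruhOriginalMe",
]
df = pd.DataFrame({"names": names})
n = 3  # window size

# Approach from the answer: label rows by position // n, then broadcast each group's last value.
grouped = df.groupby(np.arange(len(df)) // n)["names"].transform("last")

# Hypothetical alternative: position of the last row of each window, capped at the final row.
pos = np.arange(len(df))
last_pos = np.minimum((pos // n + 1) * n - 1, len(df) - 1)
indexed = pd.Series(df["names"].to_numpy()[last_pos], index=df.index, name="names")

print(grouped.tolist() == indexed.tolist())  # True
```

One difference to be aware of: as far as I know, "last" in a groupby returns the last non-null entry of each group, so the two versions can disagree when a window ends in a missing value. The positional version keeps the missing value in that case.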
