I know about the resample function for time-series data. I want something similar for an ordinary column with 3000 rows. I want to keep the length: every row should take the value of the last occurrence in its n-long window.

I know about groupby and the last function as well, but here I am grouping based on length, not on some value.

I want non-overlapping windows, so rolling does not help either.

Example of window of size three:

0         sakshijoshii
1         medpagetoday
2            nickmmark
3      mukeshm07384110
4         DipakBiswas_       
5      jaysanchezdorta
6         Terry6969696
7            LizShelby
8            wlharper1
9       BruhOriginalMe

What I want:

0            nickmmark
1            nickmmark
2            nickmmark
3      jaysanchezdorta
4      jaysanchezdorta      
5      jaysanchezdorta
6            wlharper1
7            wlharper1
8            wlharper1
9       BruhOriginalMe
Borut Flis
  • 1
    Oof, I remember having to do something similar long ago and couldn't find an efficient solution that didn't use an iteration. Is the length of your window a fixed value or does it change? – wtfzambo Jul 12 '21 at 09:01
  • Well it is fixed for one operation on the column if that is what you mean. But the value can be variable. – Borut Flis Jul 12 '21 at 09:22
  • 1
    Yeah that's what I meant, fixed for the full operation – wtfzambo Jul 12 '21 at 09:24
  • 2
    Might be wrong without a sample input & expected output pair but what about `df.groupby(np.arange(len(df)) // n)["col_name"].transform("last")` – Mustafa Aydın Jul 12 '21 at 09:27
  • 1
    @MustafaAydın correct, thank you! – Borut Flis Jul 12 '21 at 09:35
  • 3
    Glad it works. Trying to explain what it does for some future reader: grouping the frame into consecutive blocks of n rows can be done by taking the quotients of the positions 0...N-1 after integer division by n. E.g., for positions 0..6 with n = 3, we get 0, 0, 0, 1, 1, 1, 2. Then `transform` with `last` takes the last entry of each group and produces a like-indexed series by repeating it for each group member. – Mustafa Aydın Jul 12 '21 at 09:40
  • @MustafaAydın that was clever, should make it an answer imho – wtfzambo Jul 12 '21 at 09:46
  • 1
    please add a [mcve] and see [ask] – Umar.H Jul 12 '21 at 09:54
  • Ok, I added it. – Borut Flis Jul 12 '21 at 10:04
  • @wtfzambo thanks, (i didn't come up with this, saw in some answer probably so cleverness isn't mine :ğ); added as an answer now. – Mustafa Aydın Jul 12 '21 at 10:13

1 Answer

You can go for

df.groupby(np.arange(len(df)) // n)[col_name].transform("last")

Grouping the frame into consecutive blocks of n rows can be done by taking the quotients of the positions 0...N-1 after integer division by n. E.g., for positions 0..6 with n = 3, we get 0, 0, 0, 1, 1, 1, 2. Then transform with last takes the last entry of each group and produces a like-indexed series by repeating it for each group member.
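
To make the grouping key concrete, here is a minimal sketch that builds it for seven positions with n = 3. The variable name `keys` is only illustrative.

```python
import numpy as np

n = 3
# Floor-dividing the positions 0..6 by n gives one label per
# non-overlapping window of n consecutive rows.
keys = np.arange(7) // n
print(keys)  # [0 0 0 1 1 1 2]
```

The last label covers fewer than n rows whenever the length is not a multiple of n, which is why a trailing partial window still forms its own group.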

For the sample given:

>>> df

             names
0     sakshijoshii
1     medpagetoday
2        nickmmark
3  mukeshm07384110
4     DipakBiswas_
5  jaysanchezdorta
6     Terry6969696
7        LizShelby
8        wlharper1
9   BruhOriginalMe

>>> n = 3
>>> col_name = "names"
>>> df.groupby(np.arange(len(df)) // n)[col_name].transform("last")

0          nickmmark
1          nickmmark
2          nickmmark
3    jaysanchezdorta
4    jaysanchezdorta
5    jaysanchezdorta
6          wlharper1
7          wlharper1
8          wlharper1
9     BruhOriginalMe
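
For anyone who wants to reproduce this from scratch, the following is a self-contained sketch of the same approach. The helper name `last_in_window` is hypothetical and not part of pandas; the data is the sample from the question.

```python
import numpy as np
import pandas as pd


def last_in_window(s: pd.Series, n: int) -> pd.Series:
    """Give every row the last value of its non-overlapping n-long window.

    Hypothetical helper wrapping the groupby/transform one-liner.
    """
    # Window labels are based on row position, not on the Series' index.
    keys = np.arange(len(s)) // n
    # Note: "last" picks the last non-null value within each group.
    return s.groupby(keys).transform("last")


names = pd.Series(
    [
        "sakshijoshii", "medpagetoday", "nickmmark",
        "mukeshm07384110", "DipakBiswas_", "jaysanchezdorta",
        "Terry6969696", "LizShelby", "wlharper1",
        "BruhOriginalMe",
    ],
    name="names",
)

result = last_in_window(names, 3)
print(result.tolist())
```

The result has the same length and index as the input, and the final window, which holds only one row here, simply keeps its own value.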
Mustafa Aydın