2

I searched extensively here for an answer to this problem but couldn't find one. The desired result is to fill a gap only when the values at both ends of it are equal, filling at most 4 values per gap.

My dataset:

0     NaN
1     NaN
2     NaN
3     5.0
4     5.0
5     NaN
6     NaN
7     5.0
8     6.0
9     NaN
10    NaN
11    NaN
12    NaN
13    NaN
14    NaN
15    5.0
16    5.0
17    NaN
18    NaN
19    6.0
20    6.0
21    NaN
22    NaN
23    NaN
24    NaN
25    5.0
26    NaN
27    NaN
28    NaN
29    NaN
30    NaN
31    NaN
32    NaN
33    5.0
34    NaN
35    NaN

The desired result (fill a gap only when the values at both ends are equal, filling at most 4 values per gap):

0     NaN   # Not filled since the gap ends with 5 but this is the dataset beginning (don't know how it starts)
1     NaN   # Not filled since the gap ends with 5 but this is the dataset beginning (don't know how it starts)
2     NaN   # Not filled since the gap ends with 5 but this is the dataset beginning (don't know how it starts)
3     5.0  # Original dataset
4     5.0  # Original dataset
5     5.0    # Filled since the gap starts with 5 and ends with 5 (and is smaller than 4 values)
6     5.0    # Filled since the gap starts with 5 and ends with 5 (and is smaller than 4 values)
7     5.0  # Original dataset
8     6.0  # Original dataset
9     NaN    # Not filled since the gap starts with 6 and ends with 5
10    NaN         .
11    NaN         .
12    NaN         .
13    NaN         .
14    NaN    # Not filled since the gap starts with 6 and ends with 5
15    5.0  # Original dataset
16    5.0  # Original dataset
17    NaN    # Not filled since the gap starts with 5 and ends with 6
18    NaN    # Not filled since the gap starts with 5 and ends with 6
19    6.0  # Original dataset
20    6.0  # Original dataset
21    NaN    # Not filled since the gap starts with 6 and ends with 5
22    NaN         .
23    NaN         .
24    NaN    # Not filled since the gap starts with 6 and ends with 5
25    5.0  # Original dataset
26    5.0    # Filled since the gap starts with 5 and ends with 5
27    5.0    # Filled since the gap starts with 5 and ends with 5
28    5.0    # Filled since the gap starts with 5 and ends with 5
29    5.0    # Filled since the gap starts with 5 and ends with 5
30    NaN    # Not filled since maximum gap is 4
31    NaN    # Not filled since maximum gap is 4
32    NaN    # Not filled since maximum gap is 4
33    5.0  # Original dataset
34    NaN    # Not filled since the gap starts with 5 but this is the dataset end (don't know how it ends)
35    NaN    # Not filled since the gap starts with 5 but this is the dataset end (don't know how it ends)
0m3r
User365Go

2 Answers

3

It should be something like this:

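import math

def extremities(arr):
    # Treat both None and float NaN as missing values.
    def missing(x):
        return x is None or (isinstance(x, float) and math.isnan(x))

    nones = [i for i, x in enumerate(arr) if missing(x)]
    not_nones = [i for i, x in enumerate(arr) if not missing(x)]
    for i in nones:
        try:
            # Nearest known positions before and after position i.
            start = [x for x in not_nones if x < i][-1]
            finish = [x for x in not_nones if x > i][0]
        except IndexError:
            # Leading or trailing gap: one side is unknown, so skip it.
            continue
        # Fill only if both ends match and i is among the first 4 gap values.
        if arr[start] == arr[finish] and i - start < 5:
            arr[i] = arr[start]
    return arr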

Edited:

Sorry, I forgot it's limited to lengths of 4 values. I edited the code.
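
The same rule can also be applied in a single pass over the data, which avoids rescanning the list for every missing position. The following is only a minimal sketch, assuming missing values are either None or float NaN; the names is_missing and fill_bounded_gaps and the short sample list are hypothetical, not part of the code above.

```python
import math

def is_missing(x):
    """Return True for None or a float NaN."""
    return x is None or (isinstance(x, float) and math.isnan(x))

def fill_bounded_gaps(values, limit=4):
    """Return a copy in which every gap bounded by equal values
    has its first `limit` positions filled with that value."""
    out = list(values)
    last = None  # index of the most recent non-missing value
    for i, x in enumerate(values):
        if is_missing(x):
            continue
        # The gap, if any, spans positions last+1 .. i-1.
        if last is not None and i - last > 1 and values[last] == x:
            for j in range(last + 1, min(i, last + 1 + limit)):
                out[j] = x
        last = i
    return out

nan = float("nan")
# Hypothetical sample: a leading gap, a short 5..5 gap, a 6..5 gap,
# a 5..5 gap of length 5, and a trailing gap.
data = [nan, 5.0, nan, nan, 5.0, 6.0, nan, 5.0, nan, nan, nan, nan, nan, 5.0, nan]
result = fill_bounded_gaps(data)
print(result)
```

Leading and trailing gaps are left alone because no known value precedes or follows them, and a gap longer than `limit` only has its first `limit` positions filled, which mirrors the desired result in the question.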

2

We can use boolean masking and cumsum to identify the blocks of NaN values that start and end with the same value, then group the column by these blocks and forward fill with a limit of 4:

s = df['col']
m = s.notna()
s.mask(s[m] != s[m].shift(-1)).groupby(m.cumsum()).ffill(limit=4).fillna(s)

0     NaN
1     NaN
2     NaN
3     5.0
4     5.0
5     5.0
6     5.0
7     5.0
8     6.0
9     NaN
10    NaN
11    NaN
12    NaN
13    NaN
14    NaN
15    5.0
16    5.0
17    NaN
18    NaN
19    6.0
20    6.0
21    NaN
22    NaN
23    NaN
24    NaN
25    5.0
26    5.0
27    5.0
28    5.0
29    5.0
30    NaN
31    NaN
32    NaN
33    5.0
34    NaN
35    NaN
Name: col, dtype: float64
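
To make the one-liner easier to follow, here is a sketch that rebuilds the question's data and splits the expression into named steps. The intermediate names (valid, differs, masked, groups) are introduced here for illustration only, and the explicit reindex is an assumption added so that the mask covers the full index rather than relying on automatic alignment.

```python
import numpy as np
import pandas as pd

nan = np.nan
# The 36-row dataset from the question.
s = pd.Series(
    [nan] * 3 + [5.0, 5.0] + [nan] * 2 + [5.0, 6.0] + [nan] * 6
    + [5.0, 5.0] + [nan] * 2 + [6.0, 6.0] + [nan] * 4
    + [5.0] + [nan] * 7 + [5.0] + [nan] * 2,
    name="col",
)

m = s.notna()                        # True where a value is present
valid = s[m]                         # only the known values
differs = valid != valid.shift(-1)   # True where the next known value is different
# Hide every known value that must not be propagated forward.
masked = s.mask(differs.reindex(s.index, fill_value=True))
groups = m.cumsum()                  # each known value starts a new block with its trailing NaNs
filled = masked.groupby(groups).ffill(limit=4)
result = filled.fillna(s)            # restore the known values hidden by the mask
print(result)
```

A known value survives the mask only when the next known value equals it, so the forward fill inside each block can only carry a value into a gap whose two ends match. On this data that fills rows 5, 6 and 26 through 29 and leaves every other NaN in place.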
Shubham Sharma
  • 1
    This is beautiful!!! Simple, fast and effective! The "s.mask(s[m] != s[m].shift(-1))" idea really broke this problem down into an easy solution. How did you come up with this idea?? :) – User365Go May 09 '21 at 13:52
  • @User365Go Glad I could help. Regarding the idea, it came from lots of experience and problem solving :P – Shubham Sharma May 09 '21 at 14:57