4

I have a dataframe that looks like this:

             B
A             
0.00    5.7096
7.33    8.0280
25.82  15.7212
43.63  19.5156
55.24  20.1888

and I want to add rows with the index at regular intervals (say by 10), so that I can then interpolate the column B with method = 'index'. My desired output is this:

             B
A             
0.00    5.7096
7.33    8.0280
10.00      NaN
20.00      NaN
25.82  15.7212
30.00      NaN
40.00      NaN
43.63  19.5156
50.00      NaN
55.24  20.1888
60.00      NaN

I haven't found any reindex option that adds index elements instead of changing them. My best solution is create a new index, append it to the original dataframe, sort and remove duplicates (if any), but I'm pretty sure there is a better solution.

step = 10
idx = pd.DataFrame(index = df.index).reindex([round(i, 0) for i in np.arange(df.index[0], df.index[-1] + step, step)])
df = df.append(idx)
df.sort_index(inplace = True)
df = df[~df.index.duplicated()]

any suggestion? thanks

ilmatte
  • 51
  • 3

3 Answers3

3

Effectively do a union by doing an outer join.

df = pd.read_csv(io.StringIO("""A             B
0.00    5.7096
7.33    8.0280
25.82  15.7212
43.63  19.5156
55.24  20.1888"""), sep="\s+").set_index("A")

df = df.join(pd.DataFrame(index=pd.RangeIndex(0,60, 10)), how="outer")

B
0 5.7096
7.33 8.028
10 nan
20 nan
25.82 15.7212
30 nan
40 nan
43.63 19.5156
50 nan
55.24 20.1888
Rob Raymond
  • 29,118
  • 3
  • 14
  • 30
  • seems perfect (solving in just one line), but when I try to change the range to the first and last values of the original index (`df = df.join(pd.DataFrame(index=pd.RangeIndex(df.index[0], df.index[-1], 10)), how="outer")` I get this error: `TypeError: Wrong type for value 55.24` I guess RangeIndex doesn'g to well with float boundaries... – ilmatte Feb 09 '21 at 19:38
  • `df = df.join(pd.DataFrame(index=pd.RangeIndex(df.index[0].astype(int),df.index[-1].astype(int), 10)), how="outer")` works, changes sample data to go against *0.1* then it does insert 0.0 – Rob Raymond Feb 09 '21 at 19:47
1
idx = sorted(set(list(np.arange(70, step=10)) + list(df.index)))
df = df.reindex(idx)

Output:

df Out[59]:

             B
A             
0.00    5.7096
7.33    8.0280
10.00      NaN
20.00      NaN
25.82  15.7212
30.00      NaN
40.00      NaN
43.63  19.5156
50.00      NaN
55.24  20.1888
60.00      NaN
KRKirov
  • 3,854
  • 2
  • 16
  • 20
0

How about just reindexing the dataframe? The outcome is identical to the solution you provided:

lista = [0.00, 7.33, 25.82, 43.63, 55.24]
listb = [5.7096, 8.0280, 15.7212, 19.5156, 20.1888]
df = pd.DataFrame({'A':lista, 'B':listb})

# range(start, stop, step)
# Create new index and increment
my_increment = [i for i in range(10, 20, 10)]
# combine old and new index, here is where you can use the list function to sort them as well
new_index = df.index.tolist() + my_increment

print(df.reindex(new_index))

Output:

        A        B
0    0.00   5.7096
1    7.33   8.0280
2   25.82  15.7212
3   43.63  19.5156
4   55.24  20.1888
10    NaN      NaN
Simon
  • 1,201
  • 9
  • 18
  • this is very similar to my solution, creating with a new index that combines the existing and the new values. slightly different (I'm not reindexing) but with a similar amount of code. – ilmatte Feb 09 '21 at 19:32