Question + MWE
How can I drop/remove multiple ranges of rows from a Pandas DataFrame with a (two level) multi index looking like this:
idx1 idx2 | value(s) ...
------------------------------------------
4 0 | 1.123456 ...
1 | 2.234567 ...
2 | 0.012345 ...
8 0 | -1.123456 ...
1 | -0.973915 ...
2 | 1.285553 ...
3 | -0.194625 ...
4 | -0.144112 ...
... ... | ... ...
The ranges to drop/remove are currently residing in a list like this:
ranges = [[(4, 1), (4, 2)], # range (4,1):(4,2)
[(8, 0), (8, 3)], # range (8,0):(8,3)
[(8, 5), (8, 10)], ...] # range (8,5):(8,10)
The main problem is, that most methods I found, don't support either multi indexing or slicing or multiple slices/ranges.
What is the best/fastest way to do so.
Current ugly solution
for range in ranges:
df = df.drop(df.loc[range[0]:range[1]].index)
Is slow and ugly, but is the only workable solution I found, combining multi indexing, slicing and in a way multiple ranges (by looping over the ranges).
Solution comparison
All three proposed solutions work. They all solve the issue by transforming the list of slices into a list of all individual tuples within those slice.
Slices to complete set of tuples
The fastest way to do so is @ALollz solution:
idx = [(x, z) for (x, i), (_, j) in ranges for z in np.arange(i,j+1,1)]
Performance
Regarding the removal of the rows, all solutions work, but there's a big difference in performance (all performance data based on my data set with ~10 Mio. entries)
@ALollz + @Ben. T's combined solution (~19 sec.)
df.drop(pd.MultiIndex.from_tuples(idx))
or without creating a
MultiIndex
objectdf.drop(idx)
@ALollz first solution (~75 sec.)
df.loc[list(set(df.index.values) - set(idx))]
@user3471881's solution (~95 sec.)
df.loc[~df.index.isin(ranges)]
my ugly solution (~350 sec.)
see above