I have a dataframe df containing only one column, 'Info', which I want to split into multiple dataframes based on a list of indices, ls = [23,76,90,460,790]. If I want to use np.array_split(), how do I pass the list so that the data is split at these indices, with each index being the first row of one of the resulting dataframes?
- Would using `ls = [23,76,90,460,790]` result in 5 DFs? Could you elaborate a bit please? – Jon Clements Dec 09 '21 at 18:09
- Yes, the first dataframe should run from row 23 to 75, then the second one from 76 to 89, and so on. – ABC Dec 09 '21 at 18:15
- and 790 should be "until end"? – Jon Clements Dec 09 '21 at 18:16
- Yes, that's right. – ABC Dec 09 '21 at 18:18
- https://stackoverflow.com/a/53395439/6361531 – Scott Boston Dec 09 '21 at 18:39
- @Scott ahh... I didn't find that in my search for a possible duplicate (and thoughtful of you to not immediately close as a duplicate as it leads to your own answer :) - feel free to close as a duplicate (although I prefer my use of zip_longest - but I can add to there or you're more than welcome to add it to your answer there). – Jon Clements Dec 09 '21 at 18:48
1 Answer
I don't think you can use np.array_split() here (you can access the underlying .values of the primary DF but you'd get back numpy arrays, not DFs). What you can do is use .iloc and "slice" from your DF, e.g.:
from itertools import zip_longest
# Pair each start index with the next one; the last pair ends with None ("until end").
dfs = [df.iloc[s:e] for s, e in zip_longest(ls, ls[1:])]
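For anyone wanting to try this end to end, here is a minimal sketch of the .iloc slicing idea. It pairs every start index with the following one, which matches what was agreed in the comments (first frame is rows 23 to 75, the last one runs from 790 to the end). The 1,000-row frame and its contents are made-up sample data, not the asker's.

```python
from itertools import zip_longest

import pandas as pd

# Hypothetical sample data: 1,000 rows in a single 'Info' column.
df = pd.DataFrame({"Info": [f"row {i}" for i in range(1000)]})
ls = [23, 76, 90, 460, 790]

# Pair each start with the next start. zip_longest pads the final pair
# with None, so the last slice runs to the end of the frame.
dfs = [df.iloc[s:e] for s, e in zip_longest(ls, ls[1:])]

# Show the first row, last row, and length of each piece.
for part in dfs:
    print(part.index[0], part.index[-1], len(part))
```

Note that rows 0 to 22 are deliberately left out, since the first index in ls is meant to be the first row of the first frame.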

Jon Clements