
I have a dataset of the form:

A   B   C   D   label
6   2   6   8     0
2   5   3   6     0 
4   3   4   9     1 
5   7   5   5     1
6   4   5   8     0

Each row has a label, and the same label value repeats again after some lines, so there are 7 labels spread across 7000 lines. If I do df.loc[df['label'] == 0] it grabs all rows labeled 0, but I want to extract only the first consecutive block of label 0: if the first 10 rows are labeled 0, it should bring back just those, not the other rows labeled 0 further down the data frame.
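For reference, the sample frame above can be rebuilt like this (a minimal sketch, column names as shown):

import pandas as pd

# Rebuild the small example shown above
df = pd.DataFrame({'A': [6, 2, 4, 5, 6],
                   'B': [2, 5, 3, 7, 4],
                   'C': [6, 3, 4, 5, 5],
                   'D': [8, 6, 9, 5, 8],
                   'label': [0, 0, 1, 1, 0]})

df.loc[df['label'] == 0]   # returns rows 0, 1 and 4, not just the first block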

1 Answer


We may need a helper column here that numbers each consecutive block of labels:

df = df.assign(new=df.label.diff().ne(0).cumsum())      # block id, bumps each time the label changes
df[df.new == df.groupby('label').new.transform('min')]  # keep only the first block seen for each label
Out[206]: 
   A  B  C  D  label  new
0  6  2  6  8      0    1
1  2  5  3  6      0    1
2  4  3  4  9      1    2
3  5  7  5  5      1    2
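The helper works because label.diff() is non-zero exactly where the label changes, so the running cumsum gives every consecutive block its own id; a quick check on the sample data:

df.label.diff().ne(0)            # True on row 0 and wherever the label changes
df.label.diff().ne(0).cumsum()   # 1, 1, 2, 2, 3 -> block ids

groupby('label').new.transform('min') then keeps only the rows whose block id is the first one seen for that label.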

Save to list

s = df[df.new == df.groupby('label').new.transform('min')]
l = [df1 for _, df1 in s.groupby('label')]   # one DataFrame per label: its first consecutive block
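Each element of l is one block as its own DataFrame, e.g. for the sample data:

l[0]   # the first consecutive run of label 0 (rows 0 and 1)
l[1]   # the first consecutive run of label 1 (rows 2 and 3)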
  • I also need to save these in a format like 1-0, where 1 is the file (block) number and 0 is the label. I have 7 labels and 7000 lines; the labels come randomly but in runs of 5 or 10 rows, and I need to save each run separately, hope you understand – jackson Mar 16 '18 at 14:13
  • can you please write out the "for" part completely, I mean the full statement, and is the semicolon in s = ... a mistake? – jackson Mar 16 '18 at 14:27
  • it gives an error on `l=[for _, df1 in s.groupby('label')]`, it says an expression is required and underlines the `for` – jackson Mar 16 '18 at 14:31
  • @MuhammadHassan I changed it to ...`[df1 for _, df1 in s.groupby('label')]`, check the update – BENY Mar 16 '18 at 14:32
  • @Wen I need to save the files in this style, like 'new-label'.csv, is there any way to do that, just pick the new column values and their corresponding label values? (see the sketch after these comments) – jackson Mar 16 '18 at 14:43
  • @MuhammadHassan you can try `l=[df1 for _, df1 in s.groupby('new')]` – BENY Mar 16 '18 at 14:46
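Following up on the comments above, here is a minimal sketch of saving every consecutive block to its own CSV named 'new-label.csv'; the file-naming pattern and the use of to_csv are assumptions based on the request, not part of the original answer:

# Sketch only: assumes the helper column 'new' from the answer has already been added to df
for (block_id, lab), block in df.groupby(['new', 'label']):
    # e.g. '1-0.csv' for the first block of label 0; drop the helper column before writing
    block.drop(columns='new').to_csv('{}-{}.csv'.format(block_id, lab), index=False)

Use s instead of df in the loop if only the first block of each label should be written.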