
I have a dataset of the form:

A   B   C   D   label
6   2   6   8     0
2   5   3   6     0 
4   3   4   9     1 
5   7   5   5     1
6   4   5   8     0

Each row has a label, and the same label value repeats again after some lines, so there are 7 labels spread across 7000 lines. If I do df.loc[df['label'] == 0] it grabs all rows labeled 0, but I want to extract only the first consecutive block of label 0: if the first 10 rows are labeled 0, it should bring back just those, not the other rows labeled 0 further down the data frame.
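For reference, the sample frame above can be rebuilt like this (a minimal sketch, column names as shown):

import pandas as pd

# Rebuild the small example shown above
df = pd.DataFrame({'A': [6, 2, 4, 5, 6],
                   'B': [2, 5, 3, 7, 4],
                   'C': [6, 3, 4, 5, 5],
                   'D': [8, 6, 9, 5, 8],
                   'label': [0, 0, 1, 1, 0]})

df.loc[df['label'] == 0]   # returns rows 0, 1 and 4, not just the first block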

1 Answer


We may need a helper column here that numbers each consecutive block of labels:

df = df.assign(new=df.label.diff().ne(0).cumsum())      # block id, bumps each time the label changes
df[df.new == df.groupby('label').new.transform('min')]  # keep only the first block seen for each label
Out[206]: 
   A  B  C  D  label  new
0  6  2  6  8      0    1
1  2  5  3  6      0    1
2  4  3  4  9      1    2
3  5  7  5  5      1    2
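The helper works because label.diff() is non-zero exactly where the label changes, so the running cumsum gives every consecutive block its own id; a quick check on the sample data:

df.label.diff().ne(0)            # True on row 0 and wherever the label changes
df.label.diff().ne(0).cumsum()   # 1, 1, 2, 2, 3 -> block ids

groupby('label').new.transform('min') then keeps only the rows whose block id is the first one seen for that label.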

Save to list

s = df[df.new == df.groupby('label').new.transform('min')]
l = [df1 for _, df1 in s.groupby('label')]   # one DataFrame per label: its first consecutive block
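Each element of l is one block as its own DataFrame, e.g. for the sample data:

l[0]   # the first consecutive run of label 0 (rows 0 and 1)
l[1]   # the first consecutive run of label 1 (rows 2 and 3)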
  • I also need to save these in a format like 1-0, where 1 is the file (block) number and 0 is the label. I have 7 labels and 7000 lines; the labels come randomly but in runs of 5 or 10 rows, and I need to save each run separately, hope you understand – jackson Mar 16 '18 at 14:13
  • can you please write out the "for" part completely, I mean the full statement, and is the semicolon in s = ... a mistake? – jackson Mar 16 '18 at 14:27
  • it gives an error on `l=[for _, df1 in s.groupby('label')]`, it says an expression is required and underlines the `for` – jackson Mar 16 '18 at 14:31
  • @MuhammadHassan I changed it to ...`[df1 for _, df1 in s.groupby('label')]`, check the update – BENY Mar 16 '18 at 14:32
  • @Wen I need to save the files in this style, like 'new-label'.csv, is there any way to do that, just pick the new column values and their corresponding label values? (see the sketch after these comments) – jackson Mar 16 '18 at 14:43
  • @MuhammadHassan you can try `l=[df1 for _, df1 in s.groupby('new')]` – BENY Mar 16 '18 at 14:46
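Following up on the comments above, here is a minimal sketch of saving every consecutive block to its own CSV named 'new-label.csv'; the file-naming pattern and the use of to_csv are assumptions based on the request, not part of the original answer:

# Sketch only: assumes the helper column 'new' from the answer has already been added to df
for (block_id, lab), block in df.groupby(['new', 'label']):
    # e.g. '1-0.csv' for the first block of label 0; drop the helper column before writing
    block.drop(columns='new').to_csv('{}-{}.csv'.format(block_id, lab), index=False)

Use s instead of df in the loop if only the first block of each label should be written.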