1

I have a dataset which i read in by

data = pd.read_excel('....\data.xlsx')
data = data.fillna(0)

and i made them all strings

data['Block']=data['Block'].astype(str)
data['Concentration']=data['Concentration'].astype(str)
data['Name']=data['Name'].astype(str)

data looks like this

Block             Con        Name  
  1               100         A
  1               100         A
  1               100         A
  1               33          B
  1               33          B
  1               33          B
  1               0           c
  1               0           c
  1               0           c
  2               100         A
  2               100         A
  2               100         A
  2               100         B
  2               100         B
  2               100         B
  2               33          B
  2               33          B
  2               33          B
  2               0           c
  2               0           c
  2               0           c
 ...
 ...
  24               0          E

I inserted a column 'replicate' :

 data['replicate'] = '' 

data now looks like this

Block             Con        Name      replicate
  1               100         A
  1               100         A
  1               100         A
  1               33          B
  1               33          B
  1               33          B
  1               0           c
  1               0           c
  1               0           c
  2               100         A
  2               100         A
  2               100         A
  2               100         B
  2               100         B
  2               100         B
  2               33          B
  2               33          B
  2               33          B
  2               0           c
  2               0           c
  2               0           c
 ...
 ...
  24               0          E

each Block|con|name combination has 3 replicates, how would I fill out the 'replicate' column with 1,2,3 going down the column?

desired output would be

Block             Con        Name      replicate
  1               100         A           1
  1               100         A           2
  1               100         A           3
  1               33          B           1
  1               33          B           2
  1               33          B           3
  1               0           c           1
  1               0           c           2
  1               0           c           3
  2               100         A           1
  2               100         A           2
  2               100         A           3
  2               100         B           1
  2               100         B           2
  2               100         B           3
  2               33          B           1
  2               33          B           2
  2               33          B           3
  2               0           c           1
  2               0           c           2
  2               0           c           3
 ...
 ...
  24               0          E           3

pseudo code would be:

 for b in data.block:
      for  c in data.con:
          for n in data.name:
              for each b|c|n combination:
                  if the same:
                   assign '1' to data.replicate
                   assign '2' to data.replicate
                   assign '3' to data.replicate

i have searched online and have not found any solution, and i'm not sure which function to use for this.

Jessica
  • 2,923
  • 8
  • 25
  • 46

2 Answers2

1

That looks like a groupby cumcount:

In [11]: df["Replicate"] = df.groupby(["Block", "Con", "Name"]).cumcount() + 1

In [12]: df
Out[12]:
    Block  Con Name  Replicate
0       1  100    A          1
1       1  100    A          2
2       1  100    A          3
3       1   33    B          1
4       1   33    B          2
5       1   33    B          3
6       1    0    c          1
7       1    0    c          2
8       1    0    c          3
9       2  100    A          1
10      2  100    A          2
11      2  100    A          3
12      2  100    B          1
13      2  100    B          2
14      2  100    B          3
15      2   33    B          1
16      2   33    B          2
17      2   33    B          3
18      2    0    c          1
19      2    0    c          2
20      2    0    c          3

cumcount enumerates the rows in each group (from 0).

Andy Hayden
  • 359,921
  • 101
  • 625
  • 535
  • I am no sure why but today when i was trying this function on my real dataset, which is much larger, it ignores the 'concentration' column. so for 2 different block|name|con combos with differen con values it assigns 1 to both of them. – Jessica Nov 07 '15 at 21:16
  • @Jessica please post a new issue :) – Andy Hayden Nov 07 '15 at 21:21
  • please see my new post here , thanks http://stackoverflow.com/questions/33588156/groupby-issues-of-not-recognizing-numeric-column-pandas-python – Jessica Nov 07 '15 at 21:46
0

You can use numpy.tile:

import numpy as np
replicate_arr = np.tile(['1', '2', '3'], len(data)/3)
data['replicate'] = replicate_arr
shx2
  • 61,779
  • 13
  • 130
  • 153