I am new to pandas.

I want to add a new column to a pandas dataframe df and assign "Start" to every odd row and "Stop" to every even row.

However, when I do df.iloc[1::2, :] = "Start", every second row gets the "Start" string written into every column.

I know that in this case, pandas doesn't know in which column to put the "Start"-string.

However, I couldn't figure out the correct syntax.
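For reference, here is a minimal sketch of what that assignment does on a toy frame, and of one way to name a single column instead. The column names "a", "b" and "event" are made up for illustration. Note that no rows are inserted; existing cells are overwritten.

```python
import pandas as pd

# Toy frame; the column names "a" and "b" are hypothetical.
df = pd.DataFrame({"a": ["w", "x", "y", "z"], "b": ["p", "q", "r", "s"]})

# ":" selects all columns, so every cell in rows 1 and 3 is overwritten.
overwritten = df.copy()
overwritten.iloc[1::2, :] = "Start"

# Targeting one column: create it first, then address it by position.
fixed = df.copy()
fixed["event"] = "Stop"  # hypothetical column name
fixed.iloc[0::2, fixed.columns.get_loc("event")] = "Start"

print(overwritten)
print(fixed)
```

The number of rows stays the same in both cases; only the selected cells change.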

Nick ODell
sudonym

1 Answer

Here's my solution. I haven't worked on optimizing it, but it should handle a fairly large dataset quite well:

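import pandas as pd

df = pd.read_csv('temp.csv')

# Fill the new column with "Start" for every row first
df['New_Col'] = "Start"

# Then overwrite every second row (labels 1, 3, 5, ... of the default index) with "Stop"
df.loc[1::2, "New_Col"] = "Stop"

# print() is a function in Python 3
print(df['New_Col'])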

Output:

0      Start
1       Stop
2      Start
3       Stop
4      Start
5       Stop
6      Start
7       Stop
8      Start
9       Stop
10     Start
11      Stop
12     Start
13      Stop
14     Start
15      Stop
16     Start
17      Stop
18     Start
19      Stop
20     Start
21      Stop
22     Start
23      Stop
24     Start
25      Stop
26     Start
27      Stop
28     Start
29      Stop
       ...  
116    Start
117     Stop
118    Start
119     Stop
120    Start
121     Stop
122    Start
123     Stop
124    Start
125     Stop
126    Start
127     Stop
128    Start
129     Stop
130    Start
131     Stop
132    Start
133     Stop
134    Start
135     Stop
136    Start
137     Stop
138    Start
139     Stop
140    Start
141     Stop
142    Start
143     Stop
144    Start
145     Stop
Name: New_Col, dtype: object
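One caveat: df.loc[1::2, ...] slices by index label, which lines up with row positions for the default 0..n-1 index that read_csv produces. If the index has been changed (filtered, sorted, or set from another column), a purely position-based variant may be safer. Below is a rough sketch of that idea using np.where, not the approach from the code above; the "value" column and the index values are made up.

```python
import numpy as np
import pandas as pd

# Toy frame with a non-default index; "value" is a hypothetical column.
df = pd.DataFrame({"value": [1.5, 2.5, 3.5, 4.5, 5.5]},
                  index=[10, 20, 30, 40, 50])

# Row positions 0, 1, 2, ... regardless of what the index labels are.
positions = np.arange(len(df))

# Even positions (1st, 3rd, ... row) get "Start", the rest get "Stop".
df["New_Col"] = np.where(positions % 2 == 0, "Start", "Stop")

print(df)
```

This builds the whole column in one vectorized step, so there is no second assignment pass.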
Vivek Kalyanarangan
  • Thanks, solved. This works perfectly with a 6 GB text file on an 8 GB RAM machine. – sudonym Nov 28 '16 at 05:01
  • How would you count the "Start" and "Stop" events and put the number of every event in an additional column? – sudonym Nov 28 '16 at 05:02
  • Just take it in a separate series and deal with it. Putting this in the original dataset would denormalize it: `print(df.groupby('Event')['New_Col'].agg(['count']))` – Vivek Kalyanarangan Nov 28 '16 at 05:11
  • Create the column as a Categorical type so that you avoid storing millions of "Start" and "Stop" strings in memory while getting the same visual result. – Zeugma Nov 28 '16 at 05:26
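The two suggestions in the comments above could be combined roughly as follows. This is a sketch under assumptions: the toy data and the "Event_No" column name are made up, and cumcount is one possible reading of "the number of every event" (a running number per event type).

```python
import numpy as np
import pandas as pd

# Toy frame; "value" is a hypothetical column.
df = pd.DataFrame({"value": range(6)})

# Code 0 maps to "Start", code 1 to "Stop"; only small integer codes are stored per row.
codes = np.arange(len(df)) % 2
df["New_Col"] = pd.Categorical.from_codes(codes, categories=["Start", "Stop"])

# Totals per event, kept in a separate Series as suggested.
totals = df["New_Col"].value_counts()

# Running number of each event within its own type (1st Start, 1st Stop, 2nd Start, ...).
df["Event_No"] = df.groupby("New_Col", observed=True).cumcount() + 1

print(totals)
print(df)
```

The categorical column prints the same as a string column, but each distinct label is stored only once.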