I'm creating a new DataFrame from scratch, but I'm not sure the way I'm doing it is the most efficient one. I want:
- a column NEVER with 3070 rows equal to 1
- a column OCCASIONAL with 1100 rows equal to 1
- a column FREQUENT with 2200 rows equal to 1
I'm also creating a new column POLICE, equal to 1 in:
- 70 of the rows where NEVER = 1
- 110 of the rows where OCCASIONAL = 1
- 220 of the rows where FREQUENT = 1
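To sum the layout up in code form (the blocks name is just mine, for illustration):

# block sizes and how many POLICE == 1 rows fall inside each block;
# 3070 + 1100 + 2200 = 6370 rows in total, 70 + 110 + 220 = 400 POLICE rows
blocks = {
    'NEVER':      (3070,  70),
    'OCCASIONAL': (1100, 110),
    'FREQUENT':   (2200, 220),
}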
Code:
import pandas as pd

# create a single-column dataframe for each column, one row at a time
df1 = pd.concat([pd.DataFrame([1], columns=['NEVER']) for i in range(3070)],
                ignore_index=True)
df2 = pd.concat([pd.DataFrame([1], columns=['OCCASIONAL']) for i in range(1100)],
                ignore_index=True)
df3 = pd.concat([pd.DataFrame([1], columns=['FREQUENT']) for i in range(2200)],
                ignore_index=True)
# combine dataframes into one
frames = [df1, df2, df3]
df = pd.concat(frames)
# reset index
df = df.reset_index(drop=True)
# add POLICE, default 0, then set 1s for some rows in each block
# (.loc slices are inclusive on both ends, so 0:69 is 70 rows)
df['POLICE'] = 0.0
df.loc[0:69, 'POLICE'] = 1.0       # 70 rows in the NEVER block
df.loc[3071:3180, 'POLICE'] = 1.0  # 110 rows in the OCCASIONAL block
df.loc[5271:5490, 'POLICE'] = 1.0  # 220 rows in the FREQUENT block
# concat leaves NaN where a column didn't exist; fill those with 0
df = df.fillna(0.0)
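I sanity-check the result like this (df being the frame built above), and the counts look right to me:

# quick sanity check on the finished frame
print(len(df))                                        # expect 6370
print(df[['NEVER', 'OCCASIONAL', 'FREQUENT']].sum())  # expect 3070, 1100, 2200
print(df['POLICE'].sum())                             # expect 400.0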
I think I've done it, but the code takes ages to run. Is that normal given that I'm creating 6,000+ rows, or is my code inefficient?
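Would something like the following be the better approach? It's only a sketch I put together around the same blocks dict from above (preallocate a zero-filled frame, then flip each slice to 1 with .loc), not something I've profiled:

import pandas as pd

# block sizes and how many POLICE == 1 rows each block gets
blocks = {'NEVER': (3070, 70), 'OCCASIONAL': (1100, 110), 'FREQUENT': (2200, 220)}
total = sum(n for n, _ in blocks.values())  # 6370

# preallocate everything as 0.0, then set the relevant slices to 1.0;
# .loc slicing is inclusive on both ends, hence the - 1s
df = pd.DataFrame(0.0, index=range(total), columns=[*blocks, 'POLICE'])
start = 0
for col, (n, k) in blocks.items():
    df.loc[start:start + n - 1, col] = 1.0       # the whole block for this column
    df.loc[start:start + k - 1, 'POLICE'] = 1.0  # first k rows of the block
    start += n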