I am working with a large CSV file (1.8 GB) and am trying to divide two variables (rows) by each other to create a new variable. This works for some variables, but not for all. The code is as follows:
#get the variables out
df1 = DF_GLOBAL.loc[DF_GLOBAL['VARIABLE'] == var_1]
df2 = DF_GLOBAL.loc[DF_GLOBAL['VARIABLE'] == var_2]
#set the stack variables as the index so the two frames align for the division
#I also do that to avoid dividing explanatory variables by each other, such as the unit of the values
df1 = df1.set_index(['STACK_VAR_1', 'STACK_VAR_2','STACK_VAR_3'])
df2 = df2.set_index(['STACK_VAR_1', 'STACK_VAR_2','STACK_VAR_3'])
df3 = df2.loc[:,first_year:] / df1.loc[:,first_year:]
#df3 is the desired result of the division, which is later attached to DF_GLOBAL again.
The error message I get is: TypeError: unsupported operand type(s) for /: 'str' and 'str'
This indicates that some of the values being divided are strings. I am therefore wondering how I can skip the strings (probably 'NaN'). Looping through the DataFrame is not an option for me, as it is too large for that to be efficient.
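One vectorized way to skip strings might be to convert the year columns to numbers before dividing, so that anything unparseable becomes a real NaN. Below is a minimal sketch of that idea, not a confirmed fix for the actual file. The small frames, the helper `to_numeric_frame`, and the placeholder strings "n/a" and "NaN" are assumptions made for illustration; they stand in for `df1.loc[:,first_year:]` and `df2.loc[:,first_year:]`.

```python
import pandas as pd

# Hypothetical toy data standing in for df1.loc[:, first_year:] and
# df2.loc[:, first_year:]; the real frames come from the 1.8 GB CSV.
idx = pd.Index(["a", "b"], name="STACK_VAR_1")
df1_years = pd.DataFrame({"2000": ["2", "n/a"], "2001": ["4", "5"]}, index=idx)
df2_years = pd.DataFrame({"2000": ["1", "3"], "2001": ["NaN", "10"]}, index=idx)


def to_numeric_frame(frame):
    """Convert every column to numbers; unparseable entries become NaN."""
    return frame.apply(pd.to_numeric, errors="coerce")


df1_num = to_numeric_frame(df1_years)
df2_num = to_numeric_frame(df2_years)

# Count cells in df1_years that held a value but did not parse as a number,
# which helps to check how much data the coercion silently turns into NaN.
unparsed_df1 = int((df1_num.isna() & df1_years.notna()).sum().sum())

# The division is now purely numeric; NaN propagates instead of raising.
df3 = df2_num / df1_num
```

The conversion runs column by column rather than cell by cell, so it avoids an explicit Python loop over the rows. If the strings turn out to be known placeholders, declaring them when the file is read (for example through the `na_values` argument of `pd.read_csv`) may avoid the problem at the source, though that depends on what the file actually contains.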