I am working with a large CSV file (1.8 GB) and am trying to divide the rows of one variable by the rows of another to create a new variable. This works for some variables but not for all. The code is shown below:

    # Select the rows for each variable
    df1 = DF_GLOBAL.loc[DF_GLOBAL['VARIABLE'] == var_1]
    df2 = DF_GLOBAL.loc[DF_GLOBAL['VARIABLE'] == var_2]

    # Move the identifying columns into the index so the rows line up for the division.
    # This also keeps explanatory columns, such as the unit of the values, out of the division.
    df1 = df1.set_index(['STACK_VAR_1', 'STACK_VAR_2', 'STACK_VAR_3'])
    df2 = df2.set_index(['STACK_VAR_1', 'STACK_VAR_2', 'STACK_VAR_3'])

    # df3 is the desired result of the division; it is later attached to DF_GLOBAL again.
    df3 = df2.loc[:, first_year:] / df1.loc[:, first_year:]

The error message I get is: TypeError: unsupported operand type(s) for /: 'str' and 'str'

This indicates that some of the values being divided are strings. I am therefore wondering how I could skip the strings (probably 'NaN'). Looping through the dataframe is not an option, as it is too large for that to be efficient.
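One possible way to avoid a loop is to convert the year columns to numbers before dividing, so that unparseable strings become real NaN values instead of raising a TypeError. The following is only a minimal sketch: the miniature `df_global`, its column names, its values, and the helper `numeric_years` are all invented for illustration and are not taken from the actual data.

```python
import pandas as pd

# Hypothetical miniature of DF_GLOBAL; column names and values are invented.
# The year columns hold strings, as they might after reading a messy CSV.
df_global = pd.DataFrame({
    "VARIABLE": ["var_1", "var_1", "var_2", "var_2"],
    "STACK_VAR_1": ["A", "B", "A", "B"],
    "1990": ["2", "NaN", "10", "n/a"],
    "2000": ["4", "5", "20", "15"],
})

index_cols = ["STACK_VAR_1"]   # assumed identifying columns
year_cols = ["1990", "2000"]   # assumed value columns


def numeric_years(df, variable):
    """Select one variable, index it, and coerce the year columns to floats."""
    subset = df.loc[df["VARIABLE"] == variable].set_index(index_cols)
    # errors="coerce" replaces anything that cannot be parsed with NaN
    return subset[year_cols].apply(pd.to_numeric, errors="coerce")


df1 = numeric_years(df_global, "var_1")
df2 = numeric_years(df_global, "var_2")

# Element-wise division aligned on the index; NaN propagates cell by cell
df3 = df2 / df1
print(df3)
```

Because the conversion is applied column by column, it stays vectorized. Whether it is fast enough on a 1.8 GB file would need to be measured on the real data.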

  • The easiest way would be to delete null values (NaN) before dividing, using e.g. DataFrame.dropna(). Have you considered this? – Falco Jun 26 '20 at 13:45
  • Yes, but it is not really feasible for me here, as I do not want to drop rows completely. Some rows have values from 1990 (year) onwards, others from 2000 or 2010, etc. If I start in 1990, I would still like the rows with data from e.g. 2000 to be divided from 2000 onwards. But thank you for the possible solution :) – ChrisCleaner Jun 27 '20 at 10:54
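The concern in this comment can be illustrated with a small sketch. It assumes the year columns are already numeric; the row labels and values are invented. A plain `dropna()` discards every row that has any missing year, whereas dividing first and then calling `dropna(how="all")` keeps rows that only start in a later year.

```python
import numpy as np
import pandas as pd

# Hypothetical, already-numeric year columns; labels and values are invented.
years = ["1990", "2000", "2010"]
rows = ["full", "from_2000", "empty"]

numerator = pd.DataFrame(
    [[10.0, 20.0, 30.0], [np.nan, 8.0, 12.0], [np.nan, np.nan, np.nan]],
    index=rows, columns=years,
)
denominator = pd.DataFrame(
    [[2.0, 4.0, 5.0], [np.nan, 2.0, 3.0], [1.0, 1.0, 1.0]],
    index=rows, columns=years,
)

# NaN only affects the cells where it occurs, not the whole row
ratio = numerator / denominator

# Default dropna() removes any row containing at least one NaN
strict = ratio.dropna()

# how="all" removes only rows that have no value in any year
lenient = ratio.dropna(how="all")

print(strict.index.tolist())   # rows that are complete in every year
print(lenient.index.tolist())  # rows with at least one computed value
```

In this sketch the `from_2000` row survives in `lenient` with NaN for 1990 and computed values from 2000 onwards, which appears to be the behavior asked for.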
