
I have the following df (shown as a dict):

{'country': {18: 'Argentina', 19: 'Argentina', 20: 'Argentina', 21: 'Brazil', 22: 'Brazil'}, 'year': {18: 1998, 19: 1999, 20: 2000, 21: 1980, 22: 1981}, 'Inflation (Annual %)': {18: 0.4892870291442586, 19: 0.5072515700889069, 20: -1.4451253758139493, 21: 'nan', 22: 101.725072957927}, 'PostTreatmentYear': {18: 1, 19: 1, 20: 1, 21: 0, 22: 0}}

I would like to create a 'Treatment' variable that equals 1 if the country is one of a given set of countries, and 0 otherwise. For a single country I have been doing this successfully as follows:

df['Treatment'] = np.where(df['country'] == 'Argentina', 1, 0)

This works: it appends a new column called Treatment, which is 1 for Argentina and 0 for all other countries. There are several countries that I want to add to the treatment group, but when I repeat the code for another country, the whole column is reassigned, so the rows previously set to 1 are turned back to 0.
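To make the overwriting concrete, here is a minimal sketch on a small made-up frame (the four rows are illustrative, not my real data). Each call to np.where rebuilds the entire column from that one comparison, so the second call discards the result of the first.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, only the 'country' column matters here
df = pd.DataFrame({"country": ["Argentina", "Argentina", "Brazil", "Chile"]})

# First pass: 1 for Argentina, 0 elsewhere
df["Treatment"] = np.where(df["country"] == "Argentina", 1, 0)
first_pass = df["Treatment"].tolist()

# Second pass: the column is rebuilt from the Brazil test alone,
# so the Argentina rows fall back to 0
df["Treatment"] = np.where(df["country"] == "Brazil", 1, 0)
second_pass = df["Treatment"].tolist()

print(first_pass)   # [1, 1, 0, 0]
print(second_pass)  # [0, 0, 1, 0]
```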

So I tried to write a single line with or statements to avoid this. I first ran

print(df.country.unique())

to get a list of all the unique country names, then I tried (a shortened example):

df['Treatment'] = np.where(df['country'] == 'Argentina' or 'Brazil' or 'Chile', 1, 0)

This results in the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
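As far as I can tell, the error arises because Python parses the condition as `(df['country'] == 'Argentina') or 'Brazil' or 'Chile'`, and the plain `or` keyword asks for a single True/False value from the whole Series. The sketch below reproduces this on a made-up four-row frame and contrasts it with the element-wise `|` operator, where every comparison is written out in full and parenthesized. Treat it as an illustration under those assumptions rather than a definitive fix.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data for illustration
df = pd.DataFrame({"country": ["Argentina", "Brazil", "Chile", "Peru"]})

# Python's `or` calls bool() on the Series produced by the first
# comparison, which pandas refuses to do for a multi-row Series
try:
    mask = df["country"] == "Argentina" or "Brazil" or "Chile"
except ValueError as exc:
    error_message = str(exc)
    print(error_message)

# Element-wise OR: each comparison is complete and wrapped in parentheses
mask = (
    (df["country"] == "Argentina")
    | (df["country"] == "Brazil")
    | (df["country"] == "Chile")
)
df["Treatment"] = np.where(mask, 1, 0)
print(df)
```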

My question is really twofold:

  1. How can I take a list and add operators between its elements, in order to do
    df['Treatment'] = np.where(df['country'] == 'Argentina' or 'Brazil' or 'Chile', 1, 0)

more efficiently, just taking my list from

print(df.country.unique())

in order to save time (as there are many countries)?

  2. How do I correctly specify the operation here (or one like it, if this is incorrect)
    df['Treatment'] = np.where(df['country'] == 'Argentina' or 'Brazil' or 'Chile', 1, 0)

to get it to work as intended?
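For reference, this is a sketch of the result I am after, assuming pandas' `Series.isin` is a suitable tool for testing membership in a list. The `treated` list and the sample rows are hypothetical; in practice the list would be a hand-picked subset of the names printed by `df.country.unique()`, since passing every unique name would mark all rows as treated.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data for illustration
df = pd.DataFrame(
    {"country": ["Argentina", "Argentina", "Brazil", "Chile", "Peru"]}
)

# Hypothetical treatment group, chosen from df["country"].unique()
treated = ["Argentina", "Brazil", "Chile"]

# isin returns a boolean Series: True where the country is in the list
df["Treatment"] = np.where(df["country"].isin(treated), 1, 0)

# Equivalent without np.where: cast the boolean mask to integers
alt = df["country"].isin(treated).astype(int)

print(df)
```

Either form sets the whole column in one pass, so adding another country only means extending the list.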

Any help is greatly appreciated.
