I have the following df:
{'country': {18: 'Argentina', 19: 'Argentina', 20: 'Argentina', 21: 'Brazil', 22: 'Brazil'}, 'year': {18: 1998, 19: 1999, 20: 2000, 21: 1980, 22: 1981}, 'Inflation (Annual %)': {18: 0.4892870291442586, 19: 0.5072515700889069, 20: -1.4451253758139493, 21: 'nan', 22: 101.725072957927}, 'PostTreatmentYear': {18: 1, 19: 1, 20: 1, 21: 0, 22: 0}}
I would like to create a 'Treatment' variable that equals 1 if the country is a certain country and 0 otherwise. I have been doing this successfully as follows:
df['Treatment'] = np.where(df['country'] == 'Argentina', 1, 0)
This works: it appends a new column called Treatment, which is 1 for Argentina and 0 for all other countries. There are several countries that I want to add to the treatment, but when I repeat the code for another country, it overwrites the whole column, so the rows that were previously set correctly to 1 go back to 0.
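To make the overwriting problem concrete, here is a minimal sketch on a toy frame (the three rows are made up for illustration and are not my real data):

```python
import numpy as np
import pandas as pd

# Toy data, made up for illustration only
df = pd.DataFrame({'country': ['Argentina', 'Brazil', 'Chile']})

# First call: only Argentina is flagged
df['Treatment'] = np.where(df['country'] == 'Argentina', 1, 0)
first = df['Treatment'].tolist()

# Second call: the whole column is replaced, so Argentina is reset to 0
df['Treatment'] = np.where(df['country'] == 'Brazil', 1, 0)
second = df['Treatment'].tolist()
```

Each call rebuilds the column from scratch, so only the last country survives.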
So I tried to write a single line with 'or' statements in order to avoid this. I did the following:
print(df.country.unique())
to get a list of all the unique country names, then I tried (a shortened example):
df['Treatment'] = np.where(df['country'] == 'Argentina' or 'Brazil' or 'Chile', 1, 0)
This results in the following error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
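As far as I understand it, Python's 'or' needs a single True/False value from df['country'] == 'Argentina', but that comparison is a whole Series of booleans, hence the error. The sketch below reproduces this on toy data (made up for illustration) and shows one possible element-wise alternative using the | operator; I am assuming this is the intended pandas idiom, with each comparison wrapped in its own parentheses.

```python
import numpy as np
import pandas as pd

# Toy data, made up for illustration only
df = pd.DataFrame({'country': ['Argentina', 'Brazil', 'Chile', 'Peru']})

# 'or' asks for the truth value of the whole Series, which pandas refuses to guess
try:
    df['country'] == 'Argentina' or 'Brazil'
    raised = False
except ValueError:
    raised = True

# Element-wise OR: every comparison is spelled out and joined with |
mask = (
    (df['country'] == 'Argentina')
    | (df['country'] == 'Brazil')
    | (df['country'] == 'Chile')
)
df['Treatment'] = np.where(mask, 1, 0)
```

Note that 'Brazil' on its own is just a non-empty string, so even without the error the original expression would not compare the column against it.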
My question is twofold:
- How to take a list and add operators between elements, in order to do
df['Treatment'] = np.where(df['country'] == 'Argentina' or 'Brazil' or 'Chile', 1, 0)
more efficiently, just taking my list from
print(df.country.unique())
in order to save time (as there are many countries)
- How to correctly specify the operation here (or one like it, if this is incorrect):
df['Treatment'] = np.where(df['country'] == 'Argentina' or 'Brazil' or 'Chile', 1, 0)
to get it to work as intended.
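For reference, this is roughly what I am aiming for: a minimal sketch that assumes Series.isin is a suitable tool for testing membership in a list. The name treated and its contents are hypothetical; in practice I would copy the relevant names out of df.country.unique().

```python
import numpy as np
import pandas as pd

# The country/year rows from the df above
df = pd.DataFrame({
    'country': ['Argentina', 'Argentina', 'Argentina', 'Brazil', 'Brazil'],
    'year': [1998, 1999, 2000, 1980, 1981],
})

# Hypothetical list of treated countries (names not in the data are simply ignored)
treated = ['Argentina', 'Chile']

# isin checks each row's country against the list, giving a boolean Series
df['Treatment'] = np.where(df['country'].isin(treated), 1, 0)
```

With this, adding another country would only mean extending the list rather than chaining more comparisons.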
Any help is greatly appreciated.