I oversimplified this greatly - the actual column will contain up to 500 characters and the substring list will have 60 values, anywhere from 10-80 characters
dataframe is more complicated than this, list would contain 60 values, DF would have 11,000 rows, but here is what I am trying to do
I have a dataframe like so, and a list
my_list = ['alabama 500', 'beta 15', 'carthouse', 'd320 blend']
df = pd.DataFrame({'col1':['left side alabama 500 on the right side carthouse', '1st entry is at beta 15', 'this one takes a mix of d320 blend and beta 15']})
col1
0 left side alabama 500 on the right side carthouse
1 1st entry is at beta 15
2 this one takes a mix of d320 blend and beta 15
I am attempting to write a function that would return this, keep the first column intact and return the substring in a new column with the original column intact
df['col2']
col1 col2
0 left side alabama 500 on the right side carthouse alabama 500
1 left side alabama 500 on the right side carthouse carthouse
2 1st entry is at beta 15 beta 15
3 this one takes a mix of d320 blend and beta 15 beta 15
4 this one takes a mix of d320 blend and beta 15 d320 blend
here's what I have tried
def add_new_col(data, col_name, my_list):
#function looks at the column col_name in a dataframe data, if the substring exists, it adds a new
#column with only that substring, keeping multiples
for i in my_list:
if data[col_name].str.contains(i):
data['col2'] = i
else:
continue
return data
running the function in the notebook
my_list = ['a', 'b', 'c', 'd']
add_new_col(df, 'col1', my_list)
returns this error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
based on some other answers I also tried this
def add_new_col(data, col_name, my_list):
#function looks at the column col_name in a dataframe data, if the substring exists, it adds a new
#column with only that substring, keeping multiples
for i in my_list:
if data[data[col_name].str.contains(i)]:
data['col2'] = i
else:
continue
return data
which gave the same error code