I am trying to find rows that meet a specific condition and put a value in the column col for those rows.

My current implementation is:

df.loc[~(df['myCol'].isin(myInfo)), 'col'] = 'ok'

In the future, myCol will hold multiple pieces of info per row. So I need to split the value in myCol without changing the dataframe and check whether any of the split values is in myInfo. If one of them is, the current row should get the value 'ok' in the column col. Is there an elegant way to do this without actually splitting and saving the result in an extra variable? Currently, I do not know how the multiple pieces will be represented (either separated by a character or just concatenated one after another, each consisting of 4 alphanumeric characters).

thestruggleisreal
  • Could you add a sample of the desired df to your question? I'm thinking of using a list comprehension, something like `[x in myInfo for x in df['myCol'].str.split()]` – Rebin Apr 23 '19 at 15:26

1 Answer

Let's say you need to split on "-" for your myCol column.

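sep = '-'
# Split myCol into one column per piece; df itself is not modified
deconcat = df['myCol'].str.split(sep, expand=True)
# Attach the pieces (columns 0, 1, ...) next to the original columns
new_df = df.join(deconcat)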

The new_df DataFrame has the same index as df, so any boolean condition you compute on new_df lines up row for row with df and can be used to filter or assign on df directly.

You can apply the .isin check from your question to each of the new split columns and combine the results row-wise (for example with .any(axis=1)) to get your desired result.
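As a rough sketch of the idea above, the split, the .isin check and the row-wise reduction can be chained so that no intermediate DataFrame is stored. The sample data, the "-" separator and the names mask and fixed_mask are assumptions made up for illustration; the second part guesses at the "concatenated 4-character chunks" case using a regular expression, which may need adjusting once the real format is known.

```python
import pandas as pd

# Hypothetical sample data; the real myCol/myInfo contents are not given in the question.
df = pd.DataFrame({"myCol": ["AB12-CD34", "EF56", "GH78-ZZ99"]})
myInfo = ["CD34", "ZZ99"]

# Separator case: split into temporary columns, test every cell against myInfo,
# then reduce per row. Nothing is stored except the boolean mask.
mask = df["myCol"].str.split("-", expand=True).isin(myInfo).any(axis=1)
df.loc[mask, "col"] = "ok"

# Fixed-width case (assumption: values are 4 alphanumeric characters each,
# concatenated without a separator): cut each string into chunks of 4.
fixed = pd.Series(["AB12CD34", "EF56GH78"])
fixed_mask = fixed.str.findall(r"[A-Za-z0-9]{4}").apply(
    lambda parts: any(p in myInfo for p in parts)
)
```

Here mask marks the rows where at least one piece is in myInfo. To keep the behaviour of the original snippet, which uses `~` to select rows that are not in myInfo, negate the mask as `~mask` before assigning.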

Source: The code is taken from the pyjanitor documentation; pyjanitor has a built-in function, deconcatenate_column, that does this.

Source code for deconcatenate_column

Sam