Filtering for 'contains value' in a dataset in Python

Question

In Python how do you filter a column by values that contain a particular value?

An example is a data set with a column called 'City' and the values could be 'Sydney', 'Greater Sydney', 'North Sydney' etc. If a use input 'Sydney' how can I ensure that all variations are included in the filtering?

#user inputs column
input1 = input()
country_city = input1.title()

#user inputs value
input2 = input()
country_city_value = input2.title()

#filtering step (current)
filtered = dataset[dataset[country_city] == country_city_value]
print(filtered)

Aviv Yaniv · Answer 1 · 2020-08-28T06:50:22.580

1

If you want to filter contents that contain the input word, apply the str.contains:

data = {'City': ['Sydney', 'Greater Sydney', 'North Sydney'],}

dataset = pd.DataFrame(data, columns = ['City'])

#user inputs column
input1 = input()              
country_city = input1.title()        # 'City'

#user inputs value
input2 = input()
country_city_value = input2.title()  # 'Sydney'

#filtering step (current)
filtered = dataset[dataset[country_city].str.contains(country_city_value)]

#              City
# 0          Sydney
# 1  Greater Sydney
# 2    North Sydney
print(filtered)

edited Aug 28 '20 at 06:50

answered Aug 28 '20 at 06:34

Aviv Yaniv

6,188
3
7
22

This is what I thought, however when I run the script using this it returns the following error: ValueError: Cannot mask with non-boolean array containing NA / NaN values – TC1111 Aug 28 '20 at 07:00
@TC1111 Could you add some example input, please? Anyway [`dropna`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html) and/or [`fillna`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html) may be useful – Aviv Yaniv Aug 28 '20 at 07:13

score 1 · Answer 2 · answered Aug 28 '20 at 06:44

str.contains is good way to go, but if your input will be 'North Sydney' you will not receive Sydney as a result , only north sydney example:

df= pd.DataFrame({
    'A':['Sydney','North Sydney','Alaska']

})
print(df)
              A
0        Sydney
1  North Sydney
2        Alaska
input='North Sydney'
filtered = df[df.A.str.contains(input)]

print(filtered)
              A
1  North Sydney

So to improve that way use split() with str.contains()

input=input.split()
print(input)
['North', 'Sydney']

filtered = df[df.A.str.contains('%s'%[x for x in input])]

print(filtered)
              A
0        Sydney
1  North Sydney

So in this way , you are sure that all parts of your input will be takeen into account

@Aviv Yaniv its searching if your series contains , each part of your input. for this example `Sydney` and `North` — sygneto, Aug 28 '20 at 06:59

Filtering for 'contains value' in a dataset in Python

2 Answers2