3

In Python how do you filter a column by values that contain a particular value?

An example is a data set with a column called 'City' and the values could be 'Sydney', 'Greater Sydney', 'North Sydney' etc. If a use input 'Sydney' how can I ensure that all variations are included in the filtering?

#user inputs column
input1 = input()
country_city = input1.title()

#user inputs value
input2 = input()
country_city_value = input2.title()

#filtering step (current)
filtered = dataset[dataset[country_city] == country_city_value]
print(filtered)

TC1111
  • 89
  • 8

2 Answers2

1

If you want to filter contents that contain the input word, apply the str.contains:

data = {'City': ['Sydney', 'Greater Sydney', 'North Sydney'],}

dataset = pd.DataFrame(data, columns = ['City'])

#user inputs column
input1 = input()              
country_city = input1.title()        # 'City'

#user inputs value
input2 = input()
country_city_value = input2.title()  # 'Sydney'

#filtering step (current)
filtered = dataset[dataset[country_city].str.contains(country_city_value)]

#              City
# 0          Sydney
# 1  Greater Sydney
# 2    North Sydney
print(filtered)
Aviv Yaniv
  • 6,188
  • 3
  • 7
  • 22
  • This is what I thought, however when I run the script using this it returns the following error: ValueError: Cannot mask with non-boolean array containing NA / NaN values – TC1111 Aug 28 '20 at 07:00
  • @TC1111 Could you add some example input, please? Anyway [`dropna`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html) and/or [`fillna`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html) may be useful – Aviv Yaniv Aug 28 '20 at 07:13
1

str.contains is good way to go, but if your input will be 'North Sydney' you will not receive Sydney as a result , only north sydney example:

df= pd.DataFrame({
    'A':['Sydney','North Sydney','Alaska']

})
print(df)
              A
0        Sydney
1  North Sydney
2        Alaska
input='North Sydney'
filtered = df[df.A.str.contains(input)]

print(filtered)
              A
1  North Sydney

So to improve that way use split() with str.contains()

input=input.split()
print(input)
['North', 'Sydney']

filtered = df[df.A.str.contains('%s'%[x for x in input])]

print(filtered)
              A
0        Sydney
1  North Sydney

So in this way , you are sure that all parts of your input will be takeen into account

sygneto
  • 1,761
  • 1
  • 13
  • 26
  • @Aviv Yaniv its searching if your series contains , each part of your input. for this example `Sydney` and `North` – sygneto Aug 28 '20 at 06:59