Novice Python user here.

I have a dataframe imported from a CSV file, and I need to search the from_data column for the keywords "Alert" and "Amber", matching upper case, lower case, or any mix of the two.

Here are the contents of my dataframe called df:

  Id_No      from_data 
  1          Alert g12134 CONFIRMATION CODE A27 
  1          ALERT g12134 CONFIRMATION CODE A28
  5          g12136  CONFIRMATION CODE B01 - RED
  5          g12136  CONFIRMATION CODE B02 - AMBER
  6          g12136  CONFIRMATION CODE B01 - RED
  6          g12136  CONFIRMATION CODE B02 - AMBER
  9          Alert g12134  CONFIRMATION CODE A27
  15         **ERROR** no alert was registered
  17         g12136  CONFIRMATION CODE B02 - AMBER
  19         g12136  CONFIRMATION CODE B03 - GREEN

Here is what I would like (in a new dataframe):

 id_no  from_data
 1      Alert g12134  CONFIRMATION CODE A27
 1      ALERT g12134  CONFIRMATION CODE A28
 9      Alert g12134  CONFIRMATION CODE A27
 5      g12136 CONFIRMATION CODE B02 - AMBER
 15     **ERROR** no alert was registered 
 17     g12136  CONFIRMATION CODE B02 - AMBER

I've been searching the net all day and reading a lot of fuzzywuzzy articles, but I can't seem to get any code to give me the results I want.

Any help to provide a solution would be greatly appreciated (and stop me from going mad!!)

Thanks

Big_Daz
  • What have you tried? What `fuzzywuzzy` methods are you using? – Ian Thompson Feb 26 '19 at 17:49
  • Hi, and welcome to StackOverflow! Could you perhaps specify the logic to go from input to output? Why has the order changed? And what happened to id 6 containing AMBER? Why would you consider fuzzy matching? – Jondiedoop Feb 26 '19 at 17:49
  • Hi Ian, I searched around the net and found a few examples that I couldn't get to work, such as a function using fuzz.token_sort_ratio. As I understand it, it iterates over the records to filter the dataset by keyword, but I kept ending up with an empty dataset. – Big_Daz Feb 27 '19 at 09:09
  • Hi Jondiedoop, apologies, that was my rubbish attempt to mock up the results, and it was only a subset of the large dataset I have to search through. The data may contain misspellings, so I wanted to use fuzzywuzzy to capture high-ratio matches. – Big_Daz Feb 27 '19 at 09:44
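
For the misspelling case raised in the comments, here is a minimal sketch of word-level fuzzy matching. It swaps in the standard library's `difflib.SequenceMatcher` rather than fuzzywuzzy, so it is not the approach the asker tried. The helper name `has_fuzzy_keyword`, the 0.85 threshold, and the misspelled sample rows are all assumptions made for illustration.

```python
import difflib

import pandas as pd

KEYWORDS = ("alert", "amber")


def has_fuzzy_keyword(text, keywords=KEYWORDS, threshold=0.85):
    """Return True if any word in text is a close match to any keyword.

    Hypothetical helper: the threshold is an assumed value and would need
    tuning against the real data.
    """
    if not isinstance(text, str):
        return False  # treat missing values as non-matches
    for word in text.lower().split():
        token = word.strip("*-.,:;!?")  # drop surrounding punctuation
        for keyword in keywords:
            ratio = difflib.SequenceMatcher(None, token, keyword).ratio()
            if ratio >= threshold:
                return True
    return False


# Sample rows modelled on the question; the misspellings are invented.
df = pd.DataFrame({
    "Id_No": [1, 5, 5, 15, 19],
    "from_data": [
        "Allert g12134 CONFIRMATION CODE A27",    # invented misspelling
        "g12136  CONFIRMATION CODE B01 - RED",
        "g12136  CONFIRMATION CODE B02 - AMBR",   # invented misspelling
        "**ERROR** no alert was registered",
        "g12136  CONFIRMATION CODE B03 - GREEN",
    ],
})

matches = df[df["from_data"].apply(has_fuzzy_keyword)]
print(matches)
```

Comparing word by word, rather than scoring the whole string against the keyword, avoids the long surrounding text dragging the similarity ratio down, which may be one reason a whole-string ratio produced an empty result.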

1 Answer

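Use boolean indexing with `str.contains()`. Lowercasing the column first makes the match case-insensitive, and `alert|amber` is a regular expression that matches either keyword: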

df[df['from_data'].str.lower().str.contains('alert|amber')]

   Id_No                              from_data
0      1     Alert g12134 CONFIRMATION CODE A27
1      1     ALERT g12134 CONFIRMATION CODE A28
3      5  g12136  CONFIRMATION CODE B02 - AMBER
5      6  g12136  CONFIRMATION CODE B02 - AMBER
6      9    Alert g12134  CONFIRMATION CODE A27
7     15      **ERROR** no alert was registered
8     17  g12136  CONFIRMATION CODE B02 - AMBER
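
As a self-contained variant of the same idea, the sketch below rebuilds the sample data from the question and passes `case=False` to `str.contains()` instead of lowercasing first. The `na=False` argument is an assumption that missing values in from_data should count as non-matches, and `reset_index(drop=True)` is only there to give the new dataframe a clean index.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Id_No": [1, 1, 5, 5, 6, 6, 9, 15, 17, 19],
    "from_data": [
        "Alert g12134 CONFIRMATION CODE A27",
        "ALERT g12134 CONFIRMATION CODE A28",
        "g12136  CONFIRMATION CODE B01 - RED",
        "g12136  CONFIRMATION CODE B02 - AMBER",
        "g12136  CONFIRMATION CODE B01 - RED",
        "g12136  CONFIRMATION CODE B02 - AMBER",
        "Alert g12134  CONFIRMATION CODE A27",
        "**ERROR** no alert was registered",
        "g12136  CONFIRMATION CODE B02 - AMBER",
        "g12136  CONFIRMATION CODE B03 - GREEN",
    ],
})

# Case-insensitive regex match on either keyword; NaN rows are excluded.
mask = df["from_data"].str.contains("alert|amber", case=False, na=False)

# Keep only the matching rows in a new dataframe with a fresh index.
result = df[mask].reset_index(drop=True)
print(result)
```

This is substring matching, so it would also pick up words such as "alerts" or "alerted", but not misspellings.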
It_is_Chris