Comparing two columns and filtering columns with neighboring classes

Question

So here the classes run from "eight" to "twenty", and the numbers are written out as words. I have a table of the rows where the classifier's prediction is not equal to the actual value. Now I want a table of only the rows where the classifier misses the class by one neighbouring class. For example, from the table above I want only these rows:

      predictions     actual
8013      fifteen    sixteen
5146      sixteen  seventeen
5691    seventeen    sixteen
13255     sixteen    fifteen
13921    thirteen   fourteen
13077    fourteen    fifteen

score 2 · Accepted Answer · answered Mar 05 '20 at 10:28

Convert both columns to numeric, then use boolean indexing to keep the rows where the prediction equals the actual value plus 1 or minus 1. The two conditions are chained with | (bitwise OR), and Series.eq checks for equal values:

print (df)
      predictions     actual
8013      fifteen     twenty
5146      sixteen  seventeen
5691    seventeen    sixteen
13255     sixteen    fifteen
13921    nineteen   fourteen
13077    fourteen    fifteen

#https://stackoverflow.com/a/493788/2901002
def text2int(textnum, numwords={}):
    if not numwords:
        units = [
            "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
            "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
            "sixteen", "seventeen", "eighteen", "nineteen",
        ]

        tens = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]

        scales = ["hundred", "thousand", "million", "billion", "trillion"]

        numwords["and"] = (1, 0)
        for idx, word in enumerate(units):
            numwords[word] = (1, idx)
        for idx, word in enumerate(tens):
            numwords[word] = (1, idx * 10)
        for idx, word in enumerate(scales):
            numwords[word] = (10 ** (idx * 3 or 2), 0)

    current = result = 0
    for word in textnum.split():
        if word not in numwords:
            raise Exception("Illegal word: " + word)

        scale, increment = numwords[word]
        current = current * scale + increment
        if scale > 100:
            result += current
            current = 0

    return result + current

p = df['predictions'].apply(text2int) 
a = df['actual'].apply(text2int) 

df1 = df[p.eq(a+1) | p.eq(a-1)]

Or:

df1 = df[(p == a+1) | (p == a-1)]

print (df1)
      predictions     actual
5146      sixteen  seventeen
5691    seventeen    sixteen
13255     sixteen    fifteen
13077    fourteen    fifteen
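
As a self-contained variant, the two comparisons can be collapsed into a single absolute-difference check. This is only a minimal sketch: it assumes the classes run from eight to twenty, so a small hand-built lookup (the hypothetical WORD_TO_INT) stands in for text2int, and the sample frame reuses the rows printed above.

```python
import pandas as pd

# Assumption: the classes are the number words eight..twenty, as in the question.
WORDS = ["eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen", "twenty"]
WORD_TO_INT = {word: i for i, word in enumerate(WORDS, start=8)}

df = pd.DataFrame(
    {
        "predictions": ["fifteen", "sixteen", "seventeen", "sixteen", "nineteen", "fourteen"],
        "actual": ["twenty", "seventeen", "sixteen", "fifteen", "fourteen", "fifteen"],
    },
    index=[8013, 5146, 5691, 13255, 13921, 13077],
)

# Convert both columns from words to integers.
p = df["predictions"].map(WORD_TO_INT)
a = df["actual"].map(WORD_TO_INT)

# Keep rows where prediction and actual differ by exactly one class.
df1 = df[(p - a).abs().eq(1)]
print(df1)
```

The absolute difference treats "one class too high" and "one class too low" the same way, which is what the two chained conditions do.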

score 1 · Answer 2 · ThomaS · 2020-03-05T10:59:58.583

You could convert the number words to integers with the code from: Is there a way to convert number words to Integers?

Or, if you have a limited range of classes, you could do it by hand with two dictionaries, like:

# extend both dictionaries so that every class has an entry
prev_dict = {'sixteen': 'fifteen', 'seventeen': 'sixteen'}
next_dict = {'sixteen': 'seventeen'}

and then:

predict[(predict['predictions'] == predict['actual'].map(prev_dict)) | (predict['predictions'] == predict['actual'].map(next_dict))]
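
Rather than typing both dictionaries by hand, they could be built from an ordered list of the classes. The following is a rough sketch under assumptions: the list classes (eight to twenty) and the sample rows are taken from the question and the other answer, and the frame name predict follows the snippet above.

```python
import pandas as pd

# Assumption: the full, ordered list of classes from the question.
classes = ["eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
           "fifteen", "sixteen", "seventeen", "eighteen", "nineteen", "twenty"]

# Map each class to its lower and upper neighbour.
prev_dict = dict(zip(classes[1:], classes[:-1]))   # e.g. 'sixteen' -> 'fifteen'
next_dict = dict(zip(classes[:-1], classes[1:]))   # e.g. 'sixteen' -> 'seventeen'

predict = pd.DataFrame(
    {
        "predictions": ["fifteen", "sixteen", "seventeen", "sixteen", "nineteen", "fourteen"],
        "actual": ["twenty", "seventeen", "sixteen", "fifteen", "fourteen", "fifteen"],
    },
    index=[8013, 5146, 5691, 13255, 13921, 13077],
)

# A class with no neighbour on one side maps to NaN, which never compares equal.
mask = (predict["predictions"] == predict["actual"].map(prev_dict)) | (
    predict["predictions"] == predict["actual"].map(next_dict)
)
near_misses = predict[mask]
print(near_misses)
```

This keeps everything as strings, so no word-to-integer conversion is needed, at the cost of listing the classes in order once.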

edited Mar 05 '20 at 10:59

answered Mar 05 '20 at 10:04

ThomaS


Great! This definitely answers my question in one way. But could I get code showing how to achieve this if I convert the words to integers and work from there? – Chinti Mar 05 '20 at 10:13
Also, when I ran that code I got the error 'Series' objects are mutable, thus they cannot be hashed. Why is that? Thank you in advance. – Chinti Mar 05 '20 at 10:27
I was doing it wrong and only succeeded with two dictionaries. The error occurred because the dictionary didn't recognize the keys and so tried to add the Series as a key. I edited my answer; jezrael's answer is better, though. – ThomaS Mar 05 '20 at 11:00
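
The hashing error mentioned in these comments can be reproduced in isolation. This is a small sketch with made-up sample values, not the original code that failed: using a whole Series as a dictionary key raises a TypeError (the exact message may differ between pandas versions), while Series.map looks up each value in turn.

```python
import pandas as pd

actual = pd.Series(["sixteen", "seventeen"])
prev_dict = {"sixteen": "fifteen", "seventeen": "sixteen"}

error = None
try:
    # The whole Series is used as a single key, so Python tries to hash it.
    prev_dict[actual]
except TypeError as exc:
    error = exc
    print(type(exc).__name__, exc)

# Element-wise lookup: each value of the Series is used as a key.
looked_up = actual.map(prev_dict)
print(looked_up.tolist())
```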
