
Can someone help me validate a column? It's an 'Initials' column.

Contact Initials
1 P.J.
2 Peter
3 P.

An initial consists of one letter followed by a period, as in rows 1 and 3. Row 2 is invalid.

I hope someone can help.

Monkey D

2 Answers


Use Series.str.contains to test for one or more uppercase letters, each followed by a dot:

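import pandas as pd

df = pd.DataFrame({'Contact': [1, 2, 3, 4, 5, 6],
                   'Initials': ['P.J.', 'Peter', 'P.Daa.', 'P.', 'H..', 'J.K']})
print (df)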
   Contact Initials
0        1     P.J.
1        2    Peter
2        3   P.Daa.
3        4       P.
4        5      H..
5        6      J.K

# pattern based on https://stackoverflow.com/a/17779796/2901002, with ^ anchoring the start and $ the end of the string
df['test'] = df['Initials'].str.contains(r'^(?:[A-Z]\.)+$')
print (df)
   Contact Initials   test
0        1     P.J.   True
1        2    Peter  False
2        3   P.Daa.  False
3        4       P.   True
4        5      H..  False
5        6      J.K  False
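A variant of the same check, offered as a sketch: on pandas 1.1 or newer, Series.str.fullmatch requires the pattern to match the entire string, so the ^ and $ anchors are no longer needed. The extra row with a missing value (None) is hypothetical, added only to show that na=False makes missing entries count as invalid instead of producing NaN.

```python
import pandas as pd

# Sample data from the answer, plus one hypothetical missing value
df = pd.DataFrame({'Contact': [1, 2, 3, 4, 5, 6, 7],
                   'Initials': ['P.J.', 'Peter', 'P.Daa.', 'P.', 'H..', 'J.K', None]})

# fullmatch anchors the pattern to the whole string (no ^ or $ needed);
# na=False turns missing values into False instead of NaN
df['test'] = df['Initials'].str.fullmatch(r'(?:[A-Z]\.)+', na=False)
print(df)
```

Like the contains version, this only accepts uppercase letters; add flags=re.IGNORECASE (or case=False) if lowercase initials should pass.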
jezrael
  • For example, the initials H.. and J.K give True, but they should be False: H.. must have only one dot, and J.K has no dot at the end. Do you know how I can validate these? – Monkey D Jul 12 '21 at 18:24
  • Thanks!!! I'm not sure whether I can keep asking questions, but my next challenge is to clean the data so that every value looks like row 1. Row 2 doesn't have to become P., because I guess that's only possible with machine learning. – Monkey D Jul 13 '21 at 07:39
  • @MonkeyD - I think that's not so easy; it's best to create a new question. – jezrael Jul 13 '21 at 07:40
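The cleaning step raised in the comments is left open in the thread. One possible rule, given only as a hypothetical sketch: values with no dot at all (full names such as Peter) are left unchanged, and for every other value the first letter of each dot-separated part is kept, uppercased, and followed by a dot. The function name clean_initials and the rule itself are assumptions, not part of either answer.

```python
def clean_initials(value):
    """Hypothetical normaliser: rebuild dotted values as 'X.Y.'-style initials."""
    # Leave non-strings and values without any dot (e.g. full names) untouched
    if not isinstance(value, str) or "." not in value:
        return value
    # Split on dots and drop empty parts (from "H.." or a trailing dot)
    parts = [part.strip() for part in value.split(".")]
    # Keep the uppercased first letter of each remaining part, each followed by a dot
    letters = [part[0].upper() for part in parts if part]
    return "".join(letter + "." for letter in letters)


for raw in ["P.J.", "Peter", "P.Daa.", "P.", "H..", "J.K"]:
    print(raw, "->", clean_initials(raw))
```

On a DataFrame this would be applied as df['Initials'].apply(clean_initials). Note that this rule turns P.Daa. into P.D., which may or may not be the desired result.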

You could consider writing a custom function to check for your two conditions:

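def validate(string):
    # Valid initials are one or more "letter followed by a dot" pairs, e.g. "P." or "P.J."
    if not isinstance(string, str) or len(string) == 0 or len(string) % 2 != 0:
        return False
    # Check every pair: even positions must be letters, odd positions must be dots
    for letter, dot in zip(string[::2], string[1::2]):
        if not letter.isalpha() or dot != ".":
            return False
    return True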

Then apply it to the column like so:

>>> df["Initials"].apply(validate)
0     True
1    False
2     True
not_speshal