main_df:
Name Age Id DOB
0 Tom 20 A4565 22-07-1993
1 nick 21 G4562 11-09-1996
2 krish AKL F4561 15-03-1997
3 636A 18 L5624 06-07-1995
4 mak 20 K5465 03-09-1997
5 nits 55 56541 45aBc
6 444 66 NIT 09031992
column_info_df:
Column_Name Column_Type
0 Name string
1 Age integer
2 Id string
3 DOB Date
how can i find data type error value from main df. For example from column info df we can see 'Name' is a string column, so in main df, 'Name' column should contain either string or alphanumeric other than that it's an error. I need to find those datatype error values in a separate df.
error output df:
Column_Name Current_Value Exp_Dtype Index_No.
0 Name 444 string 6
1 Age 444 int 2
2 Name 56441 string 6
0 DOB 4aBc Date 5
0 DOB 09031992 Date 6
i tried this:
for i,r in column_info_df.iterrows():
if r['Column_Type'] == 'string':
main_df[r['Column_Name']].loc[main_df[r['Column_Name']].str.match(r'[^a-z|A-Z]+')]
elif r['Column_Type'] == 'integer':
main_df[r['Column_Name']].loc[main_df[r['Column_Name']].str.match(r'[^0-9]+')]
elif r['Column_Type'] == 'Date':
i have stuck here,because this RE is not catching every errors. i don't know how to go further?