Create a unique ID in a table from name and date of birth

Question

I am using pandas and I have two xlsx files.

There is no ID column in the files.

So I need to create an ID generator:

df
###
  Employee type First Name Last Name Date of Birth
0      Employee      Paulo    Cortez      01-01-90
1      Employee      Paulo    Cortez      01-01-90
2      Employee      Paulo    Cortez      01-01-90
3      Employee      Paulo       NaN      01-01-90
4      Employee      Maria     Silva      02-01-90
5      Employee        NaN     Silva      04-10-90
6      Employee       Joao   Augusto      12-11-89

I need a Python library or function that can create a unique ID from the First Name, Last Name, and Date of Birth columns:

    Name    Lastname    Date          Idcreate
    Amir    loka        18/07/1990    1288749
    Jack    Broo        17/09/1988    128389
    Amir    loka        18/07/1990    1288749

If the function gets the same first name, last name, and date, it should generate the same code.

Does this answer your question? [In Pandas, how to create a unique ID based on the combination of many columns?](https://stackoverflow.com/questions/36646923/in-pandas-how-to-create-a-unique-id-based-on-the-combination-of-many-columns) — AlexK, Oct 15 '22 at 22:23

score 0 · Accepted Answer · answered Oct 16 '22 at 03:40

I believe you're looking for df.groupby(...).ngroup().

import pandas as pd
import numpy as np

df = pd.DataFrame({'Employee': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6}, 'type': {0: 'Employee', 1: 'Employee', 2: 'Employee', 3: 'Employee', 4: 'Employee', 5: 'Employee', 6: 'Employee'}, 'First Name': {0: 'Paulo', 1: 'Paulo', 2: 'Paulo', 3: 'Paulo', 4: 'Maria', 5: np.nan, 6: 'Joao'}, 'Last Name': {0: 'Cortez', 1: 'Cortez', 2: 'Cortez', 3: np.nan, 4: 'Silva', 5: 'Silva', 6: 'Augusto'}, 'Date of Birth': {0: '01-01-90', 1: '01-01-90', 2: '01-01-90', 3: '01-01-90', 4: '02-01-90', 5: '04-10-90', 6: '12-11-89'}})

# Create the unique ID. Rows with a missing first or last name get -1 from ngroup(), which is replaced with None.
df['unique_ID'] = df.groupby(['First Name', 'Last Name', 'Date of Birth']).ngroup().replace(-1, None)

#output
   Employee      type First Name Last Name Date of Birth unique_ID
0         0  Employee      Paulo    Cortez      01-01-90         2
1         1  Employee      Paulo    Cortez      01-01-90         2
2         2  Employee      Paulo    Cortez      01-01-90         2
3         3  Employee      Paulo       NaN      01-01-90      None
4         4  Employee      Maria     Silva      02-01-90         1
5         5  Employee        NaN     Silva      04-10-90      None
6         6  Employee       Joao   Augusto      12-11-89         0
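
Note that ngroup() numbers the groups found in one DataFrame, so the numbers depend on which rows are present. If the same person must get the same ID regardless of file or row order, one alternative is to derive the ID from the values themselves. The following is only a sketch of that idea, not part of the approach above: it assumes a truncated SHA-256 digest from hashlib is acceptable as an ID, and the names KEY_COLS and make_id are made up for illustration.

```python
import hashlib

import numpy as np
import pandas as pd

# Columns that together identify a person (assumed to match the question's data).
KEY_COLS = ['First Name', 'Last Name', 'Date of Birth']


def make_id(row):
    """Return a short hex digest of the key columns, or None if any value is missing."""
    if row.isna().any():
        return None
    # Normalise case and surrounding whitespace so 'Paulo ' and 'paulo' hash the same.
    key = '|'.join(str(value).strip().lower() for value in row)
    # Truncating to 12 hex characters is an arbitrary choice for readability.
    return hashlib.sha256(key.encode('utf-8')).hexdigest()[:12]


df = pd.DataFrame({
    'First Name': ['Paulo', 'Paulo', 'Maria', np.nan],
    'Last Name': ['Cortez', 'Cortez', 'Silva', 'Silva'],
    'Date of Birth': ['01-01-90', '01-01-90', '02-01-90', '04-10-90'],
})

# Apply row by row; identical key values always produce the same ID.
df['unique_ID'] = df[KEY_COLS].apply(make_id, axis=1)
```

Because the ID depends only on the three values, it should come out the same in any file. The trade-off is that a truncated digest can in principle collide, so keep more characters if the tables are large.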

It works. But if I have another xlsx file that is not sorted the same way, will it give the same unique IDs? — Cogp39, Oct 17 '22 at 17:34
@Cogp39 This would only work per dataset. You could try concatenating datasets together into one before running ngroup(). — amance, Oct 17 '22 at 17:39
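
The concatenation suggestion could look roughly like the sketch below. The two small frames are hypothetical stand-ins for the two spreadsheets (in practice each would come from pd.read_excel), and the keys 'a' and 'b' are arbitrary labels used only to split the result back apart.

```python
import pandas as pd

KEY_COLS = ['First Name', 'Last Name', 'Date of Birth']

# Hypothetical stand-ins for the two xlsx files, with rows in different orders.
df_a = pd.DataFrame({
    'First Name': ['Paulo', 'Maria'],
    'Last Name': ['Cortez', 'Silva'],
    'Date of Birth': ['01-01-90', '02-01-90'],
})
df_b = pd.DataFrame({
    'First Name': ['Maria', 'Joao', 'Paulo'],
    'Last Name': ['Silva', 'Augusto', 'Cortez'],
    'Date of Birth': ['02-01-90', '12-11-89', '01-01-90'],
})

# Stack both frames; `keys` adds an outer index level recording the source.
combined = pd.concat([df_a, df_b], keys=['a', 'b'])

# Number the groups once over the combined data so both files share one numbering.
combined['unique_ID'] = combined.groupby(KEY_COLS).ngroup()

# Split back into per-file frames, each keeping its original row index.
df_a = combined.loc['a']
df_b = combined.loc['b']
```

The IDs are consistent only across the files that were concatenated together; adding a third file later and rerunning may renumber the groups.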
