
I am using pandas and I have two .xlsx files.

There are no ID columns in the files,

so I need to create an ID generator:

df
###
  Employee type First Name Last Name Date of Birth
0      Employee      Paulo    Cortez      01-01-90
1      Employee      Paulo    Cortez      01-01-90
2      Employee      Paulo    Cortez      01-01-90
3      Employee      Paulo       NaN      01-01-90
4      Employee      Maria     Silva      02-01-90
5      Employee        NaN     Silva      04-10-90
6      Employee       Joao   Augusto      12-11-89

I need a Python library or function that can create a unique ID from the First Name, Last Name and Date of Birth columns, for example:

Name  Lastname  Date        Idcreate
Amir  loka      18/07/1990  1288749
Jack  Broo      17/09/1988  128389
Amir  loka      18/07/1990  1288749

If the function gets the same first name, last name and date, it should generate the same code.

Anoushiravan R
Cogp39
  • Does this answer your question? [In Pandas, how to create a unique ID based on the combination of many columns?](https://stackoverflow.com/questions/36646923/in-pandas-how-to-create-a-unique-id-based-on-the-combination-of-many-columns) – AlexK Oct 15 '22 at 22:23

1 Answer


I believe you're looking for DataFrame.groupby().ngroup().

import pandas as pd
import numpy as np

df = pd.DataFrame({'Employee': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6}, 'type': {0: 'Employee', 1: 'Employee', 2: 'Employee', 3: 'Employee', 4: 'Employee', 5: 'Employee', 6: 'Employee'}, 'First Name': {0: 'Paulo', 1: 'Paulo', 2: 'Paulo', 3: 'Paulo', 4: 'Maria', 5: np.nan, 6: 'Joao'}, 'Last Name': {0: 'Cortez', 1: 'Cortez', 2: 'Cortez', 3: np.nan, 4: 'Silva', 5: 'Silva', 6: 'Augusto'}, 'Date of Birth': {0: '01-01-90', 1: '01-01-90', 2: '01-01-90', 3: '01-01-90', 4: '02-01-90', 5: '04-10-90', 6: '12-11-89'}})

# Create one ID per (First Name, Last Name, Date of Birth) group.
# Rows with a missing first or last name get -1 from ngroup(); those are replaced with None.
df['unique_ID'] = df.groupby(['First Name', 'Last Name', 'Date of Birth']).ngroup().replace(-1, None)

#output
   Employee      type First Name Last Name Date of Birth unique_ID
0         0  Employee      Paulo    Cortez      01-01-90         2
1         1  Employee      Paulo    Cortez      01-01-90         2
2         2  Employee      Paulo    Cortez      01-01-90         2
3         3  Employee      Paulo       NaN      01-01-90      None
4         4  Employee      Maria     Silva      02-01-90         1
5         5  Employee        NaN     Silva      04-10-90      None
6         6  Employee       Joao   Augusto      12-11-89         0
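
Note that the numbers from ngroup() are just positions among the sorted groups of this one DataFrame, so they can change when the data changes. If the ID has to depend only on the values themselves, one alternative (not ngroup(), but a different technique) is to hash the key columns. The following is a minimal sketch under some assumptions: the helper name make_stable_id, the stable_ID column, the lowercase/strip normalization and the 10-character length are all my own choices for illustration.

```python
import hashlib

import numpy as np
import pandas as pd

KEY_COLUMNS = ["First Name", "Last Name", "Date of Birth"]


def make_stable_id(row, length=10):
    """Hypothetical helper: hash the key values of one row into a short hex ID."""
    # Skip rows with a missing key, like the None values in the ngroup() version.
    if row.isna().any():
        return None
    # Assumed normalization: ignore surrounding spaces and letter case.
    key = "|".join(str(value).strip().lower() for value in row)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:length]


df = pd.DataFrame({
    "First Name": ["Paulo", "Paulo", "Paulo", "Paulo", "Maria", np.nan, "Joao"],
    "Last Name": ["Cortez", "Cortez", "Cortez", np.nan, "Silva", "Silva", "Augusto"],
    "Date of Birth": ["01-01-90", "01-01-90", "01-01-90", "01-01-90",
                      "02-01-90", "04-10-90", "12-11-89"],
})

# The same name and date always hash to the same ID, whatever the row order.
df["stable_ID"] = df[KEY_COLUMNS].apply(make_stable_id, axis=1)
print(df)
```

Truncating the digest keeps the ID short but makes collisions possible in principle, so a longer length is safer for large files. The dates are hashed as text, so "01-01-90" and "01/01/1990" would give different IDs unless they are converted to one format first.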
amance
  • It works. But if I have another xlsx file that is not sorted the same way, will it give the same unique IDs? – Cogp39 Oct 17 '22 at 17:34
  • @Cogp39 This would only work per dataset. You could try concatenating datasets together into one before running ngroup(). – amance Oct 17 '22 at 17:39
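
The concatenation idea from the last comment could look roughly like this. It is only a sketch: the two small DataFrames stand in for the two xlsx files, and the names file_a, file_b, the "source" level and the a/b labels are made up for the example.

```python
import pandas as pd

KEY_COLUMNS = ["First Name", "Last Name", "Date of Birth"]

# Stand-ins for the two files; in practice each would probably come from
# pd.read_excel(...) with your own file names.
file_a = pd.DataFrame({
    "First Name": ["Paulo", "Maria"],
    "Last Name": ["Cortez", "Silva"],
    "Date of Birth": ["01-01-90", "02-01-90"],
})
file_b = pd.DataFrame({
    "First Name": ["Maria", "Joao", "Paulo"],
    "Last Name": ["Silva", "Augusto", "Cortez"],
    "Date of Birth": ["02-01-90", "12-11-89", "01-01-90"],
})

# Stack both files; the extra "source" index level remembers where each row came from.
combined = pd.concat([file_a, file_b], keys=["a", "b"], names=["source", "row"])

# Number the groups once over the combined data so both files share the same IDs.
combined["unique_ID"] = combined.groupby(KEY_COLUMNS).ngroup()

# Split back into the two original files, now with matching IDs.
ids_a = combined.loc["a"]
ids_b = combined.loc["b"]
print(ids_a)
print(ids_b)
```

The IDs are still only consistent within this combined run: adding a third file later can shift the numbers, so all files have to be concatenated before calling ngroup().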