How can I generate a random 20-digit UID (unique ID) in Python? I want to generate a UID for each row in my data frame. It should be exactly 20 digits long and should be unique.

I am using uuid4(), but it generates a 32-digit UID. Would it be okay to slice it with [:21]? I don't want the ID to repeat in the future.

Any suggestions would be appreciated!

user78910
Sanchit
  • Both strings and integers work in uid – Sanchit May 18 '20 at 09:18
  • What is the current unique-key in your dataset? Worth using that to generate the unique ids based on some logic. – Anshul May 18 '20 at 09:35
  • No unique key as of now, that's why I intend to use the uuid library! – Sanchit May 18 '20 at 09:36
  • So how do you identify unique records as of now? There has to be some unique key (even if it's composite), doesn't there? Or am I missing something? – Anshul May 18 '20 at 09:40
  • I mean, I am having a hard time relating to a dataset in which every column can contain duplicate data except a unique ID that we are trying to create at random :) – Anshul May 18 '20 at 09:42
  • IIUC, the 'uniqueness' won't come if you use methods from the 'random' module. – Anshul May 18 '20 at 09:44
  • You could concatenate each row to create a composite key. Not the best scenario, but you should show a sample of your data for better help. – Umar.H May 18 '20 at 10:00
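
The composite-key idea from the comments could be sketched roughly as follows. This is only an illustration under assumptions: the helper name row_uid, the separator, the sample columns and the choice of SHA-256 are not from the thread. Identical rows get identical IDs, and because the hash is cut down to 20 decimal digits, collisions between different rows are very unlikely but not impossible.

```python
import hashlib
import pandas as pd

def row_uid(row, digits=20):
    # Hypothetical helper: join the row's values with a separator
    # that is unlikely to appear in the data itself.
    key = "\x1f".join(str(value) for value in row)
    # Hash the composite key and keep the last `digits` decimal digits.
    number = int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)
    return f"{number % 10**digits:0{digits}d}"

# Sample data made up for the illustration.
df = pd.DataFrame({"Name": ["Tom", "Jack"], "City": ["Oslo", "Rome"]})
df["ID"] = df.apply(row_uid, axis=1)
print(df)
```

Because the ID is derived from the row content, rerunning the code on the same data reproduces the same IDs.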

2 Answers

I'm definitely no expert in Python or pandas, but I puzzled the following together. You might find something useful:


First I tried NumPy, but I hit the int64 upper limit (9223372036854775807 has only 19 digits, so every ID starts with a padded 0):

import pandas as pd
import numpy as np

data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'ID': [0, 0, 0, 0]}
df = pd.DataFrame(data)
# Draw one int64 per row; the exclusive upper bound is the largest int64 value
df.ID = np.random.randint(0, 9223372036854775807, len(df.index), np.int64)
# Zero-pad each number to 20 characters
df.ID = df.ID.map('{:020d}'.format)
print(df)

Results:

    Name                    ID
0    Tom  03486834039218164118
1   Jack  04374010880686283851
2  Steve  05353371839474377629
3  Ricky  01988404799025990141

So then I tried a custom function and applied that:

import pandas as pd
import random

data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'ID': [0, 0, 0, 0]}
df = pd.DataFrame(data)
used = set()  # IDs handed out so far

def UniqueID():
    # Draw a zero-padded 20-digit string; redraw if it was already used
    UID = '{:020d}'.format(random.randint(0, 99999999999999999999))
    while UID in used:
        UID = '{:020d}'.format(random.randint(0, 99999999999999999999))
    used.add(UID)
    return UID

df.ID = df.apply(lambda row: UniqueID(), axis=1)
print(df)

Returns:

    Name                    ID
0    Tom  46160813285603309146
1   Jack  88701982214887715400
2  Steve  50846419997696757412
3  Ricky  00786618836449823720
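
To keep IDs from repeating across later batches as well, one option is to carry the set of already issued IDs along. The following is a minimal sketch rather than a definitive implementation: the names generate_unique_ids and existing are made up for illustration, and secrets is swapped in for random as the number source.

```python
import secrets

def generate_unique_ids(n, digits=20, existing=()):
    """Return n distinct zero-padded decimal strings of length `digits`."""
    seen = set(existing)  # IDs that must not be reused
    new_ids = []
    while len(new_ids) < n:
        # randbelow(10**digits) yields an integer in [0, 10**digits)
        candidate = f"{secrets.randbelow(10**digits):0{digits}d}"
        if candidate not in seen:
            seen.add(candidate)
            new_ids.append(candidate)
    return new_ids

first_batch = generate_unique_ids(4)
# A later batch that avoids everything issued so far.
second_batch = generate_unique_ids(4, existing=first_batch)
print(first_batch, second_batch)
```

With a data frame this could be used as df['ID'] = generate_unique_ids(len(df)), provided the IDs issued earlier are stored somewhere and passed in as existing.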
JvdV

I think uuid4() in Python works; just slice it to the length you need.
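
Two caveats about slicing: uuid4().hex is 32 hexadecimal characters, so a slice can contain the letters a-f, and [:21] keeps 21 characters rather than 20. If decimal digits are required, one possible variant is to reduce the UUID's 128-bit integer modulo 10**20. This is a sketch only; the helper name uuid_digits is made up here, and a shortened value no longer carries the full collision resistance of a UUID, so duplicates become plausible once the number of IDs approaches the order of 10**10.

```python
import uuid

def uuid_digits(digits=20):
    # uuid4().int is the UUID as a 128-bit integer;
    # keep its last `digits` decimal digits, zero-padded.
    return f"{uuid.uuid4().int % 10**digits:0{digits}d}"

uid = uuid_digits()
print(uid)
```

If a hard guarantee is needed, the generated values still have to be checked against the IDs already in use.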

Sanchit