8

I need to generate 1000 unique first names and store them in a list. I am using Python faker but getting so many repeated values.

import random
from random import shuffle
from faker import Faker

fake = Faker()
fake.random.seed(4321)  

first_n=[]
for i in range(1000):
    name=fake.first_name()
    if name in first_n:
        first_n.append("Repeat")
    else:
        first_n.append(name)

Thanks for your help.

Alec
  • 8,529
  • 8
  • 37
  • 63
Abhi
  • 91
  • 1
  • 1
  • 5

4 Answers4

11

Instead of generating random values and then checking if they're unique, retrieve the list of the names stored in the provider you're using, shuffle them and return the first 1000. That way you won't run into collisions at all. There's about 7k first names defined in the en provider, while the other languages may have far less - making collisions rather certain as you're getting further into the sequence.

from random import shuffle, seed
from faker.providers.person.en import Provider

first_names = list(set(Provider.first_names))

seed(4321)
shuffle(first_names)

print(first_names[0:1000])
MatsLindh
  • 49,529
  • 4
  • 53
  • 84
  • Thanks, But every time I am getting different values. I need the same values on every run. – Abhi Apr 24 '19 at 19:11
  • In that case seed the random number generator before calling shuffle with the same seed each time as in your own example. I've updated the code in my answer to use seed as well. – MatsLindh Apr 24 '19 at 19:22
  • from random import shuffle, seed from faker.providers.person.en import Provider first_names = list(Provider.first_names) seed(4321) shuffle(first_names) li=(first_names[0:1000]) print(len(li)) new=[] for n in li: if n in new: new.append("repeat") else: new.append(n) print(sorted(new)) Getting the Repeated value at last of the list. "'Zoey', 'Zollie', 'Zora', 'repeat', 'repeat', 'repeat', 'repeat', 'repeat']" – Abhi Apr 24 '19 at 19:34
  • Ah, that can be caused by there being duplicates across male and female names. You can create a set before extracting the unique names as a list. I've updated the code again. – MatsLindh Apr 24 '19 at 20:18
  • Can I use the same for Last_name? – Abhi Apr 24 '19 at 20:42
  • Should work; check the Provider class you're using on github to see exactly what is available. – MatsLindh Apr 24 '19 at 21:51
5

Since version 4.9.0, python's Faker library has built-in functionality for supporting unique values. See the relevant section of the README.

In essence, one can now do:

from faker import Faker
faker = Faker()
names = [faker.unique.first_name() for _ in range(500)]
assert len(set(names)) == len(names)
dedoussis
  • 390
  • 3
  • 6
3

Youll want to store data in a set instead of a list to prevent duplicates

Then you can use a while loop

first_n = set()
while len(first_n) < 1000:
    first_n.add(fake.first_name())
jose_bacoy
  • 12,227
  • 1
  • 20
  • 38
OneCricketeer
  • 179,855
  • 19
  • 132
  • 245
0

You can use a set, and keep adding until the length is what you desire:

names = set()
while len(names) < LENGTH:
    names.add(NAME)

You can then convert back to a list:

names = list(names)
Alec
  • 8,529
  • 8
  • 37
  • 63