I have some customer data in this form:

Name | Age | Gender | Phone Number | Email Id
abc. | 25  | M.     | 234 567 890  | example.com

There are 60k rows of data like this, spread across multiple tables. How can I generate synthetic data for this dataset using Python?

I have no knowledge about this. Any suggestions would be helpful. Thanks!

Seeker90
Is there any reason you need to do it in Python? There are plenty of websites that will do this sort of thing for you: https://mockaroo.com/, https://www.generatedata.com/, etc. – psidex Sep 01 '19 at 18:38

1 Answer

Python's Faker library is your friend here. It can generate localized fake data for names, addresses, phone numbers, credit card numbers, and much more.

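import numpy as np
import pandas as pd
from faker import Faker

fake = Faker()
n = 1000  # number of synthetic rows to generate

# One row per customer: fake name, random age in [19, 90], random gender label,
# fake phone number, fake email address
df = pd.DataFrame([[fake.name(),
        np.random.randint(19, 91),
        np.random.choice(['M.', 'F.']),
        fake.phone_number(),
        fake.email()] for _ in range(n)],
        columns=['Name', 'Age', 'Gender', 'Phone number', 'Email ID'])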

Output of df.head():

                 Name  Age Gender        Phone number                      Email ID
0      Miranda Hinton   21     F.        018.482.1404            meghan91@lopez.biz
1      Donald Donovan   51     F.    572.846.4120x995        jacobcarson@melton.com
2      Shannon Grimes   72     F.          0289879995           phillip93@gmail.com
3       Heather Perez   87     F.        012-033-2318  rodriguezjeffrey@hotmail.com
4  Jacqueline Pearson   22     M.  178-913-4566x89793        brianclark@hotmail.com
Stef