104

I'm looking for recommendations of a good, free tool for generating sample data for the purpose of loading into test databases. By analogy, something that produces "lorem ipsum" text for any RDBMS. Features I'm looking for include:

  • Flexibility to generate data for an existing table definition.
  • Ability to generate small and large data sets (more than 1 million rows).
  • Generate in SQL script format (INSERT statements) or else in a flat file format suitable for bulk import (which is usually faster).
  • A command-line interface for easy scripting.
  • Extensible, open source, written in a dynamic language (these are nice-to-haves, not strong requirements).

PS: I did search for a duplicate question on StackOverflow, but I didn't find one. If there is one, I'll be grateful to get a pointer to it.


Thanks for the great responses everyone! I should amend my requirements that I use Mac OS X as my primary development environment, not Windows (though I did say command-line interface is desirable, and that practically rules out Windows). The Windows-specific suggestions will no doubt be useful to other readers of this question, though, so thanks.


Here is my conclusion:

  • GenerateData:
    • PHP web app interface, not command line
    • limited to generating 200 records (or pay $20 for a license to generate 5,000 records)
  • RedGate SQL Data Generator
    • not free, price $295
    • requires Windows, .NET, SQL Server
  • Visual Studio 2008 Database Edition
    • requires Windows
    • requires costly MSDN or ISV subscription
  • Banner Datatect
    • not free, price $595
    • requires Windows (?)
    • no support for MySQL (?)
    • GUI, not command line or scriptable
  • Ruby Faker gem
    • way too slow to use ActiveRecord for bulk data load
  • Super Smack
    • chiefly a load-testing tool, with a random data generator built in
    • pretty simple to use nevertheless
    • overall a good runner-up tool
  • Databene Benerator
    • best solution for my needs
    • XML scripts, compatible with DbUnit
    • open source (GPL) Java code
    • command-line usage
    • access many databases directly via JDBC
Robert Harvey
Bill Karwin

16 Answers

41

Take a look at databene benerator, a test data generator that looks close to your requirements.

  • it can generate data for an existing table definition (or even anonymize production data)
  • it can generate large data sets (unlimited size)
  • it supports various input formats (CSV, Flat Files, DBUnit) and output formats (CSV, Flat Files, DBUnit, XML, Excel, Scripts)
  • it can be used on the command line or through a maven plugin
  • it's open source and customizable

I would give it a try.

BTW, a list of similar products is available on databene benerator's web site.

Pascal Thivent
  • Has anyone had success using it? I tried it, but benerator-wizard generates an invalid pom.xml file (for the "Populate database" option). Moreover, running one of the demos (hsqldb) with maven results in errors as well. To me it seems the tool is not in good shape, and thus not worth losing time on. – Peter Butkovic Nov 07 '13 at 09:11
23

This looks quite promising: generatedata.com. Open-source, has lots of built-in data types.

There are several others listed here: Test (Sample) Data Generators. I don't have experience with any of them, but a few on that list look like they could be pretty decent.

Chad Birch
6

Try http://www.mockaroo.com

This is a tool my company made to help test our own applications. We've made it free for anyone to use. It's basically the Forgery ruby gem with a web app wrapped around it. You can generate data in CSV, txt, or SQL formats. Hope this helps.

mockaroodev
5

I know you said you were looking for a free tool, but this is one case where I would suggest that spending $295 will pay you back quickly in time saved. I've been using the RedGate tool SQL Data Generator for the last year and it is, in short, an awesome tool. It allows for setting dependencies between columns and generates realistic data for business objects such as phone numbers, URLs, names, etc. I can honestly state that this tool has paid for itself time and time again.

KevDog
2

A tool that really should not be missing from the list is the Data Generator from Datanamic, which populates databases directly or generates insert scripts, has a large collection of pre-installed generators ( and supports multiple databases...

http://www.datanamic.com/datagenerator/index.html

2

If you are looking or willing to use something MySQL-specific, you could take a look at Super Smack. It is currently maintained by Tony Bourke.

Super Smack allows you to generate random data to insert into your database tables. It is customizable, allowing you to use the packaged words.dat file, or any test data of your choice.

One of the nice things about it is that it is command-line and highly customizable. There are some fairly decent examples of usage in the book High Performance MySQL, which is also excerpted here.

Not sure if that is along the lines of what you are looking for, but just a thought.

jonstjohn
2

A Ruby script with one of the available fake data generators should do you just fine.

http://faker.rubyforge.org/ is one such gem. Unfortunately, this doesn't fulfill all your requirements.

Here is another: http://random-data.rubyforge.org/

And a tutorial for using Faker: http://www.rubyandhow.com/how-to-generate-fake-names-addresses-in-ruby/


RE: Flexibility to generate data for an existing table definition. Combine the Faker gem with one of the available ORMs. ActiveRecord would probably be easiest.
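If row-by-row inserts through an ORM turn out to be too slow, one alternative is to write the fake rows to a flat file and bulk-import that instead. Below is a minimal sketch of that idea in Python, using only the standard library rather than Faker. The table layout (id, name, email), the name lists, and the function names `fake_rows` and `write_csv` are all made up for illustration.

```python
import csv
import random

# Tiny hypothetical vocabularies; a real run would load much larger lists.
FIRST_NAMES = ["Alice", "Bob", "Carol", "Dave"]
LAST_NAMES = ["Smith", "Jones", "Nguyen", "Garcia"]


def fake_rows(count, seed=0):
    """Yield (id, name, email) tuples one at a time, so memory use stays flat."""
    rng = random.Random(seed)  # seeded, so the same data can be regenerated
    for row_id in range(1, count + 1):
        first = rng.choice(FIRST_NAMES)
        last = rng.choice(LAST_NAMES)
        # The row id is appended to keep the email column unique.
        email = f"{first}.{last}{row_id}@example.com".lower()
        yield row_id, f"{first} {last}", email


def write_csv(out, count, seed=0):
    """Write a header line plus `count` fake rows to an open text file object."""
    writer = csv.writer(out)
    writer.writerow(["id", "name", "email"])
    writer.writerows(fake_rows(count, seed))
```

Passing a file opened with `open("users.csv", "w", newline="")` and a count of 1,000,000 to `write_csv` should give a file that a bulk loader such as MySQL's LOAD DATA INFILE or PostgreSQL's COPY can read, which avoids the per-row overhead of an ORM.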

brendanjerwin
  • Have you tried to do a bulk load of > 1 million rows, one row at a time through an ActiveRecord interface? I am not optimistic about time to completion. – Bill Karwin Mar 04 '09 at 21:44
  • Also, I used the Faker gem today in some Cucumber Feature steps and it's S L O W. So, my score so far: ActiveRecord -1; Faker -1. I'm not doing so great. :) – brendanjerwin Mar 05 '09 at 02:37
2

Normally very costly, but if you are a small ISV you can get Visual Studio 2008 Database Edition very cheaply; see the Empower and BizSpark promotions. It provides a lot more functionality than just generating test data (integration with SCC, unit testing, DB refactoring, etc.).

Because RedGate tools are so easy to learn, I would still look at SQL Data Generator.

Ian Ringrose
  • Yeah it's less costly, on the order of the same price as RedGate's tool, but in addition you have to qualify as an ISV and that means buying other stuff. Thanks for the link anyway, no doubt it'll be useful for someone. +1 – Bill Karwin Mar 07 '09 at 18:41
1

Here is the list of such tools (both free and commercial): http://c2.com/cgi/wiki?TestDataGenerator

IgorJ
1

For OS X there is Data Creator (US $7). The download is free for testing purposes, so you can use it to evaluate the software and its features.

It requires OS X Lion or later. It can generate many different field types and has a custom export mode plus some presets (TSV, CSV, HTML table, web page with a table inside).

http://www.tensionsoftware.com/osx/datacreator/

It is also available on the App Store:

https://itunes.apple.com/us/app/data-creator/id491686136?mt=12

RPT
1

You can use DbSchema (www.dbschema.com). It's a database management tool, and it has a Random Data Generator to populate your database.

1

I know you're not looking for actual lorem ipsum text; but in case anyone else searches for an actual lorem ipsum generator and finds this thread: lipsum.com does a great job of it.

Jenn D.
1

Not free, but Visual Studio 2008 Database Edition is a good alternative and it provides a lot more functionality (Integration with SCC, Unit Testing, DB Refactoring, etc...)

bastos.sergio
  • Seems to be available only through an MSDN subscription that costs $5469 per year. For that amount of money, I could hire some college students to make up test data and type it in. – Bill Karwin Mar 04 '09 at 21:48
1

I use a tool called Datatect:

  1. Generates data to flat files or any ODBC compliant database.
  2. Extensible via VBScript.
  3. Referentially aware; will populate foreign keys with values from parent table.
  4. Data is context aware; city, state and phone numbers for given zip codes, first names and titles with gender.
  5. Can create custom, complex data types.
  6. Generate over 2 billion proper names, business names, street addresses, cities, states, and zip codes.
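
To illustrate what "referentially aware" means in item 3, here is a minimal Python sketch. It is not how Datatect implements it; the tables (`customers`, `orders`), column names, and function names are hypothetical. The idea is simply that the parent rows are generated first, and every child row picks its foreign key from the parent keys that actually exist.

```python
import random


def generate_customers(count):
    """Create parent rows with sequential primary keys."""
    return [{"id": i, "name": f"customer_{i}"} for i in range(1, count + 1)]


def generate_orders(count, customers, rng):
    """Create child rows whose customer_id always refers to an existing parent."""
    parent_ids = [c["id"] for c in customers]
    return [
        {
            "id": i,
            "customer_id": rng.choice(parent_ids),  # foreign key drawn from parents
            "amount": round(rng.uniform(1, 500), 2),
        }
        for i in range(1, count + 1)
    ]
```

Because each `customer_id` is chosen from the generated parents, the child table can be loaded with foreign key constraints enabled, provided the parent table is loaded first.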

I've used this tool to generate as many as 40,000,000 rows of data to a SQLServer database, and 8,000,000 rows of data to an Oracle database.

I am in no way affiliated with Banner Systems, just a satisfied customer.

Patrick Cuff
  • That looks like a promising option. Thanks for the link. +1 However, I don't develop on Windows as my primary platform, sorry I didn't specify that in my question. – Bill Karwin Mar 07 '09 at 18:43
0

+1 for Benerator: I tried 3 or 4 of the other tools on offer (including dbmonster) but found Benerator to be very quick, to deliver realistic data and to be flexible. I also got very quick & helpful feedback from the tool's creator when I posted on the forum.

davek
0

Not a direct answer to your question, but this can be helpful for certain kinds of data:

Fake Name Generator can be useful - http://www.fakenamegenerator.com/ , not for everything, but for user accounts or stuff like that. AFAIK they provide support for bulk orders.

dr. evil