3

Please bear with me on this question.

I'm looking to create a relatively large MySQL database that I want to use to do some performance testing. I'm using Ubuntu 11.04 by the way.

I want to create about 6 tables, each with about 50 million records. Each table will have about 10 columns. The data would just be random data.

However, I'm not sure how I can go about doing this. Do I use PHP and loop INSERT queries (bound to timeout)? Or if that is inefficient, is there a way I can do this via some command line utility or shell script?

I'd really appreciate some guidance.

Thanks in advance.

ObiHill
  • Check out some of the tools mentioned in http://stackoverflow.com/questions/591892/tools-for-generating-mock-data – Bill Karwin Sep 09 '11 at 21:37
  • @Bill: I found some great tools here: http://www.webresourcesdepot.com/test-sample-data-generators/. It turns out Spawner was just the kind of tool I was looking for. Now if only I could find the same kind of tool for the Ubuntu command line. – ObiHill Sep 11 '11 at 19:25

8 Answers

6

mysqlimport is what you want. Check the MySQL documentation for full information. It's a command-line tool and very fast.
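
For example, here is a minimal sketch of that approach (the table `test_data` with ten string columns, the database name and the credentials are all placeholder assumptions): generate a tab-separated file whose basename matches the table name, then let mysqlimport bulk-load it.

<?php
// Sketch: write a tab-separated file and bulk-load it with mysqlimport.
// Assumes a table `test_data` with 10 string columns already exists;
// database name and credentials below are placeholders.
$rows = 1000000;                  // raise this once the timings look OK
$cols = 10;
$file = '/tmp/test_data.txt';     // basename must match the table name

$fh = fopen($file, 'w');
for ($i = 0; $i < $rows; $i++) {
    $fields = array();
    for ($j = 0; $j < $cols; $j++) {
        $fields[] = substr(md5(mt_rand()), 0, 10); // cheap random-ish string
    }
    fputs($fh, implode("\t", $fields) . "\n");
}
fclose($fh);

// tab is the default field terminator for mysqlimport / LOAD DATA,
// so no extra options are needed; --local lets the client read the file
exec("mysqlimport --local --user=root --password=yourpassword test_db " . escapeshellarg($file));

mysqlimport is essentially a command-line interface to LOAD DATA INFILE, so this avoids building millions of individual INSERT statements.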

beetree
  • I don't exactly have my data ready to go in a file. Is this something I can script to insert random data sets multiple times in sequence?! – ObiHill Sep 09 '11 at 22:30
  • You could generate text files through your script and then use mysqlimport, unless the parameters mysqlimport offers aren't enough. – beetree Sep 09 '11 at 22:51
  • Ok. I would imagine a text file with 50 million INSERTs would be quite large though. Not sure how long it would take a script to spit that data out, and also how long mysqlimport would need to load it all in. I guess it's worth a try though. – ObiHill Sep 10 '11 at 20:41
  • mysqlimport is a much more viable option than writing a PHP script to insert each query. It's built for bulk loading of data. – user183037 Sep 18 '11 at 21:19
  • But if you stick to using a webserver, 'mysqldumper' is a good solution. It helped me to restore a big table full of blobs. – ppuschmann Sep 19 '11 at 20:20
5

Command-line mode usually has the timeouts disabled, since the time limit exists to protect a webserver from being tied up, which doesn't apply at the command line.

You can do it from PHP, though generating "random" data will be costly. How random does this information have to be? You can easily read from /dev/urandom and get "garbage" quickly, but it's not a source of "good" randomness (you'd want /dev/random for that, but it will block if there isn't enough entropy available to make good garbage).

Just make sure that you have keys disabled on the tables, as keeping those up-to-date will be a major drag on your insert operations. You can add/enable the keys AFTER you've got your data set populated.
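
A rough sketch of that, assuming a placeholder table `test_data` and placeholder credentials (ALTER TABLE ... DISABLE KEYS only affects non-unique indexes, and on InnoDB the session flags below do most of the work):

<?php
// Sketch: pause index/constraint maintenance around the bulk load, then restore it.
// Table name and credentials are placeholders.
mysql_connect('localhost', 'root', '') or die(mysql_error());
mysql_select_db('test_db') or die(mysql_error());

mysql_query('ALTER TABLE `test_data` DISABLE KEYS') or die(mysql_error());
mysql_query('SET unique_checks = 0');
mysql_query('SET foreign_key_checks = 0');

// ... run the bulk INSERTs / LOAD DATA here ...

mysql_query('SET foreign_key_checks = 1');
mysql_query('SET unique_checks = 1');
mysql_query('ALTER TABLE `test_data` ENABLE KEYS') or die(mysql_error()); // rebuilds the non-unique indexes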

Marc B
  • Thanks for the tip. I'm thinking command line on linux might be the way to go. However, I'm not a shell script maven so I'm a little lost as regards where to start to put some kind of script together. I don't really need the data to be completely random, I just need maybe 100 rows random i.e. insert 100 rows of random data, then insert the same 100 rows again, and again, etc. Any ideas on this?! – ObiHill Sep 09 '11 at 22:34
1

If you do want to go the PHP way, you could do something like this:

<?php
//Edit Following
$millionsOfRows = 2;
$InsertBatchSize = 1000;
$table = 'ATable';
$RandStrLength = 10;
$timeOut = 0; //set 0 for no timeout
$columns = array('col1','col2','etc');

//Mysql Settings
$username = "root";
$password = "";
$database = "ADatabase";
$server   = "localhost";

//Don't edit below

$letters = range('a','z');
$rows = $millionsOfRows * 1000000;
$colCount = count($columns);
$valueArray = array();

$con = @mysql_connect($server, $username, $password) or die('Error accessing database: '.mysql_error());
@mysql_select_db($database) or die ('Couldn\'t connect to database: '.mysql_error());

set_time_limit($timeOut);

for ($i = 1; $i <= $rows; $i++)
{
    // build one row of random strings
    $values = array();
    for ($k = 0; $k < $colCount; $k++)
        $values[] = RandomString();
    $valueArray[] = "('".implode("', '", $values)."')";

    // flush a full batch as a single multi-row INSERT
    if (($i % $InsertBatchSize) == 0)
    {
        echo "--".($i / $InsertBatchSize)."--";
        $sql = "INSERT INTO `$table` (`".implode('`,`',$columns)."`) VALUES ".implode(',',$valueArray);
        mysql_query($sql) or die(mysql_error());
        $valueArray = array();
    }
}

// flush any leftover rows from the final, partial batch
if (count($valueArray) > 0)
{
    $sql = "INSERT INTO `$table` (`".implode('`,`',$columns)."`) VALUES ".implode(',',$valueArray);
    mysql_query($sql) or die(mysql_error());
}

mysql_close($con);

function RandomString ()
{
    global $RandStrLength, $letters;

    $str = "";
    for ($i = 0; $i < $RandStrLength; $i++)
        $str .= $letters[rand(0,25)];
    return $str;
}

Of course, you could also just use an existing sample dataset, like the Northwind database.

Jess
1

All you need to do is launch your script from the command line like this:

php -q generator.php

It can then be a simple PHP file like this:

<?php

// Placeholder credentials -- adjust these to your own setup
$user = 'root';
$password = 'yourpassword';
$db = 'test_db';

$fid = fopen("query.sql", "w");
fputs($fid, "create table a (id int not null auto_increment primary key, b int, c int);\n");
for ($i = 0; $i < 50000000; $i++){
    fputs($fid, "insert into a (b,c) values (" . rand(0,1000) . ", " . rand(0,1000) . ");\n");
}
fclose($fid);
exec("mysql -u$user -p$password $db < query.sql");
jancha
0

I just want to point you to http://www.mysqldumper.net/, a tool that lets you back up and restore big databases with PHP.

The script has some mechanisms to work around PHP's maximum execution time, so in my opinion it's worth a look.

This is not a solution for generating data, but a great one for importing/exporting.

ppuschmann
  • Thanks. I don't have the data to restore yet though, that's why I was looking for a tool to generate data first. Even then, it looks like mysqlimport would be better at loading the data once it's generated. – ObiHill Sep 20 '11 at 15:08
0

Do I use PHP and loop INSERT queries (bound to timeout)

Certainly, running long-duration scripts via a webserver-mediated request is not a good idea. But PHP can be compiled to run from the command line; in fact, most distributions of PHP come bundled with this.

There are lots of things you can do to make this run more efficiently; exactly which ones will vary depending on how you are populating the data set (e.g. once only, or lots of batch additions). However, for a single load, you might want to have a look at the output of mysqldump (note how it disables and re-enables indexes and uses multi-row insert lines) and recreate that in a file rather than connecting directly to the database from PHP.
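
A minimal sketch of that idea (the table `test_data`, its columns `a`/`b`, the database name and the credentials are placeholders): write a dump-style file that disables the keys, uses multi-row INSERTs, and re-enables the keys, then feed it to the mysql client.

<?php
// Sketch: write a mysqldump-style file (keys disabled, extended INSERTs)
// and pipe it through the mysql command-line client.
// Table, columns and credentials are placeholders.
$rows  = 1000000;
$batch = 1000;

$fh = fopen('load.sql', 'w');
fputs($fh, "ALTER TABLE `test_data` DISABLE KEYS;\n");
for ($i = 0; $i < $rows; $i += $batch) {
    $tuples = array();
    for ($j = 0; $j < $batch; $j++) {
        $tuples[] = '(' . rand(0, 1000) . ',' . rand(0, 1000) . ')';
    }
    // one extended (multi-row) INSERT per batch, mysqldump-style
    fputs($fh, "INSERT INTO `test_data` (`a`,`b`) VALUES " . implode(',', $tuples) . ";\n");
}
fputs($fh, "ALTER TABLE `test_data` ENABLE KEYS;\n");
fclose($fh);

exec('mysql -uroot -pyourpassword test_db < load.sql');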

symcbean
0

I see no point in this question, and, especially, in raising a bounty for it.

As they say, "the best is the enemy of the good."

You asked this question ten days ago.
If you had just gone with whatever code you've got, you would have your tables already and would even be done with your tests. You're losing so much time in vain, and it's beyond my understanding.

As for the method you've been asking about (just to keep away all these self-appointed moderators), here are some statements as food for thought:

  • MySQL's own methods are considered more effective in general.
  • MySQL can insert all data from one table into another (or back into the same table) using INSERT ... SELECT syntax, doubling the row count each pass, so you would only need to run about 30 queries to get your 50 million records (see the sketch after this list).
  • And of course MySQL can copy whole tables as well.
  • Keep in mind that there should be no indexes on the table while the data is being loaded.
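
A minimal sketch of that doubling approach (the table `test_data`, its columns `a`/`b`, and the credentials are placeholders; the table is assumed to already exist and to have no indexes yet):

<?php
// Sketch: seed a batch of random rows, then keep doubling the table with
// INSERT ... SELECT until it reaches the target size.
// Table, columns and credentials are placeholders.
$target = 50000000;

mysql_connect('localhost', 'root', '') or die(mysql_error());
mysql_select_db('test_db') or die(mysql_error());

// seed 1000 random rows once
$tuples = array();
for ($i = 0; $i < 1000; $i++) {
    $tuples[] = '(' . rand(0, 1000) . ',' . rand(0, 1000) . ')';
}
mysql_query('INSERT INTO `test_data` (`a`,`b`) VALUES ' . implode(',', $tuples)) or die(mysql_error());

// each pass copies the whole table into itself, doubling the row count
$count = 1000;
while ($count < $target) {
    mysql_query('INSERT INTO `test_data` (`a`,`b`) SELECT `a`,`b` FROM `test_data`') or die(mysql_error());
    $count *= 2;
}
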
Your Common Sense
0

It is probably fastest to run multiple inserts in one query, as in:

INSERT INTO `test` VALUES
    (1,2,3,4,5,6,7,8,9,0),
    (1,2,3,4,5,6,7,8,9,0),
     .....
    (1,2,3,4,5,6,7,8,9,0)

I created a PHP script to do this. First I tried to construct a query that would hold 1 million inserts, but it failed. Then I tried with 100 thousand and it failed again. 50 thousand didn't work either. My next try was with 10,000 and it worked fine. I guess I am hitting the limit on how much data can be transferred from PHP to MySQL in one query (probably the max_allowed_packet setting). Here is the code:

<?php
set_time_limit(0);
ini_set('memory_limit', -1);
define('NUM_INSERTS_IN_QUERY', 10000);
define('NUM_QUERIES', 100);

// build query
$time = microtime(true);
$queries =  array();
for($i = 0; $i < NUM_QUERIES; $i++){
    $queries[$i] = 'INSERT INTO `test` VALUES ';
    for($j = 0; $j < NUM_INSERTS_IN_QUERY; $j++){
            $queries[$i] .= '(1,2,3,4,5,6,7,8,9,0),';
    }
    $queries[$i] = rtrim($queries[$i], ',');
}

echo "Building query took " . (microtime(true) - $time) . " seconds\n";

mysql_connect('localhost', 'root', '') or die(mysql_error());
mysql_select_db('store') or die(mysql_error());
mysql_query('DELETE FROM `test`') or die(mysql_error());

// execute the query
$time = microtime(true);
for($i = 0; $i < NUM_QUERIES; $i++){
    mysql_query($queries[$i]) or die(mysql_error());
    // verify all rows inserted
    if(mysql_affected_rows() != NUM_INSERTS_IN_QUERY){
            echo "ERROR: on run $i not all rows inserted (" . mysql_affected_rows() . ")\n";
            exit;
    }
}

echo "Executing query took " . (microtime(true) - $time) . " seconds\n";
$result = mysql_query('SELECT count(*) FROM `test`') or die(mysql_error());
$row = mysql_fetch_row($result);
echo "Total number of rows in table: {$row[0]}\n";
echo "Total memory used in bytes: " . memory_get_usage() . "\n";
?>

The results on my Win 7 dev machine are:

Building query took 0.30241012573242 seconds
Executing query took 5.6592788696289 seconds
Total number of rows in table: 1000000
Total memory used in bytes: 22396560

So 1 million inserts took about five and a half seconds. Then I ran it with these settings:

define('NUM_INSERTS_IN_QUERY', 1);
define('NUM_QUERIES', 1000000);

which is basically doing one insert per query. The results are:

Building query took 1.6551470756531 seconds
Executing query took 77.895285844803 seconds
Total number of rows in table: 1000000
Total memory used in bytes: 140579784

Then I tried to create a file with one insert per query in it, as suggested by @jancha. My code is slightly modified:

$fid = fopen("query.sql", "w");
fputs($fid, "use store;");
for($i = 0; $i < 1000000; $i++){
    fputs($fid, "insert into `test` values (1,2,3,4,5,6,7,8,9,0);\n");
}
fclose($fid);
$time = microtime(true);
exec("mysql -uroot < query.sql");
echo "Executing query took " . (microtime(true) - $time) . " seconds\n";

The result is:

Executing query took 79.207592964172 seconds

Same as executing the queries through PHP. So the fastest way is probably to do multiple inserts in one query, and it shouldn't be a problem to use PHP to do the work.
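
If the ceiling on query size observed above really is max_allowed_packet, you can read the server's value and size your batches to stay under it; a small sketch (credentials are placeholders):

<?php
// Sketch: read max_allowed_packet so each multi-row INSERT can stay below it.
// Credentials are placeholders.
mysql_connect('localhost', 'root', '') or die(mysql_error());
$result = mysql_query("SHOW VARIABLES LIKE 'max_allowed_packet'") or die(mysql_error());
$row = mysql_fetch_assoc($result);
echo "max_allowed_packet is {$row['Value']} bytes\n";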

Martin Dimitrov
  • Thanks so much for this helpful run through. I'll definitely try this and see what I come up with. Cheers. – ObiHill Sep 19 '11 at 16:16