Every developer working with databases has this problem. And often you can not estimate how many records a table will have in x years.
- LEFT JOIN vs. multiple SELECT statements
- performance - single join select vs. multiple simple selects
Especially when working with O/R mappers, this is extremely uncomfortable!
Why does the RDBMS driver don't solve this problem? Why multiple transfer a record instead of once and then reference it. For the client application this could be completely transparent. Or even offer advanced functions. Especially with OR mapper, it might even be very useful to create subbeans analogous to the DB data, only as a reference.
It would be wonderfull if you could join 1:n tables without the strain knowing of redudant data.
Does anyone know a RDBMS that optimizes like this? Or can't this be done? And if so, why?
----- ---- EDIT ----- -----
@Thilo: Thank for the link. Very interesting.
I've run a test with XAMPP for Windows.
PHP: 5.4.7
MySQL: 5.5.27
The result shows that you have to be carefull with JOIN in MySQL.
Everytime you do a JOIN you will get duplicated data (except 1:1). Why transfering this data multiple?
Test:
I created two tables. Table A with 500 records and 9 columns with VARCHAR(32) and Table B with 50000 records. (1:100)
SET @numA = 500;
SET @numBperA = 100;
DROP TABLE IF EXISTS `table_b`;
DROP TABLE IF EXISTS `table_a`;
DROP PROCEDURE IF EXISTS fill_table_b;
DROP PROCEDURE IF EXISTS fill_table_a;
CREATE TABLE `table_a` (
`id` int(11) NOT NULL,
`val1` varchar(32) NOT NULL,
`val2` varchar(32) NOT NULL,
`val3` varchar(32) NOT NULL,
`val4` varchar(32) NOT NULL,
`val5` varchar(32) NOT NULL,
`val6` varchar(32) NOT NULL,
`val7` varchar(32) NOT NULL,
`val8` varchar(32) NOT NULL,
`val9` varchar(32) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=1;
delimiter $$
CREATE PROCEDURE fill_table_a()
BEGIN
DECLARE i INT DEFAULT 1;
SET i = 1;
WHILE ( i <= @numA) DO
INSERT INTO table_a (id, val1, val2, val3, val4, val5, val6, val7, val8, val9)
VALUES (i, md5(rand()), md5(rand()), md5(rand()), md5(rand()), md5(rand()), md5(rand()), md5(rand()), md5(rand()), md5(rand()));
SET i=i+1;
END WHILE;
END$$
delimiter ;
call fill_table_a();
CREATE TABLE IF NOT EXISTS `table_b` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`table_a_id` int(11) NOT NULL,
`val` varchar(32) NOT NULL,
PRIMARY KEY (`id`),
KEY `table_a_id` (`table_a_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;
ALTER TABLE `table_b` ADD CONSTRAINT `table_b_ibfk_1` FOREIGN KEY (`table_a_id`) REFERENCES `table_a` (`id`);
delimiter $$
CREATE PROCEDURE fill_table_b()
BEGIN
DECLARE i INT DEFAULT 1;
DECLARE j INT DEFAULT 1;
SET i = 1;
WHILE (i <= @numA) DO
SET j = 1;
WHILE (j <= @numBperA) DO
INSERT INTO table_b (table_a_id, val)
VALUES (i, md5(rand()));
SET j=j+1;
END WHILE;
SET i=i+1;
END WHILE;
END$$
delimiter ;
call fill_table_b();
Now I want to select 300 rows from table A and the dependent 30000 rows from table B.
I've done this 3 ways:
Select A JOIN B with a single request
$time = microtime(true);
for( $i = 0; $i < 50; $i++ ) {
$resultA = mysqli_query($link, "SELECT * FROM table_a LEFT JOIN table_b ON table_b.table_a_id = table_a.id WHERE table_a.id BETWEEN 100 AND 399");
$resultArray = array();
//while( $resultArray[] = mysqli_fetch_assoc($resultA) ) {}
$numRows = mysqli_num_rows($resultA);
}
$time2 = microtime(true);
echo("numSelectedRows: " . $numRows . "<br>time: " . number_format($time2 - $time, 3) . " sec.<br>Memory: " . number_format(memory_get_peak_usage() / 1024 / 1024, 3) . " MiB");
with fetch
numSelectedRows: 30000
time: 15.539 sec.
Memory: 55.649 MiBwithout fetch
numSelectedRows: 30000
time: 6.262 sec.
Memory: 3.431 MiB
Select A with single request. Iterate over Result and make 300 Requests to Table B.
$time = microtime(true);
for( $i = 0; $i < 50; $i++ ) {
$numRowsB = 0;
$resultA = mysqli_query($link, "SELECT * FROM table_a WHERE table_a.id BETWEEN 100 AND 399");
while( $row = mysqli_fetch_assoc($resultA) ) {
$resultB = mysqli_query($link, "SELECT * FROM table_b WHERE table_b.table_a_id = " . $row['id']);
while( mysqli_fetch_assoc($resultB) ) {}
$numRowsB += mysqli_num_rows($resultB);
}
}
$numRowsA = mysqli_num_rows($resultA);
$time2 = microtime(true);
echo("numSelectedRows A: " . $numRowsA . "<br>numSelectedRows B: " . $numRowsB . "<br>time: " . number_format($time2 - $time, 3) . " sec.<br>Memory: " . number_format(memory_get_peak_usage() / 1024 / 1024, 3) . " MiB");
- with fetch
numSelectedRows A: 300
numSelectedRows B: 30000
time: 7.713 sec.
Memory: 0.364 MiB
Select A with single request. Select B with single request.
$time = microtime(true);
for( $i = 0; $i < 50; $i++ ) {
$resultA = mysqli_query($link, "SELECT * FROM table_a WHERE table_a.id BETWEEN 100 AND 399");
$resultB = mysqli_query($link, "SELECT * FROM table_b WHERE table_b.table_a_id BETWEEN 100 AND 399");
$resultArray = array();
//while( $resultArray[] = mysqli_fetch_assoc($resultA) ) {}
//while( $resultArray[] = mysqli_fetch_assoc($resultB) ) {}
}
$numRowsA = mysqli_num_rows($resultA);
$numRowsB = mysqli_num_rows($resultB);
$time2 = microtime(true);
echo("numSelectedRows A: " . $numRowsA . "<br>numSelectedRows B: " . $numRowsB . "<br>time: " . number_format($time2 - $time, 3) . " sec.<br>Memory: " . number_format(memory_get_peak_usage() / 1024 / 1024, 3) . " MiB");
with fetch
numSelectedRows A: 300
numSelectedRows B: 30000
time: 6.020 sec.
Memory: 15.928 MiBwithout fetch
numSelectedRows A: 300
numSelectedRows B: 30000
time: 3.018 sec.
Memory: 1.156 MiB