I'm testing a proprietary tool that dumps a table in a MySQL RDS to the parquet format, and then restores it into another MySQL RDS.
Both tables have the same amount of rows:
mysql> SELECT COUNT(*) FROM fox_owners;
+----------+
| COUNT(*) |
+----------+
| 118950 |
+----------+
The table itself is configured the same way in both cases:
mysql> SHOW CREATE TABLE fox_owners;
+------------+-------------------------------------------------------+
| Table | Create Table |
+------------+-------------------------------------------------------+
| fox_owners | CREATE TABLE `fox_owners` (
`name` mediumtext,
`owner_id` bigint NOT NULL,
PRIMARY KEY (`owner_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+------------+-------------------------------------------------------+
So far so good, right?
However, the table sizes are different. The original:
+----------+----------------------+------------+
| Database | Table | Size in MB |
+----------+----------------------+------------+
| stam_db | fox_owners | 5582.52 |
The restored one:
+----------+----------------------+------------+
| Database | Table | Size in MB |
+----------+----------------------+------------+
| stam_db | fox_owners | 5584.52 |
The restored one is 2MB bigger! However, what's really bugging me, is the change in cardinality of the indexes between the 2 tables. Original:
mysql> SHOW INDEX FROM fox_owners;
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| fox_owners | 0 | PRIMARY | 1 | owner_id | A | 118728 | NULL | NULL | | BTREE | | | YES | NULL |
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
Restored:
mysql> SHOW INDEX FROM fox_owners;
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| fox_owners | 0 | PRIMARY | 1 | owner_id | A | 117518 | NULL | NULL | | BTREE | | | YES | NULL |
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
Why would cardinality drop from 118728 to 117518? If the restored table is less unique than the original - isn't this a clear sign that this table is different? How can I verify that these 2 tables in separate RDS databases have identical content?
And shouldn't the cardinality be 118950 in both of them anyway, since for a table with a single primary key column, the cardinality must be equal to the number of rows in the table?
I've ran ANALYZE TABLE on both tables, the values didn't change.