0

According to the PHP docs the existsInDatabase() method is intended to determine if an instance of Collection exists in the database (as a SchemaObject aka a Table). But it only considers a Collection to exist if it's table's schema fits exactly that which it would if it was created using the Schema::createCollection() method, eg. Exactly 2 columns called doc and _id.

Is this a bug caused by PHP module being too strict?

Here's a demonstration (Depends on the mysql_xdevapi PHP module)...

Firstly, use MySQL to create a collection called 'people' in your db (as taken from MySQL's world_x.sql script in examples for JavaScript/Python):

CREATE TABLE `countryinfo` (
    `doc` json DEFAULT NULL,
    `_id` varbinary(32) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,_utf8mb4'$._id'))) STORED NOT NULL,
    `_json_schema` json GENERATED ALWAYS AS (_utf8mb4'{"type":"object"}') VIRTUAL,
    PRIMARY KEY (`_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

Secondly, create a collection called 'people', very similar to above but with the _json_schema column omitted:

CREATE TABLE `people` (
    `doc` json DEFAULT NULL,
    `_id` varbinary(32) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,_utf8mb4'$._id'))) STORED NOT NULL,
    PRIMARY KEY (`_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

Thirdly, use PHP's createCollection() method to create a collection named 'animals':

$dbSession = mysql_xdevapi\getSession("mysqlx://test:test@localhost");

$schema = $dbSession->getSchema("world_x");    

$collection = $schema->createCollection('animals');

Now, use PHP's existsInDatabase() method to check the 3 collections:

$dbSession = mysql_xdevapi\getSession("mysqlx://test:test@localhost");

$schema = $dbSession->getSchema("world_x");

if ($schema->getCollection('countryinfo')->existsInDatabase()) {
    print "countryinfo collection exist! ";
}

if ($schema->getCollection('people')->existsInDatabase()) {
    print "People collection exist! ";
}

if ($schema->getCollection('animals')->existsInDatabase()) {
    print "Animals collection exist! ";
}

All 3 of these are valid collections according to MySQL docs but the above PHP code only recognises 'people' and 'animals' as existing.

As a further experiment, use MySQL to add any randomly named column to the collection named "animals" (which was created via PHP) and re-execute the script to see how it now does not consider it to exist.

ALTER TABLE animals ADD COLUMN test tinyint;

Now if you use $schema->getCollections() and $schema->getTables() you will see that animals is classified under tables not collections. Drop that column and it jumps back again. Under the bonnet, this is how the module decides if a Collection "exists".

I suspect the PHP module might be being too strict. This a problem because it tightly-couples your database schema to a specific language. If, for example, I wanted to build a PHP script to interact with an existing MySQL Document Store that was originally populated via JavaScript it could be buggy because presumably that extra _json_schema column (or some other columns) would exist in some/all of the Collections.

I assume it is legitimate to have columns other than doc and _id as I have not found specifications to the contrary. I can imagine a very real use case for having a created_at datetime column on a Collection.

Alternatively, have I actually discovered a problem not with the PHP module but with MySQLs documentation? Is the _json_schema column a hangover from an early version that should now be removed? Is the definition of a Collection table now standardised across all languages and they must strictly have just those 2 columns as specified?

It's all a bit fluffy with MySQL being vague but PHP being strict.

Any help is appreciated. I will report bugs to maintainers if the consensus is that there is a bug.

Martin Joiner
  • 3,529
  • 2
  • 23
  • 49

1 Answers1

1

------------------------------- {edit April 21} -------------------------------

It looks like there is neither any error nor a change in the behavior of xplugin/server. Adding extra columns to a collection is not allowed. That _json_schema column from 'countryinfo' collection is not treated as something extra - it is meant for a new feature called 'schema validation', and is optional. So it may appear in the collection's table or not. Therefore given table is still treated as collection, no matter it contains '_json_schema' or not. Although, it is enough to change its name from '_json_schema' to e.g. 'json_schema' (i.e. just remove leading underscore), and it will be treated as extra column, so given table will be not reported as collection anymore (existsInDatabase() returns false).

The cited sample:

https://dev.mysql.com/doc/refman/8.0/en/mysql-shell-tutorial-javascript-download.html

is meant for 8.0.19, but it looks like that version of the server is already prepared for 'schema validation', while mysql_xdevapi will officially support that feature since v8.0.21.

in mentioned new version .21 after such call:

$coll = $schema->createCollection("animals");

the columns in the 'animals' table will look as follows

mysql> show columns from animals;
+--------------+---------------+------+-----+---------+-------------------+
| Field        | Type          | Null | Key | Default | Extra             |
+--------------+---------------+------+-----+---------+-------------------+
| doc          | json          | YES  |     | NULL    |                   |
| _id          | varbinary(32) | NO   | PRI | NULL    | STORED GENERATED  |
| _json_schema | json          | YES  |     | NULL    | VIRTUAL GENERATED |
+--------------+---------------+------+-----+---------+-------------------+
3 rows in set (0.00 sec)

and it will be reported as a collection.

------------------------------- {end of edit April 21} -------------------------------

We've checked the behavior of con/php, and looks like it performs similarly as con/nodejs, i.e.

  1. 'countryinfo' is reported as collection
  2. adding / dropping columns causes changes in reported type of 'animals' in con/nodejs or shell too (collection vs table).

Please run the below script, and let us know what is the output on your environment.

// please align below variables according to your environment
$connection_uri = "mysqlx://test:test@localhost";
$db = "testx";

$session = mysql_xdevapi\getSession($connection_uri);
$session->sql("create database $db")->execute();
$schema = $session->getSchema($db);

$session->sql("USE $db")->execute();

$countryinfo_table_stmt="CREATE TABLE `countryinfo` (
    `doc` json DEFAULT NULL,
    `_id` varbinary(32) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,_utf8mb4'$._id'))) STORED NOT NULL,
    `_json_schema` json GENERATED ALWAYS AS (_utf8mb4'{\"type\":\"object\"}') VIRTUAL,
    PRIMARY KEY (`_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4";

echo "$countryinfo_table_stmt\n";
$session->sql($countryinfo_table_stmt)->execute();

$people_table_stmt="CREATE TABLE `people` (
    `doc` json DEFAULT NULL,
    `_id` varbinary(32) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,_utf8mb4'$._id'))) STORED NOT NULL,
    PRIMARY KEY (`_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4";

echo "$people_table_stmt\n";
$session->sql($people_table_stmt)->execute();

if ($schema->getCollection('countryinfo')->existsInDatabase()) {
    print "countryinfo collection exists!\n";
} else {
    print "countryinfo collection DOES NOT exist!\n";
}

if ($schema->getCollection('people')->existsInDatabase()) {
    print "People collection exists!\n";
} else {
    print "People collection DOES NOT exist!\n";
}

$coll = $schema->createCollection("animals");

if ($schema->getCollection('animals')->existsInDatabase()) {
    print "Animals collection exists!\n";
} else {
    print "Animals collection DOES NOT exist!\n";
}

$session->sql("ALTER TABLE $db.animals ADD COLUMN test tinyint")->execute();

if ($schema->getCollection('animals')->existsInDatabase()) {
    print "Animals collection exists!\n";
} else {
    print "Animals collection DOES NOT exist!\n";
}

$session->sql("ALTER TABLE $db.animals DROP COLUMN test")->execute();

if ($schema->getCollection('animals')->existsInDatabase()) {
    print "Animals collection exists!\n";
} else {
    print "Animals collection DOES NOT exist!\n";
}

print "done!\n";

in our environment the output looks as follows:

CREATE TABLE `countryinfo` (
    `doc` json DEFAULT NULL,
    `_id` varbinary(32) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,_utf8mb4'$._id'))) STORED NOT NULL,
    `_json_schema` json GENERATED ALWAYS AS (_utf8mb4'{"type":"object"}') VIRTUAL,
    PRIMARY KEY (`_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
CREATE TABLE `people` (
    `doc` json DEFAULT NULL,
    `_id` varbinary(32) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,_utf8mb4'$._id'))) STORED NOT NULL,
    PRIMARY KEY (`_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
countryinfo collection exists!
People collection exists!
Animals collection exists!
Animals collection DOES NOT exist!
Animals collection exists!
done!

Besides, what is the version of server/connectors you use?

  • I get a different output: "countryinfo collection DOES NOT exist! / People collection exists! / Animals collection exists! / Animals collection DOES NOT exist! / Animals collection exists! / done!". I am using mysql Ver 8.0.18, mysql_xdevapi 8.0.19, PHP Version 7.3.16-1+ubuntu18.04.1+deb.sury.org+1. – Martin Joiner Apr 14 '20 at 14:41
  • OK, I confirm that on server v8.0.18 con/php behaves differently, i.e. returns "countryinfo collection DOES NOT exist!". I assume that the only difference for con/nodejs or shell on your machine (I assume v8.0.18) is that they return "countryinfo collection exists!", but the rest of the output (for collections 'People' and 'Animals') is identical? – marinesovitch Apr 15 '20 at 06:15
  • I haven't experimented with NodeJS. Regardless of Node or PHP, patch-version 18 or 19. Can we define what the /correct/ behaviour should be? Have the specifications of when a MySQL table is a 'Collection' and when it is not been defined and documented? Because this sort of binary distinction seems pretty important to get right across all connectors. – Martin Joiner Apr 15 '20 at 10:35
  • OK, as U've mentioned that PHP may be too restrictive, so I wanted to investigate these possible differences between connectors. But it doesn't matter. Anyway, in PHP we don't perform any extra restrictive checks and rely on info from the server, so I contacted xplugin/server guys to proceed with it further. – marinesovitch Apr 16 '20 at 15:12
  • Please check my answer - it is at the beginning of the above message (I've edited it) – marinesovitch Apr 21 '20 at 12:14
  • Ah excellent, we have some clarity! So... for a table to be considered a Collection, the official rule was that it must only have `doc` and `_id` columns but in later versions it can also have `_json_schema` column to support a new validation feature. That's good to know. I think that kind of clarity should be included in both the MySQL and PHP documentation as I expect there are many others out there who, like me, were considering adding extra columns but still thinking of using it as a Collection. Thank you for your hard work in solving the mystery @marinesovitch. – Martin Joiner Apr 21 '20 at 14:58
  • OK, I've created orabug in our internal bugs database and stored there proper info related to this issue (including this StackOverflow link). Now documentation and server teams will decide how to proceed with it further. Thank you for reporting it. – marinesovitch Apr 23 '20 at 13:33
  • Optionally U can also submit your own bug report in our public bug tracker using the “MySQL Server: Document Store: Documentation” category. – marinesovitch Apr 23 '20 at 17:02