2

I think this question has been asked in some way shape or form but I couldn't find a question that had asked exactly what I wish to understand so I thought I'd put the question here

Problem statement

I have built a web application with a MySQL database of say customer records with an INT(11) id PK AI field and a VARCHAR(255) uuid field. The uuid field is not indexed nor set as unique. The uuid field is used as a public identifier so its part of URLs etc. - e.g. https://web.com/get_customer/[uuid]. This was done because the UUID is 'harder' to guess for a regular John Doe - but understand that it is certainly not 'unguessable' in theory. But the issue now is that as the database is growing larger I have observed that the query to retrieve a particular customer record is taking longer to complete.

My thoughts on how to solve the issue

The solution that is coming to mind is to make the uuid field unique and also index the same. But I've been doing some reading in relation to this and various blog posts, StackOverflow answers on this have described putting indices on UUIDs as being really bad for performance. I also read that it will also increase the time it takes to insert a new customer record into the database as the MySQL database will take time to find the correct location in which to place the record as a part of the index.

The above mentioned https://web.com/get_customer/[uuid] can be accessed without having to authenticate which is why I'm not using the id field for the same. It is possible for me to consider moving to integer based UUIDs (I don't need the UUIDs to be universally unique - they just need to be unique for that particular table) - will that improve the the indicing performance and in turn the insertion and querying performance?

Is there a good blog post or information page on how to best set up a database for such a requirement - Need the ability to store a customer record which is 'hard' to guess, easy to insert and easy to query in a large data set.

Any assistance is most appreciated. Thank you!

Prathamesh Datar
  • 375
  • 1
  • 4
  • 20
  • If you're noticing a delay in retrieval now, it will only get worse. You should index the UUID column, and set it unique. Indexed UUIDs can introduce a performance problem, but everything is relative. How often will you be creating new customer records? Once a second? Or once per day? The more frequently you create index entries the more you'll need to be careful with performance. – Tangentially Perpendicular Aug 05 '21 at 04:44
  • *and a VARCHAR(255) uuid field* Textual UUID representation needs in `CHAR(36)`, without dashes - `CHAR(32)`, and in binary representation it needs in `BINARY(16)`. *make the uuid field unique* Only if it must be unique in practice. *and also index* Compulsory. Without the relation to its uniqueness. – Akina Aug 05 '21 at 05:42
  • If the table will be bigger than RAM, you _will_ have performance problems. Read my opinions about UUIDs, especially in a MySQL context: http://mysql.rjweb.org/doc.php/uuid – Rick James Aug 05 '21 at 17:49
  • Another suggestion that a friend of mine gave was keeping the ```id INT PK AI``` as the main column through with a customer record query is built in the database but when sharing it in a URL it is hashed as a part of a two-way hashing function (encrypted) so that its not easily guessable or incrementable and then reverse-hashed when servicing the request. So it removes the need to have a UUID if its just for the purposes or being a public identifier. The trade-off I guess is the additional computation during hashing and reverse hashing. Thanks! – Prathamesh Datar Aug 06 '21 at 04:03

1 Answers1

1

The received wisdom you mention about putting indexes on UUIDs only comes up when you use them in place of autoincrementing primary keys. Why? The entire table (InnoDB) is built behind the primary key as a clustered index, and bulk loading works best when the index values are sequential.

You certainly can put an ordinary index on your UUID column. If you want your INSERT operations to fail in the astronomically unlikely event you get a random duplicate UUID value you can use an index like this.

ALTER TABLE customer ADD UNIQUE INDEX uuid_constraint (uuid);

But duplicate UUIDv4s are very rare indeed. They have 122 random bits, and most software generating them these days uses cryptographic-quality random number generators. Omitting the UNIQUE index is, I believe, an acceptable risk. (Don't use UUIDv1, 2, 3, or 5: they're not hard enough to guess to keep your data secure.)

If your UUID index isn't unique, you save time on INSERTs and UPDATEs: they don't need to look at the index to detect uniqueness constraint violations.

Edit. When UUID data is in a UNIQUE index, INSERTs are more costly than they are in a similar non-unique index. Should you use a UNIQUE index? Not if you have a high volume of INSERTs. If you have a low volume of INSERTs it's fine to use UNIQUE.

This is the index to use if you omit UNIQUE:

ALTER TABLE customer ADD UNIQUE INDEX uuid (uuid);

To make lookups very fast you can use covering indexes. If your most common lookup query is, for example,

SELECT uuid, givenname, surname, email
  FROM customer
 WHERE uuid = :uuid

you can create this so-called covering index.

ALTER TABLE customer 
  ADD INDEX uuid_covering (uuid, givenname, surname, email);

Then your query will be satisfied directly from the index and therefore be faster.

There's always an extra cost to INSERT and UPDATE operations when you have more indexes. But the cost of a full table scan for a query is, in a large table, far far greater than the extra INSERT or UPDATE cost. That's doubly true if you do a lot of queries.

In computer science there's often a space / time tradeoff. SQL indexes use space to save time. It's generally considered a good tradeoff.

(There's all sorts of trickery available to you by using composite primary keys to speed things up. But that's a topic for when you have gigarows.)

(You can also save index and table space by storing UUIDs in BINARY(16) columns and use UUID_TO_BIN() and BIN_TO_UUID() functions to convert them. )

O. Jones
  • 103,626
  • 17
  • 118
  • 172
  • Hi @O.Jones - Thank you for your answer. In relation to your comment where the 'UNIQUE' property on the column is not required, another point I read in another post somwhere was that the UNIQUE is useful when MySQL finds the first record with that UUID and doesn't keep checking the rest of the table to see if there are any others. Would that behaviour still apply if the column is indexed? And since 'space' is 'cheap', I also agree that doing the space trade-off is worthwhile when compared with the speed benefits achievable. I'll also have a look at the suggestion to store as binary. Thank you! – Prathamesh Datar Aug 06 '21 at 01:09
  • 1
    A UNIQUE index does indeed speed up lookups a tiny bit. It definitely slows down INSERTs. Please see my edit. And, I've been where you are... looking for a database design that's workable for the long term. My advice? Go for a clear and simple design that uses widely adopted design patterns (like autoincrementing primary keys). Why? those design patterns are very important to the software design teams (At MySQL and MariaDB). As your database grows over the next 20 years you can expect the dbms software to improve. **Don't overthink today's little quirks of performance.** – O. Jones Aug 06 '21 at 01:53
  • Yes, I intend to put as much planning as possible into the system now itself (but yes within reason and being practical) whilst it's still in 'infancy' than have to implement measures down the line when it has grown and is harder to maintain - thank you for your suggestions - I'll go with adding indices to the uuid and also UNIQUE - inserts are not that frequent on the customer table but lookups are. Thank you to others as well who have shared their thoughts on the same. – Prathamesh Datar Aug 06 '21 at 03:59