
I want to use random primary keys in many of my tables (users table, posts table, etc.), similar to the design of medium.com: have a look at the article id in the URL or in the APIs - it's a 12-character random hex string, which most probably corresponds to a 64-bit integer. With this design I gain a much larger key space, and it is also somewhat more secure than serial primary keys: it resists automated requests that gain information about the website (like total posts or total users) and scraping.

Since the storage difference isn't that important these days, will there be a major performance difference, especially for fetch queries? Or is it insignificant in 2018, especially if Postgres is hosted on something like RDS?
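For reference, generating such a key at the application level is trivial; here is a minimal sketch (Python is used only for illustration, and the function name is hypothetical):

```python
import secrets

def random_key():
    # 12 random hex chars = 48 bits of entropy; the resulting integer
    # fits comfortably in a signed 64-bit Postgres bigint.
    hex_id = secrets.token_hex(6)   # 6 random bytes -> 12 hex chars
    return int(hex_id, 16)
```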

elixenide
pls no
  • This question *is* too broad - how can one say anything about performance for unspecified platforms on unspecified hardware? It's not about stupid, it's about the impossibility of giving a good answer. I'd *expect* `SELECT` performance not to suffer noticeably, but the code that *generates* those keys (which you don't show) is probably significantly slower than using a sequence. I admit that I don't understand the advantages you expect. – Laurenz Albe May 16 '18 at 05:32
  • I am talking from the perspective of Postgres's primary key search algorithm. What about the code that generates the keys? Is it a big issue? It's just 12 random hex chars generated at the software level and converted to a 64-bit integer. – pls no May 16 '18 at 05:40
  • I don't understand why you "encode" a 12-char hex number into a bigint. You are not gaining anything with that - you only have the downside of first generating some hex number and then converting it to a bigint - why not use a bigint to start with? And how do you "gain a bigger space"? You store it in a bigint, so you are limited by the range a bigint can store. As for raw query performance, I don't think there will be a big difference –  May 16 '18 at 06:04
  • One way you could test the raw performance is to use [pgbench](https://www.postgresql.org/docs/current/static/pgbench.html): do a run with all keys being integers, then change the tables to store them as bigint and repeat the test –  May 16 '18 at 06:06
  • **Not** using a sequence to generate a unique number (integer, bigint) can usually only be done in two ways: a fast way that doesn't work correctly with concurrent transactions, and a correct way that usually doesn't scale for high insert rates. Did you consider UUIDs as an alternative? –  May 16 '18 at 06:07
  • Thanks @a_horse_with_no_name for the answers. I will use pgbench and see the differences. As for generating the ids: 12 hex chars is 48 bits, so I have a 2^48 space, and consequently a collision is very improbable. As for UUIDs, that would be too large a space for my use case – pls no May 16 '18 at 06:15
  • I still doubt that rolling your own sequence generator is worth it. And I don't buy the "security" aspect - security by obscurity never really works. You will still need to implement the necessary security checks regardless of whether the numbers are sequential or not. So the effort to implement a secure system will not be less because of that, but you wind up with a bottleneck in the number generation that will most probably impact the overall performance of the system. –  May 16 '18 at 06:18
  • you might want to look at [this](https://stackoverflow.com/a/1734728) –  May 16 '18 at 06:23
  • Thanks! If I may ask one more question: is searching randomly distributed indices (primary keys in my case) faster than serial ones? – pls no May 16 '18 at 07:24
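The "collision is very improbable" claim in the comments can be checked with the standard birthday approximation; a quick sketch, in Python for illustration:

```python
import math

def collision_probability(n, space_bits=48):
    # Birthday approximation: P(collision) ~= 1 - exp(-n^2 / (2 * 2^bits))
    space = 2 ** space_bits
    return 1.0 - math.exp(-(n * n) / (2.0 * space))
```

At a million rows the risk is roughly 0.2%, and by ten million it is already around 16%, so a UNIQUE constraint with retry-on-conflict is still advisable even in a 2^48 space.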

1 Answer


Is there any performance difference? Yes. Serial keys would be 4 bytes and your keys are 8 bytes. That is more space and hence more work.

Does it make a difference? Probably not. Some databases would cluster (i.e. sort) the data by the primary key. Postgres does not do that. Such clustering is an issue because it immediately leads to fragmentation for randomly generated keys.

You seem to have a good use-case for the key structure you are proposing. Although I don't see an a priori issue with it, you might want to test the difference to see if it matters for your application.
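If you do go this route, the collision concern raised in the comments is usually handled with a retry loop on unique violation. A minimal sketch, with an in-memory set standing in for the table's unique index (a real version would catch Postgres's unique-violation error instead):

```python
import secrets

def insert_with_retry(existing_keys, max_attempts=5):
    # Draw a random 48-bit key; on collision (here: set membership,
    # in production: a unique-violation error from Postgres), redraw.
    for _ in range(max_attempts):
        key = secrets.randbits(48)
        if key not in existing_keys:
            existing_keys.add(key)
            return key
    raise RuntimeError("no free key found after retries")
```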

Gordon Linoff
  • Thank you! I don't really care about the size difference, since the cost difference is insignificant with today's storage; I just care about the latency of fetch requests. Another question: is searching through randomly distributed indices (e.g. random primary keys in my case) generally faster or slower than serial ones, from the search algorithm's perspective? – pls no May 16 '18 at 12:27
  • @plsno . . . They are stored using a tree-based index (generally), so the ordering of the values doesn't make a difference. – Gordon Linoff May 17 '18 at 02:57