The examples I have seen of designing a URL shortener all suggest first creating a sequential ID column in the database, and then converting this DB-generated ID, for example to base 62, to get the shortened URL. My question is: why not just use the DB-generated ID itself as the shortened ID?

For example, if I save the URL www.google.com in the DB and the DB-generated ID for it is 348, why not just use that as the shortened URL, e.g. bit.ly/348?
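
For reference, the conversion step those examples describe can be sketched roughly as follows. This is a minimal illustration, not any particular service's implementation; the alphabet ordering (digits, then lowercase, then uppercase) and the function names `encode_base62` and `decode_base62` are assumptions made for the example.

```python
import string

# Assumed alphabet ordering: 0-9, then a-z, then A-Z (62 characters in total).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62


def encode_base62(n: int) -> str:
    """Convert a non-negative integer ID into its base-62 string."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        # Peel off the least significant base-62 digit each iteration.
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    # Digits were collected least significant first, so reverse them.
    return "".join(reversed(chars))


def decode_base62(code: str) -> int:
    """Convert a base-62 string back into the integer ID."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n


print(encode_base62(348))   # 5C
print(decode_base62("5C"))  # 348
```

With this ordering, ID 348 becomes `5C`: two characters instead of three. The saving is small for small IDs and grows as the IDs get larger.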

Kaidul
hello_its_me
  • Because it's longer? – Bernhard Barker Sep 23 '18 at 17:43
  • @Dukeling How is it longer? DB IDs can start from 0, so the first ID is 0, which is used as the short URL, then the second URL will be 1, the third 2, etc. We can do this up to at least millions of URLs, which will be 7 chars long. – hello_its_me Sep 23 '18 at 17:46
  • For any given number of characters, you can generate more codes using a larger alphabet than just the digits 0-9. That's why using the raw ID can, over time, require longer URLs on average. I don't know if that's the main reason the examples you saw use an extra conversion step, but it makes sense to me. – Ted Hopp Sep 23 '18 at 17:49
  • @TedHopp OK, so for the 644 million URLs that already exist in the world, we would need a 9-char short URL. That is not bad, and very close to the URL lengths a lot of services are using nowadays. I still do not see the need to go an extra step to shorten a URL. – hello_its_me Sep 23 '18 at 17:53
  • There were an estimated 644 million _web sites_ in 2012. There are a far larger number of web sites today and a far, far larger number of URLs, probably by several orders of magnitude. But you make a good point; it doesn't take all that many digits to represent them all. Perhaps another reason for encoding the database ID column value is to avoid exposing raw internal data from your site to the web, as a (somewhat weak) security precaution. Like I said, I don't know why the examples you saw do this; I'm just guessing here. – Ted Hopp Sep 23 '18 at 17:58
  • @TedHopp You do have a very good and valid point regarding the security of my data. This convinced me. Thanks :) – hello_its_me Sep 23 '18 at 18:52
  • If you expose an incrementally generated DB ID, you leak information to hackers/users/investors, such as the number of objects you have, and you make it easy to impersonate a different user. Also, storing 1 billion keys in the DB takes a lot of space, and you want to save as much space as possible, so using 6-7 chars is going to be your best bet. – Ankur Kothari Sep 10 '21 at 05:47

1 Answer

Let's say you're provisioning your system to accommodate 9 billion URLs (there were an estimated 1.8 billion web sites in 2018; assume an average of 5 URLs per web site). And let's say you use the characters a-z, A-Z, 0-9 to encode the shortened URLs, which gives an alphabet of 62 characters. If x is the minimum number of characters needed to represent 9 billion URLs, then x is the smallest integer such that 62^x > 9*10^9.

Log (9*10^9) to the base 62 ≈ 5.55, which rounds up to 6

So, you will need 6 characters to be able to uniquely identify all 9 billion URLs. Written as plain decimal IDs, the same 9 billion URLs would need up to 10 digits.
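
The arithmetic above can be checked with a short script. This is only a sketch: `min_code_length` is a hypothetical helper name, and it uses integer multiplication rather than floating-point logarithms so that rounding cannot affect the result.

```python
def min_code_length(total_urls: int, alphabet_size: int) -> int:
    """Return the smallest x such that alphabet_size**x > total_urls."""
    length = 1
    capacity = alphabet_size  # number of distinct codes of this length
    while capacity <= total_urls:
        capacity *= alphabet_size
        length += 1
    return length


TOTAL_URLS = 9_000_000_000  # 1.8 billion sites * 5 URLs each (the estimate above)

print(min_code_length(TOTAL_URLS, 62))  # 6  (a-z, A-Z, 0-9)
print(min_code_length(TOTAL_URLS, 10))  # 10 (decimal digits only)
```

For the same number of URLs, the base-62 alphabet needs 6 characters where plain decimal IDs need 10.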

Kaidul