1

Using PHP, I have a MySQL database with an Actions table, in which a user optionally assigns actions to some pages in their website. Each such assignment results in an action row, containing (among other things) a unique ActionId and the URL of the appropriate page.

Later on, when in a context of a specific page, I want to find out if there is an action assigned to that page, and fetch (SELECT) the appropriate action row. At that time I know the URL of my page, so I can search the Actions table, by this relatively long string. I suspect this is not an optimal way to search in a database.

I assume a better way would be to use some kind of hashing which converts my long URL strings into integers, making sure no two different URLs are converted into the same integer (encryption is not the issue here). Is there such a PHP function? Alternatively, is there a better strategy for this?

Note I have seen this: SQL performance searching for long strings - but it doesn't really seem to come up with a firm solution, apart from mentioning md5 (which hashes into a string, not to integer).

Community
  • 1
  • 1
Free Bud
  • 746
  • 10
  • 30
  • Why not use the page IDs as references? Also, `VARCHAR` shouldn't take that long to search, it's `TEXT` types you should be worried about for mass search. Also depending on the size of your DB, it can be quick enough or slow as hell. – casraf Dec 19 '13 at 12:08
  • If your URLs are unique you can create a UNIQUE index on that column and search on that. – Jon Dec 19 '13 at 12:10
  • 3
    A *properly indexed* column is pretty fast to search, almost regardless of the length of the search string. Have you actually tried it and proven that it's too slow before thinking about such complicated schemes? – deceze Dec 19 '13 at 12:49

2 Answers2

2

The hashing strategy is a good strategy.

Dealing with the URL strings might indeed be a problem, because they can be very long, and contain a lot of special chars, which are always problematic for MySQL search (REGEXP or LIKE).

That is why hashing solves the problem. Even md5 which is not a good hashing function to hash passwords (because it's not secure anymore), is good to hash URL.

This way you will have http://www.stackoverflow.com changed into 4c9cbeb4f23fe03e0c2222f8c4d8c065, and that will be pretty much unique (unless you are very very unlucky).

Once you have your md5_url field set up, you can search with :

SELECT * FROM Actions where md5_url=?

Where the ? is an md5($url) of current URL.

Of course be sure to set an index on your md5_url field :

ALTER TABLE Actions
ADD md5_url varchar(32),
ADD KEY(md5_url);
Thibault
  • 1,566
  • 15
  • 22
0

If you add an index to the column, the database should take care of efficiency for you, and the length of the URL should make no difference.

Niet the Dark Absol
  • 320,036
  • 81
  • 464
  • 592