
I have a MySQL table that caches rendered page data (I'm on shared hosting, so no memcached).

The concept is this:

I have a page that loads (mostly static) data and then caches it:

  • If the requested page exists in the cache table, the stored HTML is served directly.
  • If it does not, the page executes its queries (around 12: menu, page content, SEO, product list, etc.), renders the HTML, and saves the result to the cache table.

The cache table is like this:

=cache=
url varchar(255) - primary key
page mediumtext

Now I think I'm doing the right thing given what I have (shared host, no caching layer like memcached, etc.), but my question is this:

Because the URL column is a varchar index, and numeric IDs (like int) are faster to index, is there a way to convert a URL like /contact-us/ or /product-category/product-name/ to a unique integer? Or is there any other way to optimize this?
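As a sketch of the idea being asked about: a 32-bit hash such as CRC32 (which MySQL also exposes as a built-in function) maps any URL string to an integer, though not a guaranteed-unique one. A minimal Python illustration (function name is my own):

```python
import zlib

def url_to_int(url: str) -> int:
    # CRC32 maps any string to an unsigned 32-bit integer.
    # Different URLs can collide, so the original URL must
    # still be stored and compared on lookup.
    return zlib.crc32(url.encode("utf-8"))

print(url_to_int("/contact-us/"))
```

The hash is deterministic, so the same URL always yields the same key; uniqueness, however, is not guaranteed, which is why the answers below keep the full URL in the table as well.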

  • use a regular auto_increment int primary key, and store the url as a unique (but not primary) keyed field. It's easier to haul the int around when referring to things. Use the url text to look up the auto_inc id. – Marc B Feb 23 '12 at 00:51
  • so you still have to look up by URL and not ID? – Alexei Feb 23 '12 at 00:52
  • You want to cache 12 queries? Not that many. . . although it depends if they're complex or not. Why are you storing a cache in mysql, why not create a file on the file system and use that as the cache? – Flukey Feb 23 '12 at 00:52
  • `select id where url='someurl'`, then use the id number elsewhere for further queries. – Marc B Feb 23 '12 at 00:53
  • @Flukey no, I want to cache the output so that next time the page loads from the cache as static HTML; I don't want to run these queries every time. 12 queries is an example; sometimes I have fewer and sometimes more. Also, file access is slower than DB access (from what I've read). – Alexei Feb 23 '12 at 00:54
  • @MarcB then I have to do 2 queries? One for the ID and one for the content? – Alexei Feb 23 '12 at 00:54
  • @Alexei - yes, so store the output to a file on the file system so you don't have to use mysql. Why don't you put a function call in your pages to check if a cached version of the requested page exists? If not, create it. If so, serve it. Trivial. – Flukey Feb 23 '12 at 00:56
  • @Alexei - file access is always quicker than db access. Why do you think it's better to store images on the FS than in the DB? – Flukey Feb 23 '12 at 00:56
  • @Flukey http://stackoverflow.com/questions/849061/file-access-speed-vs-database-access-speed – Alexei Feb 23 '12 at 01:06
  • @Alexei - For writing to. For reading, the file system is quicker. That question and its subsequent answers don't back up your point. – Flukey Feb 23 '12 at 01:09
  • @flukey: not always. if the db's got the query results cached, it can still be faster than hitting disk. – Marc B Feb 23 '12 at 01:09
  • @MarcB - but he's on shared hosting. In order to enable the query cache in mysql you'd have to be root. Although it could already be enabled. – Flukey Feb 23 '12 at 01:12
  • @MarcB - are you saying in code it could be quicker to open a connection to the DB, execute the query, process the result and output the html than it is to do a simple file_get_contents of a file on the FS? Seems unlikely. – Flukey Feb 23 '12 at 01:15
  • @flukey: mysql caches query results regardless of the user doing the querying. admittedly it's an in-memory cache and could be flushed quickly on a shared host, but still... it **IS** a cache. – Marc B Feb 23 '12 at 01:16
  • @flukey: on a shared host, it's entirely possible that you'll have to wait your turn for access to disk i/o, plus rotational latency, etc... which can easily add up to many milliseconds. Unless we're talking huge query results, it doesn't take THAT long to convert a few result rows to html. – Marc B Feb 23 '12 at 01:18
  • So loading a cached version of a page from a file will be faster than getting the content from the DB and saving it to the DB over the same connection? – Alexei Feb 23 '12 at 01:19
  • @MarcB - I see your points; however, I stand my ground when I say loading a cached file from the FS is faster than opening a connection, doing the query etc. Yes, we're talking minuscule amounts of time, but nevertheless, it'll be faster. Bearing in mind the MySQL server is unlikely to be on the same machine, latency between the server hosting the code and the db server could be a problem. – Flukey Feb 23 '12 at 01:30
  • @MarcB So, what should I do? can you please make an answer? – Alexei Feb 23 '12 at 01:41
  • @flukey So, what should I do? can you please make an answer? – Alexei Feb 23 '12 at 01:41
  • What you're doing is fine. I'm just arguing that it would be faster to use files on the file system for your cache. Using the DB for this just seems like overkill, and because of the hosting environment you could run into problems. – Flukey Feb 23 '12 at 01:43

3 Answers


I would create some form of hash to get a shorter key. In many cases something simple like a hash of the request path may be viable; even something as basic as CRC32('/your/path/here') may be suitable in your situation as a primary key. In this example the table would have the following columns:

 urlCRC INT(11) UNSIGNED NOT NULL (PRIMARY KEY)
 url VARCHAR(255) NOT NULL
 page MEDIUMTEXT

You could then take this a step further and add a BEFORE INSERT trigger that calculates urlCRC automatically:

 CREATE TRIGGER cacheTable_bi BEFORE INSERT ON cacheTable
 FOR EACH ROW SET NEW.urlCRC = CRC32(NEW.url);

You could then create a stored procedure which takes as argument inURL (string), and internally it would do

SELECT * FROM cacheTable WHERE urlCRC = CRC32(inURL);

If the number of rows returned is 0, then you can trigger logic to cache it.

This may of course be overkill, but it would give you a numeric key to work on which, assuming there are no collisions, would suffice. Since the url is also stored as VARCHAR(255), if a collision does occur you can easily regenerate the hashes using a different algorithm.

Just to be clear, I use CRC32() only as an example off the top of my head; chances are there are more suitable algorithms. The main point to take away is that a numeric key is more efficient to search, so if you can convert your strings into unique numbers, retrieval will be more efficient.
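The hash-then-compare pattern this answer relies on can be sketched in Python (a plain dict stands in for the cache table here; Python's zlib.crc32 computes the same standard CRC-32 as MySQL's CRC32() for the same bytes, though that is worth verifying for your data): look up by hash first, then compare the exact URL to resolve collisions.

```python
import zlib

# Hypothetical in-memory stand-in for the cache table:
# maps urlCRC -> list of (url, page) rows, so collisions stay visible.
cache = {}

def crc(url: str) -> int:
    return zlib.crc32(url.encode("utf-8"))

def put(url: str, page: str) -> None:
    cache.setdefault(crc(url), []).append((url, page))

def get(url: str):
    # WHERE urlCRC = CRC32(?) narrows the search to a few rows;
    # the exact string comparison resolves any hash collision.
    for stored_url, page in cache.get(crc(url), []):
        if stored_url == url:
            return page
    return None  # cache miss: render the page, then put() it

put("/contact-us/", "<html>contact</html>")
```

The same two-step lookup is what the stored-procedure SELECT above performs: the integer comparison does the fast index work, and the VARCHAR column guarantees correctness.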


Changing your url column to a fixed-size string would make indexing slightly faster, if there weren't another dynamically-sized column (TEXT) in the table. Converting it to an integer would be possible depending on your URL structure, or you could use some kind of hash function. But why not make your life easier?

You could save your cache results directly to disk and create a mod_rewrite rule (put it in your .htaccess file) that serves the file if it exists and otherwise invokes the PHP script. This would have 2 advantages:

  1. If the cache is hot, PHP will not run. This saves time and memory.
  2. If the file is requested often and it is small enough (or you have lots of RAM), it will be held in RAM by the OS file cache. This is much faster than MySQL.
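The check-then-serve logic behind this file-based approach can be sketched as follows (Python used for illustration; in practice this would be the PHP function plus the mod_rewrite rule described above, and all names here are hypothetical):

```python
import hashlib
import os
import tempfile

CACHE_DIR = tempfile.mkdtemp()  # stand-in for a cache dir under the web root

def cache_path(url: str) -> str:
    # Hash the URL so any request path maps to a safe, flat filename.
    name = hashlib.sha1(url.encode("utf-8")).hexdigest() + ".html"
    return os.path.join(CACHE_DIR, name)

def serve(url: str, render) -> str:
    path = cache_path(url)
    if os.path.exists(path):       # cache hit: no queries run at all
        with open(path) as f:
            return f.read()
    html = render(url)             # cache miss: run the ~12 queries
    with open(path, "w") as f:     # store the rendered HTML for next time
        f.write(html)
    return html
```

With the mod_rewrite variant, even this small check moves into the web server, so PHP is not started at all on a cache hit.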

Select the cached rows by hash, then search for the exact URL among any hash collisions:

select page from cache where urlCRC = CRC32('someurl') and url = 'someurl'