
I've got a lot of requests that miss the cache because every permutation of the same list shows up as a separate URL, i.e.:

http://.....&var=a,b,c
http://.....&var=a,c,b
http://.....&var=b,a,c
http://.....&var=b,c,a
http://.....&var=c,a,b
http://.....&var=c,b,a

Is there a clever way to hash these to the same value? Or is the easiest way to substitute the sorted version of the query string value?

Tommy Andersen
Stefan Mai
  • Why not make sure you always generate the parameters in a fixed order when you display URLs? – Oliver Charlesworth Nov 09 '11 at 23:58
  • @Oli It's an external API, we don't control the parameters. – Stefan Mai Nov 10 '11 at 00:55
  • All else being equal, I'd definitely prefer a canonical version of the URL over taking a hash, so that means sort them. Assuming of course that the response really is the same, for example it doesn't include the URL itself. Does Varnish itself use hashes a lot? If so, then perhaps you should define a hash function on the URLs that splits out the components of the list, and then combines their hashes using some commutative operator (like addition) to give the hash of the URL as a whole. – Steve Jessop Nov 10 '11 at 01:13
  • @Steve This certainly seems like a reasonable approach. I'm really looking for something Varnish-specific here as the general solution is pretty simple. Adding hashes does seem to be better than sorting. – Stefan Mai Nov 10 '11 at 01:37
  • The problem with a simple operator like addition is that whilst it would collide "3,4,5" and "4,3,5", it would also collide "2,4,6"... – Oliver Charlesworth Nov 10 '11 at 08:55
  • @Oli: depends what the hash function is for the components of the list. For the obvious one (the hash of a 1-char string is the ASCII value of the character), yes. – Steve Jessop Nov 10 '11 at 09:41
  • But that's why I prefer to take a canonical form where possible. If you compute a hash, then you also need a comparator that does the same breakdown into components as the hash did, then checks whether the lists are actually equal or just a hash collision. By which time, you might as well have replaced the URL in the first place with a sorted equivalent so that any future hashing/comparison can be done as strings. – Steve Jessop Nov 10 '11 at 10:39
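
The trade-off discussed in these comments can be illustrated with a small sketch. It is written in Python purely for illustration (it is not VCL and not anything Varnish runs), and the function names `additive_hash` and `canonical` are hypothetical. The per-item hash is the naive one mentioned above, the sum of the character codes.

```python
def additive_hash(value: str) -> int:
    """Order-independent hash: add up a naive per-item hash (sum of character codes)."""
    return sum(sum(ord(ch) for ch in item) for item in value.split(","))


def canonical(value: str) -> str:
    """Canonical form: the same comma-separated items, sorted."""
    return ",".join(sorted(value.split(",")))


# Permutations of the same list agree under both schemes.
print(additive_hash("3,4,5") == additive_hash("4,3,5"))
print(canonical("3,4,5") == canonical("4,3,5"))

# A different list collides under the naive additive hash,
# while the canonical strings stay distinct.
print(additive_hash("3,4,5") == additive_hash("2,4,6"))
print(canonical("3,4,5") == canonical("2,4,6"))
```

With a stronger per-item hash the collision becomes unlikely rather than impossible, so a comparison step is still needed. The sorted string needs no such step.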

2 Answers


I've written a module for Varnish which reorders the query parameters alphabetically.

Blog post with some explanation:
http://cyberroadie.wordpress.com/2012/01/05/varnish-reordering-query-string/

Code can be found here: https://github.com/cyberroadie/varnish-urlsort

3vlM33pl3

"Rewrite your url to a canonical form and then hash it". This is easier said then done, because vcl has no operations for parameter processing (other than regex matching). You need some inline C to do the processing for you, or use the other proxy/load balancer (if you have it) in front of your varnish to rewrite your request (like nginx).

illagrenan
ivy
  • Ended up writing inline C to sort both the query string parameters (i.e. "c=x&a=x&b=x" -> "a=x&b=x&c=x") and the comma-separated fields where applicable (i.e. "vals=c,a,b" -> "vals=a,b,c"). Giving you credit as the only answer ;). – Stefan Mai Nov 10 '11 at 23:16
  • Can you post the inline C you wrote for sorting the query string? – 3vlM33pl3 Dec 14 '11 at 11:51
  • @Cyberroadie Totally wish I could, but given it's live (and that it's linked into Varnish), I don't think my organization would allow it. Sorry. – Stefan Mai Dec 14 '11 at 18:38
  • No worries. Just wondering: are you using a library for this? Is it possible to #include with inline C? Or did you write a URL parser yourself? – 3vlM33pl3 Dec 14 '11 at 22:57
  • @Cyberroadie No, I actually wrote the parser myself (though it would probably make more sense to use a library). I wasn' – Stefan Mai Jan 09 '12 at 23:46