
I am currently working on/testing the microcache feature in an NGINX reverse proxy setup for dynamic content.

One big issue is sessions/cookies: requests carrying them must not be served from the cache, otherwise people will log on with random accounts on the site(s).

Currently I am ignoring popular CMS cookies like this:

set $IGNORE_VARIABLE 0;

if ($http_cookie ~* "(joomla_[a-zA-Z0-9_]+|userID|wordpress_(?!test_)[a-zA-Z0-9_]+|wp-postpass|wordpress_logged_in_[a-zA-Z0-9]+|comment_author_[a-zA-Z0-9_]+|woocommerce_cart_hash|woocommerce_items_in_cart|wp_woocommerce_session_[a-zA-Z0-9]+|sid_customer_|sid_admin_|PrestaShop-[a-zA-Z0-9]+)") {
    # set the ignore variable to 1; it is later used in:
    #   proxy_no_cache     $IGNORE_VARIABLE;
    #   proxy_cache_bypass $IGNORE_VARIABLE;
    # does that make sense?
    set $IGNORE_VARIABLE 1;
}

However, this becomes a problem if I want to add more cookies to the ignore list. Not to mention that using too many "if" statements in NGINX is not recommended, as per the docs.

My question is: could this be done using a map instead? I saw that regexes in map look different (or maybe I am wrong).

Or is there another way to efficiently ignore/bypass cookies?

I have searched a lot on Stack Overflow, and while there are many different examples, I could not find anything specific to my needs.

Thank you

Update:

After a lot of reading and "digging" on the internet (we might as well just say Google), I found quite a few interesting examples.

However, I am very confused by these, as I do not fully understand the regex usage and I am afraid to implement them without understanding it.

Example 1:

map $http_cookie $cache_uid {
  default nil;
  ~SESS[[:alnum:]]+=(?<session_id>[[:alnum:]]+) $session_id;
}
  1. In this example, I notice that the regex is very different from the ones used in "if" blocks. I don't understand why the pattern is not wrapped in quotes ("") and starts directly with just a ~ sign.

  2. I don't understand what [[:alnum:]]+ means. I searched for this, but I was unable to find documentation (or maybe I missed it).

  3. I can see that the author sets "nil" as the default; this will not apply in my case.

Example 2:

map $http_cookie $cache_uid {
  default  '';
  ~SESS[[:alnum:]]+=(?<session_id>[[:graph:]]+)  $session_id;
}
  1. Same points as in Example 1, but this time I can see [[:graph:]]+. What is that?

My Example (not tested):

map $http_cookie $bypass_cache {

    "~*wordpress_(?!test_)[a-zA-Z0-9_]+"  1;
    "~*wp-postpass|wordpress_logged_in_[a-zA-Z0-9]+"  1;
    "~*comment_author_[a-zA-Z0-9_]+"  1;
    "~*[a-zA-Z0-9]+_session"  1;

    default      0;
}

In my pseudo-example, I suspect the regex is wrong, since I did not find any map cookie examples with regexes like these.

So, once again, my goal is to have a map-style list of cookies for which the cache is bypassed, with proper regexes.

Any advice/examples much appreciated.

Mecanik
  • So you want to bypass the cache if any of these cookies are found, or do you want to remove or tamper with the cookies? – Tarun Lalwani Jul 13 '19 at 14:13
  • @TarunLalwani - Yes, I want to bypass the cache if those cookies are present. Otherwise the micro-cache will cause terrible problems, and people will log in and see others' accounts and data. – Mecanik Jul 13 '19 at 15:18
  • thanks for all the fish, +1! – cnst Jul 20 '19 at 19:29

2 Answers


What exactly are you trying to do?

The way you're doing it, trying to blacklist only certain cookies from being cached through if ($http_cookie …, is the wrong approach: it means that one day, someone will find a cookie that is not blacklisted but that your backend nonetheless accepts, causing cache poisoning or other security issues down the line.

There's also no reason to use the http://nginx.org/r/map approach to get the values of the individual cookies, either: all of this is already available through the http://nginx.org/r/$cookie_ paradigm, which makes map code for parsing out $http_cookie rather redundant and unnecessary.
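As a minimal sketch of the $cookie_ approach (the cache zone name microcache, its path and the backend address 127.0.0.1:8080 are hypothetical placeholders), the userID cookie from the question's list can be tested directly, with no map over $http_cookie at all:

```nginx
# http {} context: hypothetical microcache zone
proxy_cache_path /var/cache/nginx/micro keys_zone=microcache:10m;

server {
    listen 80;

    location / {
        proxy_pass  http://127.0.0.1:8080;   # hypothetical backend
        proxy_cache microcache;
        proxy_cache_valid 200 1s;

        # $cookie_userID holds the value of the "userID" cookie, or is empty
        # if the request does not carry it. Any value that is non-empty and
        # not "0" disables both storing and serving cached responses.
        proxy_no_cache     $cookie_userID;
        proxy_cache_bypass $cookie_userID;
    }
}
```

This only works for cookies with a fixed name; names with a per-site suffix, such as wordpress_logged_in_<hash>, cannot be addressed as a single $cookie_ variable.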

Are there any cookies which you actually want to cache? If not, why not just use proxy_no_cache $http_cookie; to disallow caching when any cookies are present?
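A sketch of that suggestion, again assuming a hypothetical microcache zone and backend: any request that carries a Cookie header is neither stored in nor served from the cache.

```nginx
proxy_cache_path /var/cache/nginx/micro keys_zone=microcache:10m;

server {
    listen 80;

    location / {
        proxy_pass  http://127.0.0.1:8080;   # hypothetical backend
        proxy_cache microcache;
        proxy_cache_valid 200 1s;            # microcache: keep 200s for 1 second

        # $http_cookie is the raw Cookie request header; it is empty when the
        # client sends no cookies, so only cookie-less requests use the cache.
        proxy_no_cache     $http_cookie;     # don't store the response
        proxy_cache_bypass $http_cookie;     # don't answer from the cache
    }
}
```

proxy_no_cache on its own only prevents storing; without proxy_cache_bypass, a logged-in user could still be handed a page that was cached from an anonymous request. Note also that, by default, nginx does not cache responses that carry a Set-Cookie header.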


What you'd probably want to do is first write a spec of what must be cached and under what circumstances, and only then express that logic in a programming language like nginx.conf.

For example, a better approach would be to determine which URLs should always be cached, and clear out the Cookie header for them to ensure that cache poisoning isn't possible (proxy_set_header Cookie "";). Otherwise, if any cookies are present, it may make sense either not to cache anything at all (proxy_no_cache $http_cookie;), or to structure the cache so that certain combinations of authentication credentials are used in http://nginx.org/r/proxy_cache_key; in that case, it might also make sense to reconstruct the Cookie request header manually through a whitelist-based approach to avoid cache-poisoning issues.
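A rough sketch of both options described above. The /public/ prefix, the sessionid cookie name, the cache zone and the backend address are all hypothetical placeholders. The whitelist map forwards only that one cookie, and only when it is present (map values that mix text and a variable need nginx 1.11.0 or later).

```nginx
proxy_cache_path /var/cache/nginx/micro keys_zone=microcache:10m;

# Rebuild the Cookie header from a whitelist: only "sessionid" survives.
# An empty value makes proxy_set_header drop the header entirely.
map $cookie_sessionid $whitelisted_cookie {
    ""       "";
    default  "sessionid=$cookie_sessionid";
}

server {
    listen 80;

    # Option 1: URLs that are always cached, with cookies stripped.
    location /public/ {
        proxy_pass  http://127.0.0.1:8080;
        proxy_cache microcache;
        proxy_cache_valid 200 1s;
        proxy_set_header Cookie "";          # backend never sees client cookies
        proxy_ignore_headers Set-Cookie;     # cache even if the backend sets one
        proxy_hide_header    Set-Cookie;     # ...but never pass it to clients
    }

    # Option 2: per-user cache entries keyed on the whitelisted cookie.
    location / {
        proxy_pass  http://127.0.0.1:8080;
        proxy_cache microcache;
        proxy_cache_valid 200 1s;
        proxy_set_header Cookie $whitelisted_cookie;
        proxy_cache_key "$scheme$proxy_host$request_uri|$whitelisted_cookie";
    }
}
```

Because the backend only ever sees the whitelisted cookie, and that same cookie is part of the cache key, a response generated for one session cannot be served to another.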

cnst
  • Thank you for your clear explanation; my ultimate purpose would be to keep all cookies out of the micro-cache. I had no idea "proxy_no_cache $http_cookie;" would actually work and ignore all cookies. Is this correct? – Mecanik Jul 16 '19 at 16:10
  • Also, I really do not see a reason why one would cache "some" cookies at all... I don't understand why all the tutorials and articles show how to exclude only specific cookies from the microcache... – Mecanik Jul 16 '19 at 16:14
  • @NorbertBoros if the user is logged in, why not have a user-specific cache? Or, if cookies are used for no good reason, drop them: for example, I get rid of cookies from OpenGrok on BXR.SU and cache every single response; see https://stackoverflow.com/a/45359481/1122270. As for `proxy_no_cache $http_cookie;`, yes, I think that'll do exactly what you want, then; see http://nginx.org/r/proxy_no_cache, where the doc is very clear: *"If at least one value of the string parameters is not empty and is not equal to “0” then the response will not be saved"*. :-) – cnst Jul 16 '19 at 21:20
  • @NorbertBoros, well, you've got to know what you're caching; if it's all static resources, then you should just serve them statically, and there's no need for caching; if it's dynamic content, chances are there are cookies involved, and then you can't really cache it without introducing issues. – cnst Jul 17 '19 at 19:37

Your own example ("My Example" in the question) is what you actually need:

map $http_cookie $bypass_cache {

    "~*wordpress_(?!test_)[a-zA-Z0-9_]+"  1;
    "~*wp-postpass|wordpress_logged_in_[a-zA-Z0-9]+"  1;
    "~*comment_author_[a-zA-Z0-9_]+"  1;
    "~*[a-zA-Z0-9]+_session"  1;

    default      0;
}

Basically, what you are saying here is that $bypass_cache will be 1 if any of the regexes matches, and 0 otherwise.

So as long as you get the patterns right, it will work. And only you can build that list, since only you know which cookies the cache should be bypassed for.
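To actually take effect, the map variable has to be referenced by the caching directives. A minimal sketch, with the cache zone, its path and the backend address as hypothetical placeholders, which converts the cookie list from the question's if block into one map entry per line (wordpress_logged_in_ is already covered by the wordpress_ entry). The map itself must live in the http {} context.

```nginx
proxy_cache_path /var/cache/nginx/micro keys_zone=microcache:10m;

# Regexes are checked in the order listed and the first match wins;
# "default" applies when none of them match.
map $http_cookie $bypass_cache {
    default                                    0;
    "~*joomla_[a-zA-Z0-9_]+"                   1;
    "~*userID"                                 1;
    "~*wordpress_(?!test_)[a-zA-Z0-9_]+"       1;
    "~*wp-postpass"                            1;
    "~*comment_author_"                        1;
    "~*woocommerce_(cart_hash|items_in_cart)"  1;
    "~*wp_woocommerce_session_"                1;
    "~*sid_(customer|admin)_"                  1;
    "~*PrestaShop-"                            1;
}

server {
    listen 80;

    location / {
        proxy_pass  http://127.0.0.1:8080;     # hypothetical backend
        proxy_cache microcache;
        proxy_cache_valid 200 1s;
        proxy_no_cache     $bypass_cache;      # "1": don't store the response
        proxy_cache_bypass $bypass_cache;      # "1": don't serve from cache
    }
}
```

Adding another cookie then means adding one more line to the map, with no extra if blocks.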

Tarun Lalwani
  • But there is literally no example on the internet with this regex style [a-zA-Z0-9_]+ inside a map? Are you sure? – Mecanik Jul 13 '19 at 15:44
  • Basically, it's a regex pattern, and people use only what is needed; in most cases they want to match a value containing something, not something as specific as "it has to be a number". – Tarun Lalwani Jul 13 '19 at 15:49
  • And there are different uses as well; see, for example, this post https://serverfault.com/questions/482372/nginx-httpmapmodule-regex-variables – Tarun Lalwani Jul 13 '19 at 15:51
  • Thank you for your answer; however, I do know it's a regex pattern... my question is different. Why does nobody use the [a-zA-Z0-9_]+ style inside a map directive, and how does [[:alnum:]]+ differ? – Mecanik Jul 13 '19 at 15:56
  • They are almost the same: `:alnum:` is the POSIX way of writing the character class, https://www.regular-expressions.info/posixbrackets.html. Note that [[:alnum:]] corresponds to [a-zA-Z0-9], without the underscore. I have never used the POSIX ones in nginx; I usually prefer using `[a-zA-Z0-9_]+` only. – Tarun Lalwani Jul 13 '19 at 16:01
  • See this https://github.com/AntonRiab/slim_middle_samples/blob/2b20e4b19a8e9d2cf209fe4d3f32faff183162ee6/regexp_test/return.filters.nginx.conf. It's not that people don't use it; it's just not used a lot. – Tarun Lalwani Jul 13 '19 at 16:05
  • Ok then, let me test my example in production and see. – Mecanik Jul 13 '19 at 16:22
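The regex questions raised above can be summarized by annotating Example 2 from the question directly. This only adds comments to that snippet; the SESS prefix and the session_id name come from the original example, and the character-class behavior described is that of PCRE, which nginx uses for map regexes.

```nginx
map $http_cookie $cache_uid {
    # Used when no entry below matches. Example 1 uses "nil" here, which is
    # just the literal string nil, not a special keyword.
    default  '';

    # A leading "~" marks the key as a case-sensitive regex ("~*" would be
    # case-insensitive). Quoting is optional here; it becomes necessary when
    # the regex contains spaces or characters such as ";", "{" or "}".
    #
    #   SESS            the literal text "SESS"
    #   [[:alnum:]]+    one or more letters or digits (POSIX class)
    #   =               a literal "="
    #   (?<session_id>  opens a named capture group "session_id"
    #   [[:graph:]]+    one or more visible characters (anything except
    #                   spaces and control characters)
    #   )               closes the group; its text becomes $session_id
    ~SESS[[:alnum:]]+=(?<session_id>[[:graph:]]+)  $session_id;
}
```

One consequence worth knowing: [[:graph:]] also matches ";", so for a header such as SESSabc=xyz; other=1 the captured value is xyz; (with the trailing semicolon), whereas Example 1's [[:alnum:]]+ would capture just xyz.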