8

Running Varnish 4 and I'm occasionally getting 503 errors which I can't put my finger in. I've tried curling the backend directly and it's returning a response everytime so it doesn't look like the issue is there. Below is entry from my varnishlog but it doesn't tell me much. I obtained this log by running the command

varnishlog -q "RespStatus >= 500 or BerespStatus >= 500"

Here is the log entry

*              << Request  >>   524356
-       524356 Begin          c req 524355 rxreq
-       524356 Timestamp      c Start: 1421453220.095464 0.000000 0.000000
-       524356 Timestamp      c Req: 1421453220.095464 0.000000 0.000000
-       524356 ReqStart       c 173.255.199.52 38213
-       524356 ReqMethod      c GET
-       524356 ReqURL         c /asd/asdasd
-       524356 ReqProtocol    c HTTP/1.0
-       524356 ReqHeader      c X-Real-IP: 206.169.79.151
-       524356 ReqHeader      c Host: www.CENSORED.com
-       524356 ReqHeader      c X-Forwarded-For: 206.169.79.151
-       524356 ReqHeader      c X-Time-Server-Received: 1421453197
-       524356 ReqHeader      c Connection: close
-       524356 ReqHeader      c Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
-       524356 ReqHeader      c User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.99 Safari/537.36
-       524356 ReqHeader      c Accept-Encoding: gzip, deflate, sdch
-       524356 ReqHeader      c Accept-Language: en-US,en;q=0.8
-       524356 ReqHeader      c Cookie: wp-settings-1=deleted%3Dundefined%26ed_size%3D1305%26hidetb%3D1%26libraryContent%3Dbrowse%26editor%3Dtinymce%26imgsize%3Dlarge%26wplink%3D1%26urlbutton%3Dfile%26pos
ts_list_mode%3Dlist; wp-settings-time-1=1421309968; wp-saving-post=5-saved; wordpr
-       524356 ReqUnset       c X-Forwarded-For: 206.169.79.151
-       524356 ReqHeader      c X-Forwarded-For: 206.169.79.151, 173.255.199.52
-       524356 VCL_call       c RECV
-       524356 ReqUnset       c Cookie: wp-settings-1=deleted%3Dundefined%26ed_size%3D1305%26hidetb%3D1%26libraryContent%3Dbrowse%26editor%3Dtinymce%26imgsize%3Dlarge%26wplink%3D1%26urlbutton%3Dfile%26pos
ts_list_mode%3Dlist; wp-settings-time-1=1421309968; wp-saving-post=5-saved; wordpr
-       524356 ReqHeader      c Cookie: wp-settings-1=deleted%3Dundefined%26ed_size%3D1305%26hidetb%3D1%26libraryContent%3Dbrowse%26editor%3Dtinymce%26imgsize%3Dlarge%26wplink%3D1%26urlbutton%3Dfile%26pos
ts_list_mode%3Dlist; wp-settings-time-1=1421309968; wp-saving-post=5-saved; wordpr
-       524356 ReqUnset       c Cookie: wp-settings-1=deleted%3Dundefined%26ed_size%3D1305%26hidetb%3D1%26libraryContent%3Dbrowse%26editor%3Dtinymce%26imgsize%3Dlarge%26wplink%3D1%26urlbutton%3Dfile%26pos
ts_list_mode%3Dlist; wp-settings-time-1=1421309968; wp-saving-post=5-saved; wordpr
-       524356 ReqHeader      c Cookie: wp-settings-1=deleted%3Dundefined%26ed_size%3D1305%26hidetb%3D1%26libraryContent%3Dbrowse%26editor%3Dtinymce%26imgsize%3Dlarge%26wplink%3D1%26urlbutton%3Dfile%26pos
ts_list_mode%3Dlist; wp-settings-time-1=1421309968; wp-saving-post=5-saved; wordpr
-       524356 VCL_return     c pass
-       524356 VCL_call       c HASH
-       524356 VCL_return     c lookup
-       524356 VCL_call       c PASS
-       524356 VCL_return     c fetch
-       524356 Link           c bereq 524357 pass
-       524356 Timestamp      c Fetch: 1421453222.583956 2.488493 2.488493
-       524356 Timestamp      c Process: 1421453222.583983 2.488520 0.000027
-       524356 RespHeader     c Date: Sat, 17 Jan 2015 00:07:02 GMT
-       524356 RespHeader     c Server: Varnish
-       524356 RespHeader     c X-Varnish: 524356
-       524356 RespProtocol   c HTTP/1.1
-       524356 RespStatus     c 503
-       524356 RespReason     c Service Unavailable
-       524356 RespReason     c Service Unavailable
-       524356 VCL_call       c SYNTH
-       524356 RespHeader     c Content-Type: text/html; charset=utf-8
-       524356 RespHeader     c Retry-After: 5
-       524356 VCL_return     c deliver
-       524356 RespHeader     c Content-Length: 415
-       524356 Debug          c "RES_MODE 2"
-       524356 RespHeader     c Connection: close
-       524356 Timestamp      c Resp: 1421453222.584031 2.488567 0.000048
-       524356 ReqAcct        c 983 0 983 205 415 620
-       524356 End            c

My default.vcl

vcl 4.0;

import std;

backend default {
    .host = "localhost";
    .port = "81";
    .connect_timeout = 600s;
    .first_byte_timeout = 600s;
    .between_bytes_timeout = 600s;
}

acl purge {
    # Only allow purges coming from localhost
    "127.0.0.1";
    "localhost";
}

EDIT: varnishstat while throwing 503 errors. You can see fetch_failed has happened numerous times.

MAIN.uptime                       787         1.00         1.00         1.00         1.00
MAIN.sess_conn                  11076        10.99        14.00         8.30         8.17
MAIN.client_req                 11089        11.98        14.00         8.40         8.23
MAIN.cache_hit                   1488         1.00         1.00         1.32         1.41
MAIN.cache_hitpass                  2         0.00          .           0.00         0.00
MAIN.cache_miss                  3186         1.00         4.00         1.27         1.29
MAIN.backend_conn                9585         9.99        12.00         7.01         6.76
MAIN.backend_reuse                 17         1.00          .           0.16         0.12
MAIN.backend_recycle               17         1.00          .           0.16         0.12
MAIN.fetch_head                     2         0.00          .           0.06         0.06
MAIN.fetch_length                 346         2.00          .           0.39         0.35
MAIN.fetch_chunked               4497         3.99         5.00         2.27         2.06
MAIN.fetch_close                 4756         4.99         6.00         4.36         4.35
MAIN.fetch_failed                 180         0.00          .           0.10         0.12
MAIN.pools                          4         0.00          .           4.00         4.00
MAIN.threads                       54         0.00          .          54.00        54.00
MAIN.threads_limited                1         0.00          .           0.00         0.00
MAIN.threads_created              102         0.00          .           0.00         0.00
MAIN.threads_destroyed             48         0.00          .           0.00         0.00
MAIN.busy_sleep                     5         0.00          .           0.00         0.00
MAIN.busy_wakeup                    5         0.00          .           0.00         0.00
MAIN.sess_queued                   50         0.00          .           0.00         0.00
MAIN.n_object                    2873         1.00          .        2866.32      2865.00
MAIN.n_objectcore                2891         1.00          .        2885.40      2884.22
MAIN.n_objecthead                2990         1.00          .        2981.80      2980.00
MAIN.n_backend                      1         0.00          .           1.00         1.00
MAIN.n_expired                    135         0.00          .         133.18       132.78
MAIN.s_sess                     11076        10.99        14.00         8.30         8.17
MAIN.s_req                      11089        11.98        14.00         8.40         8.23
MAIN.s_pass                      6415         9.99         8.00         5.81         5.52
MAIN.s_fetch                     9601        10.99        12.00         7.09         6.82
MAIN.s_synth                      112         0.00          .           0.00         0.00
MAIN.s_req_hdrbytes           3346583      3583.50      4252.00      2308.12      2251.15
MAIN.s_req_bodybytes           804869      1072.65      1022.00       202.52       162.86
MAIN.s_resp_hdrbytes          2737544      3615.46      3478.00      1942.75      1857.93
MAIN.s_resp_bodybytes        66853226     14104.28     84946.00     37816.96     38189.12
MAIN.sess_closed                 9442         6.99        11.00         6.96         7.05
MAIN.backend_req                 9594        10.99        12.00         7.17         6.87

Daemon options

DAEMON_OPTS="-a :80 \
              -f /etc/varnish/default.vcl \
              -T 127.0.0.1:6082 \
              -u varnish -g varnish \
              -t 120 \
              -n $INSTANCE \
              -s malloc,10G \
                  -p thread_pools=4 \
                  -p thread_pool_min=5 \
                  -p thread_pool_max=500 \
                  -p thread_pool_timeout=300 \
              -S /etc/varnish/secret"

Any suggestions?

Axsuul
  • 101
  • 1
  • 3
  • 12
  • How does your VCL look? In which cases is varnish configured to return a 503? – Bazze Jan 17 '15 at 08:55
  • @Bazze It isn't configured to return a 503. Just edited my post with my VCL. Any suggestions? Also when the 503 appears, its hows instantly. So I don't think there's anything wrong with the backend, right? – Axsuul Jan 20 '15 at 07:36
  • Could you do a `varnishstat` when you're getting these 503s and show us the output? – Bazze Jan 23 '15 at 14:51
  • @Bazze Edited my answer with `varnishstat`, thanks, how does it look? – Axsuul Jan 25 '15 at 06:44
  • Your `threads_limited` value is `1`, which should really be `0`. It means that 1 time Varnish wanted to create a new thread but wasn't able to, either because of max threads or the thread_pool_add_delay. Could you show us the Varnish config specifying max/min threads etc? And also, was this really the complete output from varnishstat? (I was expecting some more values, like the ones in this one: http://serverfault.com/questions/624976/why-is-varnish-4-getting-slow-after-some-time) – Bazze Jan 25 '15 at 08:03
  • what is in entry 524357 as indicated by `bereq 524357 pass`? – Matthew Ife Jan 25 '15 at 13:50
  • @Bazze Sorry, here is the full varnishstat: https://gist.github.com/axsuul/3b02b8faa4e91ba5d1a1 just pasted the Varnish startup config in my question – Axsuul Jan 25 '15 at 18:34
  • @MatthewIfe how would I view entries? – Axsuul Jan 25 '15 at 18:35
  • Any updates fellas? This problem has plagued us for so long, willing to hire a consultant on this if any of you guys are available for that! – Axsuul Jan 29 '15 at 22:50
  • Check whether your Apache is listening on port 81 or not? If yes, then check how many request can it process now. I will suggest to set NginX as Apache proxy between Apache & Varnish. It will lessen the serving pressure. Here is a way what to do: [Set up NginX and Varnish with Apache](http://ugcoder.com/set-up-nginx-and-varnish-with-apache-in-zpanel/) – Md. Maksudur Rahman Khan Feb 04 '15 at 07:00

3 Answers3

3

How many requests/s and total connections does your varnish handle when the 503 errors occur?

I've run into a similar situation where it would return 503 errors to the users while the backends were running just fine.

varnishlog would show that the backends were sick when in fact were healthy.

So after some digging I found out that the problem was with the max file descriptors.

I had to increase my system's max file descriptors in order for varnish to be able to handle all the requests properly.

After that all 503 errors stopped and all backends constantly show as healthy.

Cha0s
  • 2,462
  • 2
  • 16
  • 26
  • My backend shows as healthy using `varnishadm` however (no probe). Do you think this still applies to me? – Axsuul Jan 25 '15 at 06:49
  • Yes. My backends were all healthy and yet due to file descriptor limit of linux varnish would show them as 'sick' causing 503 errors all over the place. – Cha0s Jan 25 '15 at 12:45
0

Please increase all of kernel limit: max file open, max process file and. If you are using linux base radhat like centos, edit 2 file: /etc/security/limit.conf

tquang
  • 263
  • 1
  • 6
0

you wont see that in varnish log, varnish 503 means that no backend, so you need to monitor /var/log/messages and Apache, Nginx, PHP error logs at that time. backend hits the error during the request, and varnish can not process the request further. so varnish doesnt know whats going on there, and it dont have to, it is not his business.

btw, try to start with Varnish recommended settings:

-p thread_pool_min=200 \
-p thread_pool_max=4000 \
-p thread_pool_add_delay=2 \
-p http_req_size=64000 \
-p cli_timeout=25 \
-p cli_buffer=36384 \
-p session_linger=100 \
-s malloc,(YOURMEMORY-20%)G

and this line is a must in sysctl.conf:

fs.file-max = 700000

and these too:

echo "*         soft    nofile          200000" >> /etc/security/limits.conf
echo "*         hard    nofile          700000" >> /etc/security/limits.conf
ADM
  • 1,373
  • 12
  • 16