We are trying to use Varnish as a proxy/cache for our media server. Our streams are MPEG-TS (H.264/H.265) over HTTP. There are 1000 live streams on this media server, and each stream gets multiple connections. We configured Varnish as shown below, but we have these problems:

  1. Streams close after a short period of time
  2. Sometimes we can't connect to streams at all; the client is stuck at connecting...
  3. We get these errors in varnishlog:
-   FetchError     Could not get storage
-   FetchError     Could not get storage
-   FetchError     Could not get storage
-   FetchError     Could not get storage
-   FetchError     Resource temporarily unavailable
-   FetchError     eof socket fail
-   FetchError     Resource temporarily unavailable
-   FetchError     eof socket fail
-   FetchError     Could not get storage
-   FetchError     Could not get storage
-   FetchError     Could not get storage
-   FetchError     Could not get storage
-   FetchError     Could not get storage
-   FetchError     Could not get storage
-   FetchError     Could not get storage
-   FetchError     Could not get storage
-   FetchError     Could not get storage
-   FetchError     Resource temporarily unavailable
-   FetchError     eof socket fail
-   FetchError     Resource temporarily unavailable
-   FetchError     eof socket fail
-   FetchError     Resource temporarily unavailable
-   FetchError     eof socket fail
-   FetchError     Could not get storage
-   FetchError     Could not get storage

My config:

vcl 4.0;

import directors;


backend s6855 {
    .host = "127.0.0.1";
    .port = "6855";
    .first_byte_timeout     = 10s;   # How long to wait before we receive a first byte from our backend?
    .connect_timeout        = 5s;     # How long to wait for a backend connection?
    .between_bytes_timeout  = 30s;     # How long to wait between bytes received from our backend?
}

backend s6866 {
    .host = "127.0.0.1";
    .port = "6866";
    .first_byte_timeout     = 10s;   # How long to wait before we receive a first byte from our backend?
    .connect_timeout        = 5s;     # How long to wait for a backend connection?
    .between_bytes_timeout  = 30s;     # How long to wait between bytes received from our backend?
}

backend s6877 {
    .host = "127.0.0.1";
    .port = "6877";
    .first_byte_timeout     = 10s;   # How long to wait before we receive a first byte from our backend?
    .connect_timeout        = 5s;     # How long to wait for a backend connection?
    .between_bytes_timeout  = 30s;     # How long to wait between bytes received from our backend?
}

backend s6888 {
    .host = "127.0.0.1";
    .port = "6888";
    .first_byte_timeout     = 10s;   # How long to wait before we receive a first byte from our backend?
    .connect_timeout        = 5s;     # How long to wait for a backend connection?
    .between_bytes_timeout  = 30s;     # How long to wait between bytes received from our backend?
}

backend s6899 {
    .host = "127.0.0.1";
    .port = "6899";
    .first_byte_timeout     = 10s;   # How long to wait before we receive a first byte from our backend?
    .connect_timeout        = 5s;     # How long to wait for a backend connection?
    .between_bytes_timeout  = 30s;     # How long to wait between bytes received from our backend?
}


sub vcl_init {
    new fb = directors.round_robin();
    fb.add_backend(s6855);
    fb.add_backend(s6866);
    fb.add_backend(s6877);
    fb.add_backend(s6888);
    fb.add_backend(s6899);

}


sub vcl_recv {

    set req.grace = 120s;

    set req.backend_hint = fb.backend();

    if (req.url ~ "(\.ts)" ) {
        unset req.http.Range;
    }
    if (req.http.cookie) {
        unset req.http.cookie;
    }

    if (req.method != "GET" && req.method != "HEAD") {
        return (pipe);
    }

    if (req.method == "GET" && req.url ~ "(\.ts)"  ) {
        unset req.http.Accept-Encoding;
        return(hash);
    }
    return(hash);
}

sub vcl_hash {
    hash_data(req.url);
    return(lookup);
}

sub vcl_backend_response {
    set beresp.grace = 2m; 
    set beresp.ttl = 120s;
    set beresp.do_gunzip = false;
    set beresp.do_gzip = false;

    if (bereq.url ~ "(\.ts)") {
        set beresp.ttl = 60s;
        set beresp.http.X-Cacheable = "YES";
    } else {
        set beresp.ttl = 10m;
        set beresp.http.X-Cacheable = "NO";
    }

    if ( beresp.status == 404 ) {
        set beresp.ttl = 5m;
    }
 
    return(deliver);
}


sub vcl_hit {
    if (obj.ttl == 0s) {
        return(pass);
    }

    return(deliver);
}

sub vcl_miss {
}

sub vcl_deliver {
    set resp.http.X-Served-By = "For Test";

    if (obj.hits > 0) {
        set resp.http.X-Cache = "HIT";
        set resp.http.X-Cache-Hits = obj.hits;
    } else {
        set resp.http.X-Cache = "MISS";
    }

    if (resp.http.magicmarker) {
        unset resp.http.magicmarker;
        set resp.http.Age = "0";
    }

    unset resp.http.Via;
    unset resp.http.X-Varnish;

}

[Image: Varnish Usage]

Since I'm pretty new to Varnish, I'm not sure how to debug the problem. Your help will be appreciated.

Thanks

Talion

1 Answer

The problem you're experiencing is not just a lack of object storage, but the fact that your biggest HTTP response is larger than the total size of the object storage.

This means Varnish cannot LRU evict the required space to fit the object in cache.

Could not get storage is an error that is typically returned when this happens.

Check the sizes

It is important to figure out how big your cache is, and what the size of the object is that fails on you.

Your varnishd runtime settings will tell you how big your object storage is. The -s malloc,<size> contains this value.
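As a rough sketch of how to read that value from a running instance: the helper below, `storage_args`, is a hypothetical name and not part of Varnish. It extracts every `-s` argument from a varnishd command line. The suggested `ps -o args= -C varnishd` invocation assumes a Linux host with procps.

```shell
# storage_args (hypothetical helper): read varnishd command lines on stdin
# and print each distinct "-s" storage specification, e.g. "malloc,256m".
storage_args() {
    grep -oE -- '-s +[^ ]+' | sed -E 's/^-s +//' | sort -u
}

# Assumed usage on a Linux host (procps `ps`):
#   ps -o args= -C varnishd | storage_args
```

If nothing is printed, no `-s` option was passed and varnishd is running with its built-in default storage.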

You can also use varnishstat to check the size & usage of your memory cache and the transient storage:

varnishstat -f 'SMA.*.g*' -f MAIN.n_lru_nuked

The MAIN.n_lru_nuked counter, which is also included in this command, indicates how many objects Varnish has forcefully removed from the cache to free up space for new objects.
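To turn those counters into a quick "how full is the cache" figure, here is a minimal sketch. It assumes the one-shot output of `varnishstat -1` (name in the first column, value in the second) and that the malloc storage exposes `g_bytes` (bytes in use) and `g_space` (bytes free). The function name `storage_usage` is made up for illustration.

```shell
# storage_usage (hypothetical helper): read `varnishstat -1` style lines on
# stdin and print the fill percentage of each SMA storage, computed as
# g_bytes / (g_bytes + g_space), followed by the n_lru_nuked counter.
storage_usage() {
    awk '
        $1 ~ /^SMA\..*\.g_bytes$/ { name = $1; sub(/\.g_bytes$/, "", name); used[name] = $2 }
        $1 ~ /^SMA\..*\.g_space$/ { name = $1; sub(/\.g_space$/, "", name); free[name] = $2 }
        $1 == "MAIN.n_lru_nuked"  { nuked = $2 }
        END {
            for (n in used) {
                total = used[n] + free[n]
                # Skip storages that report no capacity (e.g. unlimited Transient).
                if (total > 0) printf "%s %.1f%% full\n", n, 100 * used[n] / total
            }
            printf "n_lru_nuked %d\n", nuked
        }
    '
}

# Assumed usage on the Varnish host:
#   varnishstat -1 -f 'SMA.*.g_*' -f MAIN.n_lru_nuked | storage_usage
```

A storage that stays close to 100% full while n_lru_nuked keeps climbing would be consistent with the cache being too small for the objects it is asked to hold.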

Fixing the issue

The easiest way to fix the issue is to assign more memory to Varnish via -s malloc,<size>. Don't forget to restart Varnish after you have changed these settings.

After that, the following command will help you figure out if there's enough storage, and if Varnish still needs to forcefully remove objects from cache to free up space:

varnishstat -f 'SMA.*.g*' -f MAIN.n_lru_nuked

A more sustainable plan

Another plan is to rely on the Massive Storage Engine (MSE). This is a storage engine that is part of Varnish Enterprise.

It combines memory and disk storage, and is optimized to handle large volumes of data. It avoids fragmentation, and is architected to not suffer from the typical latency of disk access.

There are official machine images for AWS, Azure & Google Cloud that allow you to experiment with this storage engine, without having to buy a license upfront.

A killer MSE feature is the memory governor. This is a mechanism that dynamically sizes the memory storage of your caches based on the needs of requests & responses.

If you run short of memory, and there isn't a lot of memory needed for thread handling, the memory governor will automatically assign more memory to the storage engine.

If you use the persistence layer of MSE, you can host terabytes of data on a single machine, without running into these issues.

At Varnish Software, the company that builds Varnish Enterprise, we see MSE as the primary feature that OTT video streaming companies use to accelerate video delivery.

What if my assessment is completely wrong

Although the Could not get storage error usually appears when Varnish is trying to store huge objects in cache when the size of the cache is too small, I could also be wrong.

In that case, I would advise you to run varnishlog and see the full trace of what's going on in that specific transaction:

varnishlog -g request -q "ReqUrl eq '/my-url'"

This example gets all the details of requests for /my-url. Please change this to the URL you're trying to monitor.

The output will usually give you a better understanding of how Varnish is behaving. This can help us figure out how to fix the issue, if my initial assessment was wrong.
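When the trace is long, it can help to first see which FetchError messages dominate. The sketch below counts each distinct message in varnishlog output. `fetch_error_summary` is a hypothetical helper name, and the suggested `varnishlog -d -g request -q 'FetchError'` invocation is an assumption about how you would feed it.

```shell
# fetch_error_summary (hypothetical helper): read varnishlog output on stdin
# and print how often each distinct FetchError message occurs, most frequent
# first. The tag is the second whitespace-separated field on each line.
fetch_error_summary() {
    awk '
        $2 == "FetchError" {
            $1 = ""; $2 = ""        # drop the prefix and the tag
            sub(/^ +/, "")          # trim the leading blanks left behind
            count[$0]++
        }
        END { for (m in count) printf "%d %s\n", count[m], m }
    ' | sort -rn
}

# Assumed usage on the Varnish host (-d reads the existing log and exits):
#   varnishlog -d -g request -q 'FetchError' | fetch_error_summary
```

For the log excerpt in the question, this would put Could not get storage at the top, ahead of the socket-related errors.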

Thijs Feryn
  • Thank you very much for your answer, Thijs. I'm especially suspicious of my timing settings, the TTLs and graces. I don't understand why my objects don't expire from the cache and get renewed in a timely manner. Since the file in this case is an MPEG-TS stream, a non-stop data transfer that grows/flows indefinitely, at some point Varnish should parse a part and serve it, and before its TTL expires there should be fresh data to serve, I think. MSE would actually be the cure for this project, but unfortunately we don't have the option to use it at the moment. My systemd file: https://pastebin.com/tT3X50Cz . 32GB RAM / 32C CPU – Talion Nov 07 '20 at 00:35
  • @Talion I'd like to see the value of the `n_lru_limited` counter in `varnishstat` when the error occurs. The errors point in the direction of Varnish failing to free up storage. – Thijs Feryn Nov 10 '20 at 08:51