
I am trying to quickly learn some of the underlying technology behind HDS and HLS live streaming.

I have set up Wowza Media Server 3.5 on an Amazon Web Services EC2 instance and distribute through CloudFront. I did my first live event and watched my server load creep higher and higher. I was wondering if anyone could help me understand some of the underpinnings of HDS/HLS live streaming (and nDVR...):

  • Wowza is the origin server for my CloudFront distribution
  • The HDS manifest file is set to cache for 2 seconds in CloudFront (TTL)
  • I checked all the IPs hitting my Wowza instance; they did in fact appear to be CloudFront caching locations, or edge servers
  • (assumption) Since all HDS traffic is over port 80, all video content ends up being cached at the edge locations (albeit for 2 seconds)

Here's where my question comes in: how is data served for video content? Here's my understanding, please set me straight: when a viewer requests a playlist or manifest file, they get XML back which points the player to the chunk of video/audio data (m4fa and m4fv within the DVR application instance?) that it needs to play next. Since this data is also delivered over port 80, it is cached as well.
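
To make that flow concrete, here is a minimal sketch of a client polling a live HLS playlist and fetching any segments it hasn't seen yet (for HLS the playlist is a plain-text .m3u8; the XML manifest is the HDS .f4m). The URL is a placeholder, not a real Wowza/CloudFront endpoint, and real players do this for you; the point is just that the playlist is re-requested every couple of seconds while the chunk files it names never change.

```python
# Minimal sketch of a live HLS client, assuming a placeholder playlist URL.
import time
import urllib.parse
import urllib.request

PLAYLIST_URL = "http://example.cloudfront.net/live/mystream/playlist.m3u8"  # hypothetical

seen = set()  # segment URIs already downloaded

def poll_once():
    """Re-fetch the playlist, then fetch any segment we haven't seen yet."""
    playlist = urllib.request.urlopen(PLAYLIST_URL).read().decode("utf-8")
    for line in playlist.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and line not in seen:
            seen.add(line)
            segment_url = urllib.parse.urljoin(PLAYLIST_URL, line)
            data = urllib.request.urlopen(segment_url).read()
            print(f"fetched {line}: {len(data)} bytes")

while True:
    poll_once()
    time.sleep(2)  # roughly the playlist refresh interval / CloudFront TTL
```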

If the above statements are correct, does the following make sense as an optimization for HDS and HLS:

Case 1: DVR service: I set the rules for caching in CloudFront as follows:

  • anything ending with "f4m", "m3u8", or "?DVR" caches for 2 seconds (the playlist/manifest files)
  • the default for everything else caches for longer (perhaps an hour, or 24 hours...?). This way the DVR data stays cached but the playlist keeps updating every 2 seconds (see the sketch after this list)
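
Here is a rough sketch of what that Case 1 split looks like if the origin drives the TTLs via Cache-Control headers (which CloudFront can honor); the helper, the extension checks, and the TTL values are illustrative assumptions, not Wowza or CloudFront configuration syntax.

```python
# Sketch of the Case 1 split: short TTL for playlists/manifests, long TTL for media.
SHORT_TTL = 2          # seconds: playlists/manifests must refresh quickly
LONG_TTL = 60 * 60     # seconds: media chunks never change once written

def cache_control_for(path, query=""):
    """Pick a Cache-Control value based on the requested resource (illustrative rule)."""
    if path.endswith((".f4m", ".m3u8")) or "DVR" in query:
        return f"max-age={SHORT_TTL}"
    return f"max-age={LONG_TTL}"

print(cache_control_for("/live/mystream/manifest.f4m"))        # max-age=2
print(cache_control_for("/live/mystream/media_w123_1234.ts"))  # max-age=3600
```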

Case 2: No DVR service (is this the better way to optimize?)

  • I suspect we could also reduce server load by killing the DVR service altogether. That way all data distributed through CloudFront is just the most recent audio/video chunks, so all viewers should be requesting the same playlist and data files, and a large number of people requesting those files is no big deal since my server is only hit every 2 seconds by each edge location to refresh the manifest files (a rough request-rate estimate is sketched after this list).
  • If the media file chunks are given a longer caching time, would it even matter whether we kill the DVR service, since the media stays cached?
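
As a rough back-of-the-envelope check of that reasoning (the answer below qualifies it, since each edge location actually contains many cache boxes): with no DVR, origin load is driven by manifest refreshes and new chunks, not by viewer count. The counts below are assumptions for illustration only.

```python
EDGE_LOCATIONS = 30     # assumed number of CloudFront edges serving viewers
MANIFEST_TTL = 2        # seconds the playlist/manifest is cached
SEGMENT_DURATION = 10   # assumed seconds of media per chunk

manifest_req_per_sec = EDGE_LOCATIONS / MANIFEST_TTL       # playlist refreshes
segment_req_per_sec = EDGE_LOCATIONS / SEGMENT_DURATION    # one new chunk per edge

print(f"origin manifest requests/sec: {manifest_req_per_sec:.1f}")   # 15.0
print(f"origin segment requests/sec:  {segment_req_per_sec:.1f}")    # 3.0
# Neither number grows with the number of viewers -- that is the point of Case 2.
```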

Thanks for any insight you can offer!

dcoffey3296
  • I think you understand it correctly. I've just got one question -- is it possible that the caching is getting messed up by having a session ID in the URLs? – vipw Dec 13 '12 at 09:54
  • Yes, vipw, that is the behavior I am seeing. I'm currently looking in the documentation for a way to disable these unique session IDs if you have any ideas! Thanks for the suggestion! – dcoffey3296 Dec 13 '12 at 14:38
  • I think you need to take a push-style approach. You can have a script initialize a session from Wowza and push what it reads to the S3 bucket that CloudFront is reading from. But I haven't used Wowza for a couple of years; they might have a better solution. Have you asked on their forum? They're extremely helpful in my experience. – vipw Dec 13 '12 at 14:52

1 Answer


Is there a way to configure a different hostname for the media? For DVR sessions, you want the m3u8 files to be served directly from your server (no CloudFront, or a 2-second CloudFront TTL), but the media files to be served through CloudFront with a really long expiry time.

(For non-DVR sessions, it could all go through CloudFront, since it's cacheable.)
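
One way to get that split without separate origins is to serve the playlist yourself but rewrite the segment URIs so they point at a long-TTL CloudFront hostname. A minimal sketch, assuming a placeholder media hostname (this is not a Wowza feature, just string rewriting):

```python
import urllib.parse

MEDIA_CDN = "http://media.example.cloudfront.net"  # hypothetical long-expiry distribution

def rewrite_playlist(m3u8_text: str) -> str:
    """Point every segment URI at the media CDN; leave #-tags untouched."""
    out = []
    for line in m3u8_text.splitlines():
        if line and not line.startswith("#"):
            line = urllib.parse.urljoin(MEDIA_CDN + "/", line.lstrip("/"))
        out.append(line)
    return "\n".join(out)

example = "#EXTM3U\n#EXTINF:10,\nmedia_1.ts\n#EXTINF:10,\nmedia_2.ts"
print(rewrite_playlist(example))
```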

The usefulness of CloudFront really depends on how many popular (and unpopular) streams you have.

For example, let's say there are 20 CloudFront boxes in a particular POP. If 5 people view a stream, each one is likely to hit a different CF box, get a cache miss and need to hit your server anyway. You'd have to have 50 or 70 people viewing a stream from that POP before CloudFront stops hitting your server and serves everything from the cache. Because there are many POPs, you could have 100 people viewing a stream all over the world, but each hitting a different box in a different POP, and your server still gets 100 requests.
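
To put numbers on that: assuming viewers land on the cache boxes in a POP uniformly at random, the expected number of boxes that take a miss (and therefore hit the origin) for a given object is n·(1 − (1 − 1/n)^v). The 20-box figure is the same assumption as above.

```python
BOXES_PER_POP = 20  # assumed cache boxes in one POP

def expected_origin_hits(viewers: int, boxes: int = BOXES_PER_POP) -> float:
    """Expected number of distinct cache boxes hit, i.e. origin requests per object."""
    return boxes * (1 - (1 - 1 / boxes) ** viewers)

for viewers in (5, 20, 50, 100):
    print(f"{viewers:3d} viewers -> ~{expected_origin_hits(viewers):.1f} origin requests")
# 5 viewers   -> ~4.5  (almost every viewer is a miss)
# 100 viewers -> ~19.9 (the POP is finally warm, so origin load flattens out)
```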

BraveNewCurrency
  • Thanks for the answer. Are you sure that this is the case and that we will see benefits of CF after 100+ users? Is there no way to make sure users just hit the same CF box? Because if not it seems hard to verify if it's working. – sbaar Mar 29 '16 at 23:47
  • I'm saying if you have a low number of users, CF will not lighten the load on your upstream server. (It may still make sense to use CF because of network reasons.) Note that AWS gives you cache hit stats so you can verify it's working. – BraveNewCurrency Apr 06 '16 at 18:47