I am trying to capture an HLS download to my hard drive and then serve the same HLS download from my hard drive.

I'm trying to do this with a Python add-on that looks as follows. In essence, I write the response to a file if it matches my criteria. Then, if the client requests data that is already saved, I read the response from disk and serve it directly.

This add-on works as expected in that it correctly saves responses when they don't exist on disk, and then serves them when they do.

When I run mitmproxy -s myaddon.py, the flows show that they take 0 ms to serve from disk, so I expected the downloads to be much faster when served from disk. For some reason they are not.

To diagnose why this was happening, I investigated two possibilities.

1) despite intercepting the request/response and serving from disk, the request is still going over the internet and mitmproxy is waiting for the response to come back

2) reading the response from disk was taking a comparable amount of time to fetching it over the network

For 1) I tested this by using mitmproxy and the Charles proxy together. When the response was not on disk, I did see the network traffic on Charles. When the response was on disk, I did not see the traffic in Charles, therefore I ruled out this possibility.

For 2) I measured the wall-clock time taken to read the response from disk, and it was consistently ~0.0001 seconds, so I don't think that was the delay either.
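A measurement like the one for 2) can be sketched as follows. This is a minimal sketch, not the exact code that was run: the helper name time_disk_read, the repeat count, and the temporary file standing in for a cached segment are all assumptions made for illustration.

```python
import os
import tempfile
import time


def time_disk_read(filename: str, repeats: int = 5) -> list:
    """Return the wall-clock seconds taken by each full read of `filename`.

    Hypothetical helper for illustration; not part of the add-on.
    """
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        with open(filename, "rb") as f:
            f.read()
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Assumption: a 1 MiB temporary file stands in for a saved HLS segment.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(1024 * 1024))
    try:
        for seconds in time_disk_read(tmp.name):
            print(f"{seconds:.6f} s")
    finally:
        os.remove(tmp.name)
```

Repeated reads of the same file are usually served from the operating system's page cache, so the first read of a segment may be slower than later ones.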

I've tried this on two different networks, and feel it is unlikely that the LAN transport time is what's causing me to not see my expected speedup.

How can I further diagnose or do this differently?

from mitmproxy import http
from mitmproxy import ctx
import time
import os

class MyAddOn:
    def __init__(self):
        self.flows = 0
        self.response_time = 0
        self.request_time = 0

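    def _filenameFor(self, flow: http.HTTPFlow):
        # Map the request path (minus the leading "/") to a file under the current directory.
        path = flow.request.path[1:]
        cwd = os.getcwd()
        filename = os.path.join(cwd, path)

        # Byte-range requests get one file per range, stored under the path as a directory.
        if "range" in flow.request.headers:
            byte_range = flow.request.headers["range"]  # renamed to avoid shadowing the built-in range
            filename = os.path.join(filename, byte_range)

        return filename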


    def request(self, flow: http.HTTPFlow) -> None:
        start = time.time()

        if "myhlsserver.com" not in flow.request.host:
            return
        else:
            filename = self._filenameFor(flow)
            if os.path.isfile(filename):
                # Already on disk: answer from the file so the request is not sent upstream.
                with open(filename, 'rb') as f:
                    content = f.read()
                    flow.response = http.HTTPResponse.make(200, content)

            end = time.time()
            time_taken = end - start
            # Accumulate under request_time; response_time is tracked in response().
            self.request_time += time_taken
            self.flows += 1

            ctx.log.info(f"Requests: {self.flows}")


    def response(self, flow: http.HTTPFlow) -> None:
        start = time.time()

        if "myhlsserver.com" not in flow.request.host:
            return
        else:
            filename = self._filenameFor(flow)
            if os.path.isfile(filename):
                return
            else:
                directory = os.path.dirname(filename)
                os.makedirs(directory, exist_ok=True)

                with open(filename, "xb") as file:
                    file.write(flow.response.content)

            end = time.time()
            time_taken = end - start
            self.response_time += time_taken

addons = [ MyAddOn() ]
Idr