
My code is as follows:

https://github.com/T145/tphroxy/blob/master/mirror.py

https://github.com/T145/tphroxy/blob/master/transform_content.py

When visiting certain websites, I get errors along these lines:

```
Traceback (most recent call last):
  File " ... /mirror.py", line 108, in fetch_and_store
    response = urlfetch.fetch(mirrored_url)
  File " ... /google/appengine/api/urlfetch.py", line 293, in fetch
    return rpc.get_result()
  File " ... /google/appengine/api/apiproxy_stub_map.py", line 613, in get_result
    return self.__get_result_hook(self)
  File " ... /python27_lib/versions/1/google/appengine/api/urlfetch.py", line 449, in _get_fetch_result
    raise DNSLookupFailedError('DNS lookup failed for URL: ' + url)
DNSLookupFailedError: DNS lookup failed for URL: http://public/images/v6/btn_arrow_down_padded_white.png
```

My guess is that some asset URL patterns aren't being matched and sent through the proxy properly, i.e. `transform_content` is missing a pattern. Any help solving this problem is greatly appreciated! I'm open to using alternative libraries if needed.


EDIT

I've added a test suite for `transform_content`, and from its results I'm certain the primary problems are with my regular expressions. If you're on Windows, run it with `py transform_content_test.py` to see the results.
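
Since the test results point at the regular expressions, one case worth covering in such a suite is the difference between a root-relative reference (`/public/...`) and a protocol-relative one (`//host/...`). The sketch below uses a hypothetical pattern, `ROOT_RELATIVE_ATTR_RE`, which is not the one in `transform_content.py` and only handles quoted `src`/`href` attributes:

```python
import re
import unittest

# Hypothetical pattern (not the one in transform_content.py): matches a quoted
# src/href attribute whose value is root-relative, i.e. starts with a single
# "/" that is NOT followed by a second "/" (that would be protocol-relative,
# like "//cdn.example.net/a.png").
ROOT_RELATIVE_ATTR_RE = re.compile(
    r"""\b(?P<attr>src|href)\s*=\s*(?P<quote>["'])/(?!/)(?P<path>[^"']*)(?P=quote)""",
    re.IGNORECASE,
)


class RootRelativeAttrTest(unittest.TestCase):
    def test_matches_root_relative_path(self):
        m = ROOT_RELATIVE_ATTR_RE.search('<img src="/public/images/v6/btn.png">')
        self.assertIsNotNone(m)
        self.assertEqual(m.group("path"), "public/images/v6/btn.png")

    def test_ignores_protocol_relative_url(self):
        self.assertIsNone(ROOT_RELATIVE_ATTR_RE.search('<img src="//cdn.example.net/a.png">'))

    def test_ignores_absolute_url(self):
        self.assertIsNone(ROOT_RELATIVE_ATTR_RE.search('<a href="http://example.com/">'))

    def test_single_quotes(self):
        m = ROOT_RELATIVE_ATTR_RE.search("<link href='/style.css'>")
        self.assertEqual(m.group("path"), "style.css")
```

Saved under a hypothetical name such as `root_relative_test.py`, it runs with `python -m unittest root_relative_test`.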

T145
  • Just from some surface testing over at https://regex101.com/, `BASE_RELATIVE_URL_REGEX` fails when testing `slashdot.org`. – T145 Feb 07 '18 at 15:53

1 Answer


`DNS lookup failed for URL: http://public/...`: note the missing domain (host) portion of the URL. The `public` string is parsed as the domain, which is invalid, and that causes the error you see.
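
To see how such a URL is interpreted, the small sketch below splits it with the standard library's `urlsplit`. The `buggy` line is a hypothetical way a URL like this could be produced (a root-relative path with the mirrored host never inserted); it is not the code in `mirror.py`:

```python
try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit  # Python 2.7, the runtime in the traceback

# Root-relative asset path as it might appear in a mirrored page.
asset_path = "/public/images/v6/btn_arrow_down_padded_white.png"

# Hypothetical buggy rewrite (not the code in mirror.py): the mirrored host
# is never inserted, so the first path segment lands where the host goes.
buggy = "http://" + asset_path.lstrip("/")

parts = urlsplit(buggy)
print(buggy)         # http://public/images/v6/btn_arrow_down_padded_white.png
print(parts.netloc)  # public  <- treated as the host, so the DNS lookup fails
print(parts.path)    # /images/v6/btn_arrow_down_padded_white.png
```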

The URL should look like `http://<valid_domain>/public/...`, so check the code that builds it.
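
Rather than assembling host and path by hand, the standard library's `urljoin` can resolve any reference found in the page against the URL of the page being mirrored. This is an alternative technique, not how `transform_content.py` currently works; a minimal sketch, assuming a hypothetical page at `http://www.example.com/some/page.html`:

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2.7

# Hypothetical page being mirrored; in the app this would be derived from
# the incoming request rather than hard-coded.
page_url = "http://www.example.com/some/page.html"

# Different kinds of references that can appear in the page's HTML.
references = [
    "/public/images/v6/btn_arrow_down_padded_white.png",  # root-relative
    "img/logo.png",                                       # document-relative
    "//cdn.example.net/a.png",                            # protocol-relative
    "https://other.example.org/x.css",                    # already absolute
]

# urljoin resolves each reference against the page URL, so the host part
# is always filled in from the right place.
resolved = [urljoin(page_url, ref) for ref in references]
for ref, url in zip(references, resolved):
    print("%s -> %s" % (ref, url))
```

The root-relative asset resolves to `http://www.example.com/public/images/v6/btn_arrow_down_padded_white.png`, i.e. the host is kept and `public` stays in the path.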

You're doing quite a few string operations on the URLs. Check that all your possible code paths behave properly; my guess is that some of them are not doing what you expect.
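
One way to make those code paths fail loudly is to validate each URL just before it is handed to `urlfetch.fetch(mirrored_url)`. A minimal sketch with a hypothetical helper, `check_mirrored_url`; the "host must contain a dot" rule is only a debugging heuristic, not a general validity test (it would also reject, for example, intranet hosts without dots):

```python
import logging

try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit  # Python 2.7


def check_mirrored_url(url):
    """Hypothetical helper: raise before fetching if the URL looks malformed.

    A bare word such as "public" in the host position strongly suggests that
    a path segment ended up where the domain should be.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Log every outgoing URL so the bad one shows up in the logs.
    logging.debug("mirrored url=%r host=%r path=%r", url, host, parts.path)
    if parts.scheme not in ("http", "https"):
        raise ValueError("unexpected or missing scheme in %r" % url)
    if "." not in host and host != "localhost":
        raise ValueError("suspicious host %r in %r" % (host, url))
    return url
```

Calling `check_mirrored_url(mirrored_url)` on the line before the fetch would turn the `DNSLookupFailedError` into a `ValueError` that names the offending host, and the debug log records every URL on its way out.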

Dan Cornilescu
  • Based on your observation, I'd say the problem lies in `MirrorHandler`, since the error occurs when I pass a URL into `fetch_and_store`. So I must be doing something wrong around here: https://github.com/T145/tphroxy/blob/master/mirror.py#L179. See anything noteworthy? – T145 Feb 05 '18 at 21:37
  • Yep, somewhere in there. The `r"/"` potentially returned by `BaseHandler.get_relative_url()` would definitely not have a host part, for example. Add a few more debug statements here and there and reproduce the problem; finding out where the issue is should not be a big deal. – Dan Cornilescu Feb 05 '18 at 22:46
  • The problem is with `base_url`: in the logging for the example from my post, the base URL equals `public`, which it shouldn't. Could the regex I apply to it when declaring `app` at the bottom of the source file be the issue? – T145 Feb 06 '18 at 00:09
  • Possibly. The output of your 2nd `logging.debug()` in `MirrorHandler.get()` in the logs should tell. – Dan Cornilescu Feb 06 '18 at 03:54
  • It does, and it confirms what I stated earlier. I deleted all of the memcache code and got a new error: `File " ... /mirror.py", line 168, in get for (key, value) in content.headers.iteritems(): AttributeError: 'NoneType' object has no attribute 'headers'`. This error does not affect the visual output. The new code is on GitHub. – T145 Feb 06 '18 at 18:40
  • `content` is `None`, you should check for that. – Dan Cornilescu Feb 06 '18 at 19:21
  • I know it is. My question is: why is it `None`? Why did this only start happening after I removed the caching functionality? – T145 Feb 06 '18 at 21:56
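
Following the suggestion to check `content` for `None`, a guard in front of the header-copying loop could look like this minimal sketch. `copy_headers` and `FakeContent` are hypothetical names; the only assumption is that the fetch step returns either `None` or an object with a `headers` mapping. The guard makes the handler fail gracefully, but it does not by itself explain why the fetch returns `None`.

```python
import logging


def copy_headers(content, response_headers):
    """Hypothetical helper: copy fetched headers onto the outgoing response.

    `content` stands in for whatever the fetch step returned. It is assumed
    to be None when the fetch failed and otherwise to expose a `headers`
    mapping, as the AttributeError in the comments suggests.
    Returns True if headers were copied, False if there was nothing to copy.
    """
    if content is None:
        # Without this check, `content.headers` raises
        # AttributeError: 'NoneType' object has no attribute 'headers'.
        logging.warning("fetch returned None; skipping header copy")
        return False
    for key, value in content.headers.items():
        response_headers[key] = value
    return True


class FakeContent(object):
    """Minimal stand-in for a successful fetch result, for illustration only."""

    def __init__(self, headers):
        self.headers = headers


out = {}
print(copy_headers(None, out))                                        # False
print(copy_headers(FakeContent({"Content-Type": "image/png"}), out))  # True
print(out)                                 # {'Content-Type': 'image/png'}
```

If `MirrorHandler` is a webapp2 `RequestHandler`, the `False` branch is where one might send an error status (for example via `self.error(502)`) instead of copying headers.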