I am trying to get web2py to pass percent-encoded strings in a URL through to my controller without decoding them. For example:

/application/controller/function/hello%20world

so that in my function I can access request.args and get a string that I can unquote().

I tried modifying rewrite.py so that it does not convert %20, but that caused an error. Something else is decoding these requests before they reach my code, and I am having trouble finding where. I noticed that the httpserver.log file has:

127.0.0.1, 2011-09-02 00:12:09, GET, /application/controller/function/hello world, HTTP/1.1, 200, 0.169954

with the space already decoded. Maybe that gives a hint. Where are the URLs being decoded?

Below are the contents of my routes file:

#!/usr/bin/python
# -*- coding: utf-8 -*-

default_application = 'chips'
default_controller = 'default'
default_function = 'index'


routes_onerror = [
    (r'*/404', r'/chips/static/404.html'),
    (r'*/*', r'/chips/static/error.html'),
]
Charles L.
  • Are you using routes.py? If so, are you using the parameter-based system or the pattern-based system? – Anthony Sep 03 '11 at 02:50
  • I am only using routes to change the default application. I'm not sure which type of system I'm using. – Charles L. Sep 04 '11 at 15:00
  • What is the content of your routes.py file? – Anthony Sep 05 '11 at 14:41
  • I edited the question to include the contents. – Charles L. Sep 07 '11 at 02:19
  • Is your app behind a web server such as lighttpd, nginx, apache, etc? They can also alter URLs. – six8 Sep 07 '11 at 02:35
  • So far I have only tested on my local server, which is not behind anything: I just run python web2py.py. I will definitely have to keep that in mind when this goes on a live server behind apache! – Charles L. Sep 07 '11 at 04:32

1 Answer

By default, web2py will not allow special characters in args except '@', '-', '=', and '.'. To override that behavior, you can add the following to routes.py:

routes_apps_raw=['chips']

In that case, request.args will be set to None, and instead you can access the raw args from the URL via request.raw_args. Note, though, that routes_apps_raw does not work if you are using the parameter-based rewrite system (i.e., if your routes.py file includes a routers dictionary).
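
As a rough sketch of what handling the raw args yourself might look like: the helper below splits a raw args string into segments. It assumes request.raw_args arrives as a single slash-separated string (worth checking against your web2py version), and both split_raw_args and the SimpleNamespace stand-in for web2py's request object are hypothetical names used only for illustration.

```python
from types import SimpleNamespace


def split_raw_args(request):
    """Split request.raw_args into path segments (hypothetical helper).

    Assumes raw_args is one slash-separated string, or None when the
    URL has nothing after the function name.
    """
    raw = request.raw_args or ''
    # Drop empty segments produced by leading, trailing, or doubled slashes.
    return [segment for segment in raw.strip('/').split('/') if segment]


# Stand-in for web2py's request object; in a real controller the
# framework supplies `request`, and request.args would be None here.
fake_request = SimpleNamespace(args=None, raw_args='hello world/a=b')
print(split_raw_args(fake_request))
```

In an actual controller you would call the helper with the framework's own request rather than the stand-in.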

Note, even with the above change, the Rocket web server included with web2py will still automatically unquote() the URL, so you'll get the special characters in request.raw_args, but they will already be decoded.
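
One practical consequence, sketched below with the standard library only: because the value has already been decoded once, calling unquote() on it again is usually harmless but can corrupt data that legitimately contains a percent sign followed by two hex digits. The sample strings are made up for illustration, and Python 3's urllib.parse is assumed (on Python 2 the same functions live in urllib).

```python
from urllib.parse import quote, unquote

# What the server effectively does to the path before the app sees it.
decoded_once = unquote('hello%20world')
print(decoded_once)

# A second unquote() is a no-op for ordinary text...
assert unquote(decoded_once) == decoded_once

# ...but not for text that itself looks like an escape sequence.
literal = 'a%41b'              # the user really means these five characters
on_the_wire = quote(literal)   # the '%' is escaped as '%25'
seen_by_app = unquote(on_the_wire)
decoded_twice = unquote(seen_by_app)

print(on_the_wire, seen_by_app, decoded_twice)
```

So if the server has already decoded the URL, the application should use the value as-is rather than unquoting it a second time.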

If you are instead using the parameter-based rewrite system, you can control which characters are allowed in URL args via the args_match key, which takes a regular expression as its value. The default regex is r'([\w@ -]|(?<=[\w@ -])[.=])*$', which allows '@', '-', '=', and '.' (with some restrictions on '=' and '.').
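
To see what that default regex accepts, you can try it directly with Python's re module. This is only a quick check of the pattern quoted above, run outside web2py; the sample args and the arg_allowed helper are made up for illustration.

```python
import re

# The default args_match pattern quoted above.
ARGS_MATCH = re.compile(r'([\w@ -]|(?<=[\w@ -])[.=])*$')


def arg_allowed(arg):
    """Return True if the whole arg matches the default pattern."""
    return ARGS_MATCH.match(arg) is not None


for sample in ['hello world', 'abcd=efgh', 'file.txt', 'user@host',
               'a==b', '=abc', 'hello%20world']:
    print(repr(sample), arg_allowed(sample))
```

The first four samples match. 'a==b' fails because two of these characters in a row are not allowed, '=abc' fails because '=' must follow another allowed character, and 'hello%20world' fails because '%' is not in the allowed set.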

Anthony
  • Note that the list of allowed characters seems to have now changed to those that match the following regexp: r'([\w@ -]|(?<=[\w@ -])[.=])*$' – user2667066 Aug 24 '16 at 21:55
  • No, the allowed characters have not changed - you can have "@" or "-" anywhere in the arg but "=" or "." only at the end of an arg (as long as those are not the only characters). – Anthony Aug 25 '16 at 13:59
  • Yes, the chars are the same, but the allowed position is different - I can now have = or . within the arg itself, and that's what the regexp allows too. Try `re.match( r'([\w@ -]|(?<=[\w@ -])[.=])*$', 'abcd=efgh')` – user2667066 Aug 25 '16 at 14:53
  • Actually, it's a bit more complicated, and I have updated the answer. The regex you have shown is not the default -- it is only used if you are using the parameter-based rewrite system and do not specify a value for the `args_match` key. You are correct that it allows '=' or '.' within an arg, but it allows only one such character in a row, and they must come after one or more other characters. When not using the parameter-based rewrite, by default the args can have any number of '=' or '.' anywhere. – Anthony Aug 25 '16 at 16:03
  • Thanks. Very helpful. – user2667066 Aug 25 '16 at 20:09