39

For the last month, we've had a bot scraping our site regularly, leading to a bunch of ArgumentError: invalid %-encoding errors because the URLs are malformed. I've looked at a bunch of issues in rack here and here and rails here, and looked at this SO thread but there doesn't seem to be a definitive solution. Is there a correct solution for GET errors? Do I have to monkeypatch rack?

edit: And here's a backtrace:

/usr/local/lib/ruby/1.9.1/uri/common.rb:898:in `decode_www_form_component'
[GEM_ROOT]/gems/rack-1.4.5/lib/rack/utils.rb:41:in `unescape'
[GEM_ROOT]/gems/rack-1.4.5/lib/rack/utils.rb:94:in `block (2 levels) in parse_nested_query'
[GEM_ROOT]/gems/rack-1.4.5/lib/rack/utils.rb:94:in `map'
[GEM_ROOT]/gems/rack-1.4.5/lib/rack/utils.rb:94:in `block in parse_nested_query'
[GEM_ROOT]/gems/rack-1.4.5/lib/rack/utils.rb:93:in `each'
[GEM_ROOT]/gems/rack-1.4.5/lib/rack/utils.rb:93:in `parse_nested_query'
[GEM_ROOT]/gems/rack-1.4.5/lib/rack/request.rb:332:in `parse_query'
[GEM_ROOT]/gems/actionpack-3.2.12/lib/action_dispatch/http/request.rb:269:in `parse_query'
[GEM_ROOT]/gems/rack-1.4.5/lib/rack/request.rb:186:in `GET'
[GEM_ROOT]/gems/actionpack-3.2.12/lib/action_dispatch/http/request.rb:225:in `GET'
[GEM_ROOT]/gems/actionpack-3.2.12/lib/action_dispatch/http/parameters.rb:10:in `parameters'
[GEM_ROOT]/gems/actionpack-3.2.12/lib/action_dispatch/http/filter_parameters.rb:33:in `filtered_parameters'
[GEM_ROOT]/gems/actionpack-3.2.12/lib/action_controller/metal/instrumentation.rb:21:in `process_action'
[GEM_ROOT]/gems/actionpack-3.2.12/lib/action_controller/metal/params_wrapper.rb:207:in `process_action'
[GEM_ROOT]/gems/activerecord-3.2.12/lib/active_record/railties/controller_runtime.rb:18:in `process_action'
[GEM_ROOT]/gems/actionpack-3.2.12/lib/abstract_controller/base.rb:121:in `process'
[GEM_ROOT]/gems/actionpack-3.2.12/lib/abstract_controller/rendering.rb:45:in `process'
[GEM_ROOT]/gems/actionpack-3.2.12/lib/action_controller/metal.rb:203:in `dispatch'
[GEM_ROOT]/gems/actionpack-3.2.12/lib/action_controller/metal/rack_delegation.rb:14:in `dispatch'
[GEM_ROOT]/gems/actionpack-3.2.12/lib/action_controller/metal.rb:246:in `block in action'
[GEM_ROOT]/gems/actionpack-3.2.12/lib/action_dispatch/routing/route_set.rb:73:in `call'
[GEM_ROOT]/gems/actionpack-3.2.12/lib/action_dispatch/routing/route_set.rb:73:in `dispatch'
[GEM_ROOT]/gems/actionpack-3.2.12/lib/action_dispatch/routing/route_set.rb:36:in `call'
[GEM_ROOT]/gems/journey-1.0.4/lib/journey/router.rb:68:in `block in call'
[GEM_ROOT]/gems/journey-1.0.4/lib/journey/router.rb:56:in `each'
[GEM_ROOT]/gems/journey-1.0.4/lib/journey/router.rb:56:in `call'
[GEM_ROOT]/gems/actionpack-3.2.12/lib/action_dispatch/routing/route_set.rb:601:in `call'
[GEM_ROOT]/gems/omniauth-1.1.1/lib/omniauth/strategy.rb:177:in `call!'
[GEM_ROOT]/gems/omniauth-1.1.1/lib/omniauth/strategy.rb:157:in `call'
[GEM_ROOT]/gems/sass-3.2.7/lib/sass/plugin/rack.rb:54:in `call'
[GEM_ROOT]/gems/warden-1.2.1/lib/warden/manager.rb:35:in `block in call'
[GEM_ROOT]/gems/warden-1.2.1/lib/warden/manager.rb:34:in `catch'
[GEM_ROOT]/gems/warden-1.2.1/lib/warden/manager.rb:34:in `call'
[GEM_ROOT]/gems/actionpack-3.2.12/lib/action_dispatch/middleware/best_standards_support.rb:17:in `call'
[GEM_ROOT]/gems/rack-1.4.5/lib/rack/etag.rb:23:in `call'
[GEM_ROOT]/gems/rack-1.4.5/lib/rack/conditionalget.rb:25:in `call'
[GEM_ROOT]/gems/actionpack-3.2.12/lib/action_dispatch/middleware/head.rb:14:in `call'
[GEM_ROOT]/gems/actionpack-3.2.12/lib/action_dispatch/middleware/params_parser.rb:21:in `call'
[GEM_ROOT]/gems/actionpack-3.2.12/lib/action_dispatch/middleware/flash.rb:242:in `call'
[GEM_ROOT]/gems/rack-1.4.5/lib/rack/session/abstract/id.rb:210:in `context'
[GEM_ROOT]/gems/rack-1.4.5/lib/rack/session/abstract/id.rb:205:in `call'
[GEM_ROOT]/gems/actionpack-3.2.12/lib/action_dispatch/middleware/cookies.rb:341:in `call'
[GEM_ROOT]/gems/activerecord-3.2.12/lib/active_record/query_cache.rb:64:in `call'
[GEM_ROOT]/gems/activerecord-3.2.12/lib/active_record/connection_adapters/abstract/connection_pool.rb:479:in `call'
[GEM_ROOT]/gems/actionpack-3.2.12/lib/action_dispatch/middleware/callbacks.rb:28:in `block in call'
[GEM_ROOT]/gems/activesupport-3.2.12/lib/active_support/callbacks.rb:405:in `_run__497203393471184793__call__4495106819278994598__callbacks'
[GEM_ROOT]/gems/activesupport-3.2.12/lib/active_support/callbacks.rb:405:in `__run_callback'
[GEM_ROOT]/gems/activesupport-3.2.12/lib/active_support/callbacks.rb:385:in `_run_call_callbacks'
[GEM_ROOT]/gems/activesupport-3.2.12/lib/active_support/callbacks.rb:81:in `run_callbacks'
[GEM_ROOT]/gems/actionpack-3.2.12/lib/action_dispatch/middleware/callbacks.rb:27:in `call'
[GEM_ROOT]/gems/actionpack-3.2.12/lib/action_dispatch/middleware/remote_ip.rb:31:in `call'
[GEM_ROOT]/gems/actionpack-3.2.12/lib/action_dispatch/middleware/debug_exceptions.rb:16:in `call'
[GEM_ROOT]/gems/actionpack-3.2.12/lib/action_dispatch/middleware/show_exceptions.rb:56:in `call'
[GEM_ROOT]/gems/railties-3.2.12/lib/rails/rack/logger.rb:32:in `call_app'
[GEM_ROOT]/gems/railties-3.2.12/lib/rails/rack/logger.rb:16:in `block in call'
[GEM_ROOT]/gems/activesupport-3.2.12/lib/active_support/tagged_logging.rb:22:in `tagged'
[GEM_ROOT]/gems/railties-3.2.12/lib/rails/rack/logger.rb:16:in `call'
[GEM_ROOT]/gems/actionpack-3.2.12/lib/action_dispatch/middleware/request_id.rb:22:in `call'
[GEM_ROOT]/gems/rack-1.4.5/lib/rack/methodoverride.rb:21:in `call'
[GEM_ROOT]/gems/rack-1.4.5/lib/rack/runtime.rb:17:in `call'
[GEM_ROOT]/gems/activesupport-3.2.12/lib/active_support/cache/strategy/local_cache.rb:72:in `call'
[GEM_ROOT]/gems/rack-1.4.5/lib/rack/lock.rb:15:in `call'
[GEM_ROOT]/gems/rack-cache-1.2/lib/rack/cache/context.rb:136:in `forward'
[GEM_ROOT]/gems/rack-cache-1.2/lib/rack/cache/context.rb:143:in `pass'
[GEM_ROOT]/gems/rack-cache-1.2/lib/rack/cache/context.rb:172:in `rescue in lookup'
[GEM_ROOT]/gems/rack-cache-1.2/lib/rack/cache/context.rb:168:in `lookup'
[GEM_ROOT]/gems/rack-cache-1.2/lib/rack/cache/context.rb:66:in `call!'
[GEM_ROOT]/gems/rack-cache-1.2/lib/rack/cache/context.rb:51:in `call'
[GEM_ROOT]/gems/railties-3.2.12/lib/rails/engine.rb:479:in `call'
[GEM_ROOT]/gems/railties-3.2.12/lib/rails/application.rb:223:in `call'
[GEM_ROOT]/gems/railties-3.2.12/lib/rails/railtie/configurable.rb:30:in `method_missing'
/usr/local/lib/ruby/gems/1.9.1/gems/passenger-3.0.13/lib/phusion_passenger/rack/request_handler.rb:96:in `process_request'
/usr/local/lib/ruby/gems/1.9.1/gems/passenger-3.0.13/lib/phusion_passenger/abstract_request_handler.rb:516:in `accept_and_process_next_request'
/usr/local/lib/ruby/gems/1.9.1/gems/passenger-3.0.13/lib/phusion_passenger/abstract_request_handler.rb:274:in `main_loop'
/usr/local/lib/ruby/gems/1.9.1/gems/passenger-3.0.13/lib/phusion_passenger/rack/application_spawner.rb:206:in `start_request_handler'
/usr/local/lib/ruby/gems/1.9.1/gems/passenger-3.0.13/lib/phusion_passenger/rack/application_spawner.rb:171:in `block in handle_spawn_application'
/usr/local/lib/ruby/gems/1.9.1/gems/passenger-3.0.13/lib/phusion_passenger/utils.rb:479:in `safe_fork'
/usr/local/lib/ruby/gems/1.9.1/gems/passenger-3.0.13/lib/phusion_passenger/rack/application_spawner.rb:166:in `handle_spawn_application'
/usr/local/lib/ruby/gems/1.9.1/gems/passenger-3.0.13/lib/phusion_passenger/abstract_server.rb:357:in `server_main_loop'
/usr/local/lib/ruby/gems/1.9.1/gems/passenger-3.0.13/lib/phusion_passenger/abstract_server.rb:206:in `start_synchronously'
/usr/local/lib/ruby/gems/1.9.1/gems/passenger-3.0.13/lib/phusion_passenger/abstract_server.rb:180:in `start'
/usr/local/lib/ruby/gems/1.9.1/gems/passenger-3.0.13/lib/phusion_passenger/rack/application_spawner.rb:129:in `start'
/usr/local/lib/ruby/gems/1.9.1/gems/passenger-3.0.13/lib/phusion_passenger/spawn_manager.rb:253:in `block (2 levels) in spawn_rack_application'
/usr/local/lib/ruby/gems/1.9.1/gems/passenger-3.0.13/lib/phusion_passenger/abstract_server_collection.rb:132:in `lookup_or_add'
/usr/local/lib/ruby/gems/1.9.1/gems/passenger-3.0.13/lib/phusion_passenger/spawn_manager.rb:246:in `block in spawn_rack_application'
/usr/local/lib/ruby/gems/1.9.1/gems/passenger-3.0.13/lib/phusion_passenger/abstract_server_collection.rb:82:in `block in synchronize'
/usr/local/lib/ruby/gems/1.9.1/gems/passenger-3.0.13/lib/phusion_passenger/abstract_server_collection.rb:79:in `synchronize'
/usr/local/lib/ruby/gems/1.9.1/gems/passenger-3.0.13/lib/phusion_passenger/spawn_manager.rb:244:in `spawn_rack_application'
/usr/local/lib/ruby/gems/1.9.1/gems/passenger-3.0.13/lib/phusion_passenger/spawn_manager.rb:137:in `spawn_application'
/usr/local/lib/ruby/gems/1.9.1/gems/passenger-3.0.13/lib/phusion_passenger/spawn_manager.rb:275:in `handle_spawn_application'
/usr/local/lib/ruby/gems/1.9.1/gems/passenger-3.0.13/lib/phusion_passenger/abstract_server.rb:357:in `server_main_loop'
/usr/local/lib/ruby/gems/1.9.1/gems/passenger-3.0.13/lib/phusion_passenger/abstract_server.rb:206:in `start_synchronously'
/usr/local/lib/ruby/gems/1.9.1/gems/passenger-3.0.13/helper-scripts/passenger-spawn-server:99:in `<main>'
Community
  • 1
  • 1
Waynn Lue
  • 11,344
  • 8
  • 51
  • 76
  • Might be fixed in the latest rack version. Seems a few of the bugs, [including one I commented on](https://github.com/rack/rack/issues/225#issuecomment-2594611) when I had this issue 2 years ago, have been closed. Is blocking the bot an option? – Andrew Theis Jun 05 '13 at 19:04
  • 1
    Sadly I tried upgrading to the latest rack and still saw the problem (then downgraded again because I had to turn off some other gems). It's coming in from multiple IP addresses so it's turning into a game of whack-a-mole, plus I'm hoping there has to be a better way. :) – Waynn Lue Jun 06 '13 at 00:25
  • Can you share a backtrace? – Paul Schreiber Jun 07 '13 at 21:33
  • Ugh. This issue is prevalent on our apps as well. – Tom Rossi Jul 21 '14 at 16:39

2 Answers2

7

You could inject a middleware designed to detect these and fail gracefully. The basic idea is to just try to parse the query string, and if it fails, bail out with a HTTP 400. Otherwise, just allow the request through.

class RefuseInvalidRequest
  def initialize(app)
    @app = app
  end

  def call(env)
    query = Rack::Utils.parse_nested_query(env['QUERY_STRING'].to_s) rescue :bad_query
    if query == :bad_query
      [400, {'Content-Type' => 'text/plain'}, "Bad Request"]
    else
      @app.call(env)
    end
  end
end

I haven't tested this, but the concept should work.

Rob
  • 4,404
  • 2
  • 32
  • 33
Chris Heald
  • 61,439
  • 10
  • 123
  • 137
  • 1
    It not working on my dev environment. I dont know why but it only show `!! Invalid request` in the log. And tried to add `puts "called"` on `def initialize` it is not output `called` on log. only `!! Invalid request` . Yes i've adde `config.middleware.use("RefuseInvalidRequest")` to my application.rb . Have any idea why this happen ? – Yana Agun Siswanto Feb 24 '14 at 03:38
  • Perhaps you haven't injected the middleware high enough in your middleware stack. – Chris Heald Feb 24 '14 at 04:01
  • `config.middleware.insert_before Rack::Runtime, "RefuseInvalidRequest"`. The highest one i can insert. Actually it's `Rack::MiniProfiler` and `Honeybadger`. But i cant insert before it. Still the same output. Checkout my old question with logs http://stackoverflow.com/questions/21229499/custom-message-or-redirection-for-400-bad-request-because-of-unexcaped-on-par – Yana Agun Siswanto Feb 24 '14 at 06:59
  • 2
    I had the same issue, but with POST data (as in http://stackoverflow.com/questions/15769681/rails-rack-argumenterror-invalid-encoding-for-post-data), and I used this fix, but replaced the `query =` line with `request_content = Rack::Request.new(env).POST rescue :bad_form_data`, and it worked like a charm. Actually I also had to put the "Bad Request" response body inside an array for Ruby 1.9, so: `[400, headers, ["Bad Request"]]` – mltsy Jul 07 '14 at 16:23
7

if you don't mind against monkeypatching Rack then create in config/initializers file (for example rack.rb) with this content:

module Rack
  module Utils
    if defined?(::Encoding)
      def unescape(s, encoding = Encoding::UTF_8)
        begin
          URI.decode_www_form_component(s, encoding)
        rescue ArgumentError
          URI.decode_www_form_component(URI.encode(s), encoding)
        end
      end
    else
      def unescape(s, encoding = nil)
        begin
          URI.decode_www_form_component(s, encoding)
        rescue ArgumentError
          URI.decode_www_form_component(URI.encode(s), encoding)
        end
      end
    end
    module_function :unescape
  end
end

p.s. it works with passenger, but with Webrick and Thin it doesn't. It looks like both webrick and thin parse a request too, so the failure happens before initializer is loaded. For example with Thin error happens in thin-1.6.2/lib/thin/request.rb:84.

trushkevich
  • 2,657
  • 1
  • 28
  • 37