I'm talking to what seems to be a broken HTTP daemon and I need to make a GET request that includes a pipe | character in the URL.

LWP::UserAgent escapes the pipe character before the request is sent.

For example, a URL passed in as:

https://hostname/url/doSomethingScript?ss=1234&activities=Lec1|01

is passed to the HTTP daemon as

https://hostname/url/doSomethingScript?ss=1234&activities=Lec1%7C01

This is correct, but doesn't work with this broken server.

How can I override or bypass the encoding that LWP and its friends are doing?

Note

I've seen and tried other answers here on StackOverflow addressing similar problems. The difference here seems to be that those answers are dealing with POST requests where the formfield parts of the URL can be passed as an array of key/value pairs or as a 'Content' => $content parameter. Those approaches aren't working for me with an LWP request.

I've also tried constructing an HTTP::Request object and passing that to LWP, and passing the full URL directly to the user agent's get() method. No dice with either approach.


In response to Borodin's request, this is a sanitised version of the code I'm using

#!/usr/local/bin/perl -w
use HTTP::Cookies;
use LWP;

my $debug = 1;

# make a 'browser' object
my $browser = LWP::UserAgent->new();

# cookie handling...
$browser->cookie_jar(HTTP::Cookies->new(
             'file' => '.cookie_jar.txt',
             'autosave' => 1,
             'ignore_discard' => 1,
             ));

# proxy, so we can watch...
if ($debug == 1) {
    $browser->proxy(['http', 'ftp', 'https'], 'http://localhost:8080/');
}

# user agent string (pretend to be Firefox)
$agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.7.12) Gecko/20050919 Firefox/1.0.7';

# set the user agent
$browser->agent($agent);

# do some things here to log in to the web site, accept session cookies, etc. 
# These are basic POSTs of filled forms. Works fine.
# [...]

my $baseURL = 'https://hostname/url/doSomethingScript?ss=1234&activities=VALUEA|VALUEB';

# a plain list, not an array reference, so that shift() hands back the individual values
my @values = ('Lec1', '01', 'Lec1', '02');

while (1) {
    if (scalar(@values) < 2) { last; }

    my $vala = shift(@values);
    my $valb = shift(@values);

    my $url = $baseURL;
    $url =~ s/VALUEA/$vala/g;
    $url =~ s/VALUEB/$valb/g;

    # simplified. Would usually check request for '200' response, etc...
    $content = $browser->get($url)->content();

    # do something here with the content

    # [...]

    # fails because the '|' character in the url is escaped after it's handed 
    # to LWP

}

# end
bchgys
  • As a point of interest, I've discovered the LWP::Curl module and its auto_encode() method. I've been able to demonstrate that if one creates an object with `$lwpcurl = LWP::Curl->new();` and sets `$lwpcurl->auto_encode(0);`, subsequent requests with a URL of the form I've described in my question ARE passed to the HTTP daemon without further munging. Working with LWP::UserAgent is, to me, still preferable to LWP::Curl (I'd have to re-write a lot of code), so I'll let my question stand in the hope that someone might be able to help. – bchgys Feb 24 '14 at 01:46
  • The LWP library is very well designed and there are hooks at almost every point. Please show your code. Are you doing a simple `$ua->get` call? Have you tried building an `HTTP::Request` object and calling `$ua->request` on that? – Borodin Feb 24 '14 at 01:51
  • Borodin, thanks. Code above. As I said originally, I have tried to follow answers to other similar questions here on stackoverflow. HTTP::Request allegedly doesn't escape URLs if they're passed in ... for me, approaches such as `$req = HTTP::Request->new(GET=>'https://hostname/url/doSomethingScript'); $req->content('ss=1234&activities=VALUEA|VALUEB');` still gave me either a munged URL in the HTTP request, or didn't send the 'Content' part with the GET request. I can see what needs to happen, I'm just a bit out of my depth as far as making it happen goes. – bchgys Feb 24 '14 at 02:50
  • It's done by the URI class used by HTTP::Request – ikegami Feb 24 '14 at 03:28
  • As a stackoverflow newb', I can't answer my own question. So, a comment: I've found a way to make this work, in a perlmonks thread http://computer-programming-forum.com/53-perl/11cfc4991e0b3d0a.htm in the post (#13) by Joe Schaefer. Essentially, create an `HTTP::Request` object, passing in the 'broken' URL: `my $url = 'https://hostname/url/doSomethingScript?ss=1234&activities=VALUEA|VALUEB'; my $request = HTTP::Request->new(GET => $url);` `HTTP::Request` munges the URL, as `print $request->as_string;` demonstrates. ... – bchgys Feb 24 '14 at 04:40
  • Going back and overriding the contents of the URL with a simple regular expression, thus: `${$request->uri} =~ s/%7C/|/;` causes the brokenness to be re-imposed on the URL, as `print $request->as_string;` demonstrates. Subsequently handing the request object to LWP with `my $browser = LWP::UserAgent->new(); my $response = $browser->request($request);` makes the request of the HTTP server with the broken URL, as expected. Joe also demonstrates a way to make this happen in a module (his aptly-named example: `HTTP::Request::Broken`)! – bchgys Feb 24 '14 at 04:41
  • Thanks all for your input and help. I will leave this open for comments and in the hope of other, possibly better, Ways To Do It. I'd also appreciate guidance on the One True Way (tm) to finalise stackoverflow question, etc. – bchgys Feb 24 '14 at 04:41
  • Once you have enough rep you can answer your own question. – simbabque Feb 24 '14 at 08:35
  • And having put in so much effort to leave an explanation for future readers, you are quite likely to generate enough rep for that by the question alone. Welcome to SO. – DeVadder Feb 24 '14 at 11:29
  • Missed the last few comments here before I posted my answer. Anyway, I hope I answered the "One True Way (tm)" part to the degree that it can be. – Lasse Feb 24 '14 at 14:59
  • Thanks for the welcome, DeVadder, it's nice to be able to contribute something. As to One True Way, @Lasse, I guess one could say There's More Than One True Way To Do It In Perl (tm), no? :-) Your answer is helpful, I saw a number of similar questions but no really spot-on answers until now, thank-you. – bchgys Feb 25 '14 at 01:21

1 Answer

As @bchgys mentions in his comment, this is (almost) answered in the linked thread. Here are two solutions:

The first and arguably cleanest one is to locally override the escape map in URI::Escape to not modify the pipe character:

use URI;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new();
my $res;
{
    # Violate RFC 2396 by forcing broken query string
    # local makes the override take effect only in the current code block
    local $URI::Escape::escapes{'|'} = '|';
    $res = $ua->get('http://server/script?q=a|b');
}
print $res->request->as_string, "\n";
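
To check the override without a live server, here is a minimal sketch that only builds URI objects and prints them. It assumes the escaping happens when the URI object is constructed (which is what ikegami's comment suggests) and reuses the placeholder URL from above; the variable names are just for illustration.

```perl
use strict;
use warnings;
use URI;

# Default behaviour: the pipe is percent-encoded when the URI object is built.
my $default = URI->new('http://server/script?q=a|b');
print "default:  ", $default->as_string, "\n";

my $raw;
{
    # Same override as above; `local` restores the old value at the end of the block.
    local $URI::Escape::escapes{'|'} = '|';
    $raw = URI->new('http://server/script?q=a|b');
}
print "override: ", $raw->as_string, "\n";

# Outside the block the normal escape map should be back in force.
my $after = URI->new('http://server/script?q=a|b');
print "after:    ", $after->as_string, "\n";
```

If the three lines show an escaped, a raw and then again an escaped pipe, the override is scoped the way you want and the rest of your program keeps escaping URLs normally.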

Alternatively, you can simply undo the escaping by modifying the URI directly in the request after the request has been created:

use HTTP::Request;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new();
my $req = HTTP::Request->new(GET => 'http://server/script?q=a|b');

# Violate RFC 2396 by forcing broken query string
# /g so that every escaped pipe is restored, not just the first one
${$req->uri} =~ s/%7C/|/g;

my $res = $ua->request($req);
print $res->request->as_string, "\n";
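
The same kind of offline check works for this approach. The sketch below never sends the request; it just prints the URI before and after the substitution. The three-value query string `a|b|c` is a made-up example to show why you may want the `/g` flag when a URL contains more than one pipe.

```perl
use strict;
use warnings;
use HTTP::Request;

# Hypothetical URL with two pipes; HTTP::Request hands it to URI, which escapes both.
my $req = HTTP::Request->new(GET => 'http://server/script?q=a|b|c');
my $before = $req->uri->as_string;
print "before: $before\n";

# The URI object is a blessed scalar reference, so dereferencing it
# gives direct access to the stored string.
${ $req->uri } =~ s/%7C/|/g;

my $fixed = $req->uri->as_string;
print "after:  $fixed\n";
```

Note that this reaches into the internals of the URI object, so it depends on URI continuing to store the URL as a plain scalar.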

The first solution is almost certainly preferable because it at least relies on the %URI::Escape::escapes package variable, which is exportable and documented, so that's probably as close as you're gonna get to doing this with a supported API. The second one pokes directly at the internals of the URI object.

Note that in either case you are in violation of RFC 2396 (and of RFC 3986, which has since replaced it and doesn't allow an unescaped | either), but as mentioned you may have no choice when talking to a broken server that you have no control over.

Lasse
  • thanks, @lasse, you've added the elusive local override for the escape map. I knew that could be done, but wasn't sure exactly how, so thanks. The second part of your answer is much the same as the answer I offered myself in comments, but I think yours is more eloquent, so I'll take that as the answer! I particularly like the ever-so-slightly snarky "Violate RFC2396" part! :-) – bchgys Feb 25 '14 at 01:18
  • Good point about the "elusive" local. I added a comment to clarify its purpose which a lot of people don't realize. – Lasse Feb 25 '14 at 20:55