20

I'm using httpc:request to post some data to a remote service. The post itself works, but the data in the body() of the post comes through as is, without any URL-encoding, which causes the post to fail when the remote service parses it.

Is there a function in Erlang that is similar to CGI.escape in Ruby for this purpose?

davidsmalley

8 Answers

24

I encountered the lack of this feature in the HTTP modules as well.

It turns out that this functionality is actually available in the Erlang distribution; you just have to look hard enough.

> edoc_lib:escape_uri("luca+more@here.com").
"luca%2bmore%40here.com"

This behaves like CGI.escape in Ruby, apart from emitting lowercase hex digits. Ruby also has URI.escape, which behaves slightly differently:

> CGI.escape("luca+more@here.com")
 => "luca%2Bmore%40here.com" 
> URI.escape("luca+more@here.com")
 => "luca+more@here.com" 

edoc_lib

Luca Spiller
  • Seems like this advice is stale: follow the link and you'll see `edoc_lib:escape_uri` is MIA. Not sure which Erlang release introduced this change. – jtmoulia Mar 31 '15 at 23:34
  • [It's there, and exported.](https://github.com/erlang/otp/blob/172e812c491680fbb175f56f7604d4098cdc9de4/lib/edoc/src/edoc_lib.erl#L382-L414) I'm not sure why it's not in the man page. – mqsoh May 30 '15 at 02:59
  • Worked for me, seems fine. – skwidbreth Jun 01 '18 at 16:19
10

At least in R15 there is http_uri:encode/1, which does the job. I would also not recommend using edoc_lib:escape_uri, as it translates an '=' to %3d instead of %3D, which caused me some trouble.
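On newer OTP releases, where http_uri:encode/1 was deprecated, the standard library's uri_string module can build the whole form body in one call. The following is a minimal sketch, assuming OTP 21 or later (where, as far as I know, uri_string:compose_query/1 was introduced); the module name and the post/2 helper are hypothetical and only there for illustration.

```erlang
-module(form_body_sketch).
-export([encode/1, post/2]).

%% Encode a list of {Key, Value} string pairs as an
%% application/x-www-form-urlencoded body.
%% uri_string:compose_query/1 writes spaces as "+" and percent-encodes
%% other reserved characters; non-ASCII characters are emitted as
%% percent-encoded UTF-8 bytes.
encode(Pairs) ->
    uri_string:compose_query(Pairs).

%% Hypothetical helper: POST the encoded pairs to Url with httpc.
%% Assumes the inets application has already been started.
post(Url, Pairs) ->
    httpc:request(post,
                  {Url, [], "application/x-www-form-urlencoded", encode(Pairs)},
                  [], []).
```

For example, form_body_sketch:encode([{"username", "bob"}, {"password", "123456"}]) returns "username=bob&password=123456". This also sidesteps the lowercase-hex issue, since compose_query/1 emits uppercase hex digits.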

John-Paul Bader
  • `http_uri:encode` works well for ASCII characters, but doesn't do anything to characters greater than 127. (Of course, it's hard to say what it should do, given that the function doesn't know the desired output encoding.) – legoscia Dec 30 '14 at 12:27
  • "... but doesn't do anything to characters greater than 127." or control characters. :( – juan.facorro Oct 24 '17 at 15:39
9

You can find here the YAWS url_encode and url_decode routines

They are fairly straightforward, although comments indicate the encode is not 100% complete for all punctuation characters.
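To give a rough idea of what the decoding side involves, here is a minimal sketch of a percent-decoder. It is not the YAWS code; the module and function names are hypothetical, and it assumes form-style input where "+" stands for a space.

```erlang
-module(url_decode_sketch).
-export([decode/1]).

%% Decode a percent-encoded string into a list of raw bytes.
%% "+" is treated as a space, as in form-encoded data.
%% A malformed escape such as "%zz" raises badarg; a truncated escape
%% at the very end of the input is passed through unchanged.
decode([$%, Hi, Lo | Rest]) ->
    [list_to_integer([Hi, Lo], 16) | decode(Rest)];
decode([$+ | Rest]) ->
    [$\s | decode(Rest)];
decode([C | Rest]) ->
    [C | decode(Rest)];
decode([]) ->
    [].
```

The result is a list of bytes, so input that was UTF-8 encoded before escaping still needs unicode:characters_to_list(list_to_binary(Bytes)) to get back to code points.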

Bwooce
7

Here's a simple function that does the job. It's designed to work directly with inets httpc.

%% @doc URL-encode form data given as a list of {Key, Value} string pairs.
-type formdata() :: [{string(), string()}].

-spec url_encode(formdata()) -> string().
url_encode(Data) ->
    url_encode(Data,"").

url_encode([],Acc) ->
    Acc;

url_encode([{Key,Value}|R],"") ->
    url_encode(R, edoc_lib:escape_uri(Key) ++ "=" ++ edoc_lib:escape_uri(Value));
url_encode([{Key,Value}|R],Acc) ->
    url_encode(R, Acc ++ "&" ++ edoc_lib:escape_uri(Key) ++ "=" ++ edoc_lib:escape_uri(Value)).

Example usage:

httpc:request(post,
              {"http://localhost:3000/foo", [],
               "application/x-www-form-urlencoded",
               url_encode([{"username", "bob"}, {"password", "123456"}])},
              [], []).
Rody Oldenhuis
Rick Moynihan
6

If someone needs to encode a URI in a way that works with UTF-8 in Erlang:

https://gist.github.com/3796470

Example:

Eshell V5.9.1  (abort with ^G)

1> c(encode_uri_rfc3986).
{ok,encode_uri_rfc3986}

2> encode_uri_rfc3986:encode("テスト").
"%e3%83%86%e3%82%b9%e3%83%88"

3> edoc_lib:escape_uri("テスト").
"%c3%86%c2%b9%c3%88"   % wrong output: decodes to Æ¹È
Renato Albano
4

To answer my own question...I found this lib in ibrowse!

http://www.erlware.org/lib/5.6.3/ibrowse-1.4/ibrowse_lib.html#url_encode-1

url_encode/1

url_encode(Str) -> UrlEncodedStr

Str = string()
UrlEncodedStr = string()

URL-encodes a string based on RFC 1738. Returns a flat list.

I guess I can use this to do the encoding and still use http:

davidsmalley
1

Here's a "fork" of the edoc_lib:escape_uri function that improves on the UTF-8 support and also supports binaries.

escape_uri(S) when is_list(S) ->
    escape_uri(unicode:characters_to_binary(S));
escape_uri(<<C:8, Cs/binary>>) when C >= $a, C =< $z ->
    [C] ++ escape_uri(Cs);
escape_uri(<<C:8, Cs/binary>>) when C >= $A, C =< $Z ->
    [C] ++ escape_uri(Cs);
escape_uri(<<C:8, Cs/binary>>) when C >= $0, C =< $9 ->
    [C] ++ escape_uri(Cs);
escape_uri(<<C:8, Cs/binary>>) when C == $. ->
    [C] ++ escape_uri(Cs);
escape_uri(<<C:8, Cs/binary>>) when C == $- ->
    [C] ++ escape_uri(Cs);
escape_uri(<<C:8, Cs/binary>>) when C == $_ ->
    [C] ++ escape_uri(Cs);
escape_uri(<<C:8, Cs/binary>>) ->
    escape_byte(C) ++ escape_uri(Cs);
escape_uri(<<>>) ->
    "".

escape_byte(C) ->
    %% Always emit two hex digits, so bytes below 16 become e.g. "%0a"
    %% rather than the invalid "%a".
    [$%, hex_digit(C bsr 4), hex_digit(C band 15)].

hex_digit(N) when N =< 9 ->
    $0 + N;
hex_digit(N) ->
    N - 10 + $a.

Note that, because it uses unicode:characters_to_binary, it'll only work in R13 or newer.

Example usage is:

9> httpc:request("http://httpbin.org/get?q=" ++ mylib_app:escape_uri("☺")).
{ok,{{"HTTP/1.1",200,"OK"},
     [{"connection","keep-alive"},
      {"date","Sat, 09 Nov 2019 21:51:54 GMT"},
      {"server","nginx"},
      {"content-length","178"},
      {"content-type","application/json"},
      {"access-control-allow-credentials","true"},
      {"access-control-allow-origin","*"},
      {"referrer-policy","no-referrer-when-downgrade"},
      {"x-content-type-options","nosniff"},
      {"x-frame-options","DENY"},
      {"x-xss-protection","1; mode=block"}],
     "{\n  \"args\": {\n    \"q\": \"\\u263a\"\n  }, \n  \"headers\": {\n    \"Host\": \"httpbin.org\"\n  }, \n  \"origin\": \"11.111.111.111, 11.111.111.111\", \n  \"url\": \"https://httpbin.org/get?q=\\u263a\"\n}\n"}}

We send out a request with escaped query parameter and see that we get back the correct Unicode codepoint.

gdamjan
0

AFAIK there's no URL encoder in the standard libraries. Think I 'borrowed' the following code from YAWS or maybe one of the other Erlang web servers:

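% The snippet uses the macros ?PERCENT and ?QS_SAFE without defining them.
% The definitions below are an assumption: the percent sign, and the
% unreserved characters that can be left unescaped.
-define(PERCENT, 37).  % the character '%'
-define(QS_SAFE(C), ((C >= $a andalso C =< $z) orelse
                     (C >= $A andalso C =< $Z) orelse
                     (C >= $0 andalso C =< $9) orelse
                     C =:= $. orelse C =:= $- orelse
                     C =:= $~ orelse C =:= $_)).

% Utility function to convert a 'form' of name-value pairs into a URL encoded
% content string.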

urlencode(Form) ->
    RevPairs = lists:foldl(fun({K,V},Acc) -> [[quote_plus(K),$=,quote_plus(V)] | Acc] end, [],Form),
    lists:flatten(revjoin(RevPairs,$&,[])).

quote_plus(Atom) when is_atom(Atom) ->
    quote_plus(atom_to_list(Atom));

quote_plus(Int) when is_integer(Int) ->
    quote_plus(integer_to_list(Int));

quote_plus(String) ->
    quote_plus(String, []).

quote_plus([], Acc) ->
    lists:reverse(Acc);

quote_plus([C | Rest], Acc) when ?QS_SAFE(C) ->
    quote_plus(Rest, [C | Acc]);

quote_plus([$\s | Rest], Acc) ->
    quote_plus(Rest, [$+ | Acc]);

quote_plus([C | Rest], Acc) ->
    <<Hi:4, Lo:4>> = <<C>>,
    quote_plus(Rest, [hexdigit(Lo), hexdigit(Hi), ?PERCENT | Acc]).

revjoin([], _Separator, Acc) ->
    Acc;

revjoin([S | Rest],Separator,[]) ->
    revjoin(Rest,Separator,[S]);

revjoin([S | Rest],Separator,Acc) ->
    revjoin(Rest,Separator,[S,Separator | Acc]).

hexdigit(C) when C < 10 -> $0 + C;
hexdigit(C) when C < 16 -> $A + (C - 10).
tonys