19

I need to be able to break a URL down into different segments. Take this path for example:

http://login:password@somehost.somedomain.com:8080/some_path/something_else.html?param1=val&param2=val#nose
\__/   \___/ \______/ \_____________________/ \__/\____________________________/ \___________________/ \__/
 |       |      |               |               |                |                        |              |
Scheme Username Password       Host            Port             Path                    Query         Fragment

This should break down as follows:

Protocol: HTTP
Username: login
Password: password
Host: somehost.somedomain.com
Port: 8080
Path Info: /some_path/something_else.html
Query String: param1=val&param2=val

How can I do this in Delphi? Is there something ready made which can split this up for me? If not, how do I go about parsing all the different possible formats? This is assuming that it might even be a different protocol, such as HTTPS or RTSP.

Ian Boyd
Jerry Dodge
  • I hope for all our sakes the password is not in clear text. – eggy May 22 '13 at 23:14
  • @eggy technically it is, this is how some systems authenticate. It's up to the server implementation whether to require it to be encrypted or not. – Jerry Dodge May 22 '13 at 23:15
  • @eggy To add, such web servers I've noticed have actually been hardware API's such as switches / routers, IP surveillance cameras, VOIP phones, etc. – Jerry Dodge Aug 28 '15 at 00:00

2 Answers

39

XE2 ships with Indy, which has a TIdURI class for that purpose, e.g.:

uses
  ..., IdURI;

var
  URI: TIdURI;

URI := TIdURI.Create('http://login:password@somehost.somedomain.com:8080/some_path/something_else.html?param1=val&param2=val');
try
  // Protocol = URI.Protocol
  // Username = URI.Username
  // Password = URI.Password
  // Host = URI.Host
  // Port = URI.Port
  // Path = URI.Path + URI.Document (TIdURI keeps the directory part and the file name separately)
  // Query = URI.Params
finally
  URI.Free;
end;
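
For reference, a complete console program along these lines might look like the sketch below. It assumes Indy is on the library path; the program name and the output labels are made up for illustration. The fragment is read from the Bookmark property, and the full path is assembled from Path and Document.

```pascal
program UriDemo; // hypothetical program name

{$APPTYPE CONSOLE}

uses
  SysUtils, IdURI;

var
  URI: TIdURI;
begin
  URI := TIdURI.Create('http://login:password@somehost.somedomain.com:8080/some_path/something_else.html?param1=val&param2=val#nose');
  try
    Writeln('Protocol: ', URI.Protocol);
    Writeln('Username: ', URI.Username);
    Writeln('Password: ', URI.Password);
    Writeln('Host:     ', URI.Host);
    Writeln('Port:     ', URI.Port);                // Port is exposed as a string
    Writeln('Path:     ', URI.Path + URI.Document); // directory part + file name
    Writeln('Query:    ', URI.Params);              // query string without the leading '?'
    Writeln('Fragment: ', URI.Bookmark);            // fragment without the leading '#'
  finally
    URI.Free;
  end;
end.
```

Because TIdURI does its parsing in plain Delphi code, a program like this should not be tied to Windows-only APIs.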
Remy Lebeau
  • +1 Even better, it's almost always an advantage when something is already encapsulated and implemented in a language :) – Jerry Dodge May 22 '13 at 23:34
  • Indy is not part of the Delphi language. It is just a pre-bundled third-party library. But at least the URI parsing has no external dependencies as it is implemented in plain Delphi code. – Remy Lebeau May 22 '13 at 23:39
  • Well I meant is available in the IDE. Indy of course isn't part of Delphi, that's why it's Indy :) I'll probably wind up using this solution anyway, for ease of use. – Jerry Dodge May 22 '13 at 23:44
  • Wait, so if this Indy method doesn't depend on anything and parses it all by itself, then I think this may be the appropriate solution. Accepted this one instead since it technically requires only the presence of Indy, which comes shipped with almost all versions of Delphi. – Jerry Dodge May 22 '13 at 23:52
23

You can use the InternetCrackUrl function from WinInet.

Try this simple example:

{$APPTYPE CONSOLE}

uses
  Windows,
  SysUtils,
  WinInet;

procedure ParseURL(const lpszUrl: string);
var
  lpszScheme      : array[0..INTERNET_MAX_SCHEME_LENGTH - 1] of Char;
  lpszHostName    : array[0..INTERNET_MAX_HOST_NAME_LENGTH - 1] of Char;
  lpszUserName    : array[0..INTERNET_MAX_USER_NAME_LENGTH - 1] of Char;
  lpszPassword    : array[0..INTERNET_MAX_PASSWORD_LENGTH - 1] of Char;
  lpszUrlPath     : array[0..INTERNET_MAX_PATH_LENGTH - 1] of Char;
  lpszExtraInfo   : array[0..1024 - 1] of Char;
  lpUrlComponents : TURLComponents;
begin
  ZeroMemory(@lpszScheme, SizeOf(lpszScheme));
  ZeroMemory(@lpszHostName, SizeOf(lpszHostName));
  ZeroMemory(@lpszUserName, SizeOf(lpszUserName));
  ZeroMemory(@lpszPassword, SizeOf(lpszPassword));
  ZeroMemory(@lpszUrlPath, SizeOf(lpszUrlPath));
  ZeroMemory(@lpszExtraInfo, SizeOf(lpszExtraInfo));
  ZeroMemory(@lpUrlComponents, SizeOf(TURLComponents));

  // Buffer lengths are given in characters, not bytes, so use Length() rather than SizeOf()
  lpUrlComponents.dwStructSize      := SizeOf(TURLComponents);
  lpUrlComponents.lpszScheme        := lpszScheme;
  lpUrlComponents.dwSchemeLength    := Length(lpszScheme);
  lpUrlComponents.lpszHostName      := lpszHostName;
  lpUrlComponents.dwHostNameLength  := Length(lpszHostName);
  lpUrlComponents.lpszUserName      := lpszUserName;
  lpUrlComponents.dwUserNameLength  := Length(lpszUserName);
  lpUrlComponents.lpszPassword      := lpszPassword;
  lpUrlComponents.dwPasswordLength  := Length(lpszPassword);
  lpUrlComponents.lpszUrlPath       := lpszUrlPath;
  lpUrlComponents.dwUrlPathLength   := Length(lpszUrlPath);
  lpUrlComponents.lpszExtraInfo     := lpszExtraInfo;
  lpUrlComponents.dwExtraInfoLength := Length(lpszExtraInfo);

  // InternetCrackUrl returns False on failure; Win32Check raises an exception in that case
  Win32Check(InternetCrackUrl(PChar(lpszUrl), Length(lpszUrl), ICU_DECODE or ICU_ESCAPE, lpUrlComponents));

  Writeln(Format('Protocol : %s',[lpszScheme]));
  Writeln(Format('Host     : %s',[lpszHostName]));
  Writeln(Format('User     : %s',[lpszUserName]));
  Writeln(Format('Password : %s',[lpszPassword]));
  Writeln(Format('Path     : %s',[lpszUrlPath]));
  Writeln(Format('ExtraInfo: %s',[lpszExtraInfo]));
end;

begin
  try
   ParseURL('http://login:password@somehost.somedomain.com/some_path/something_else.html?param1=val&param2=val');
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
  readln;
end.

This will return

Protocol : http
Host     : somehost.somedomain.com
User     : login
Password : password
Path     : /some_path/something_else.html
ExtraInfo: ?param1=val&param2=val
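
The question also asks for the port, which the code above does not print. TURLComponents carries it in the numeric nPort field, so a minimal sketch for reading it could look like this. GetUrlPort and the program name are hypothetical, and the sketch assumes that InternetCrackUrl fills in the numeric fields even when no string buffers are supplied.

```pascal
program PortDemo; // hypothetical program name

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils, WinInet;

// Hypothetical helper: returns the port that InternetCrackUrl reports for Url.
function GetUrlPort(const Url: string): Word;
var
  Components: TURLComponents;
begin
  ZeroMemory(@Components, SizeOf(Components));
  Components.dwStructSize := SizeOf(Components);
  // No string buffers are supplied here, so only numeric fields such as nPort are used
  Win32Check(InternetCrackUrl(PChar(Url), Length(Url), 0, Components));
  Result := Components.nPort;
end;

begin
  try
    Writeln('Port: ', GetUrlPort('http://login:password@somehost.somedomain.com:8080/some_path/something_else.html'));
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
end.
```

Like the rest of this answer, this relies on WinInet and is therefore Windows-only.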
RRUZ
  • +1 Awesomeness, I edited my question a few minutes ago and added port number to the sample. – Jerry Dodge May 22 '13 at 23:30
  • Accepted since this is the more direct way without dependencies. – Jerry Dodge May 22 '13 at 23:36
  • @JerryDodge: this solution has dependencies: Windows 2000+ and WinInet. – Remy Lebeau May 22 '13 at 23:37
  • @RemyLebeau Rephrase: too many library dependencies :) Not hatin' on Indy but I'm guessing this is what Indy uses behind the scenes anyway (I always prefer lower-level ways of doing things, although I don't always understand how it works, which becomes the advantage of using pre-made libraries) – Jerry Dodge May 22 '13 at 23:38
  • `TIdURI` does not use any external APIs for its parsing. – Remy Lebeau May 22 '13 at 23:58
  • @Remy Yup, that's why I accepted your answer instead a few minutes ago – Jerry Dodge May 23 '13 at 00:00
  • OT: out of curiosity, `InternetCrackUrl` cannot handle colons in scheme component (e.g. a real example `jdbc:jtds:sqlserver://localhost/cabinet`) for some reason (the call succeeds, but the returned components are wrong). That doesn't involve HTTP scheme of course, but it's worth knowing that WinInet failed in that case (Indy handles that case correctly). [already voted] – TLama Aug 18 '15 at 09:55
  • Looking back at this, deciding to use Indy was a very good idea because now I'm using Firemonkey and copying old code into the new projects - and WinInet only works on Windows. – Jerry Dodge Aug 27 '15 at 23:58
  • Would it ever, ever be acceptable to write all that code just to parse a URL? – alcalde Oct 18 '15 at 01:06