2

How can I simply open a URL and read the data from a web page with D? (I prefer Phobos over Tango if standard library functionality is needed.)

Samuel Lampa

2 Answers

4

curl is in the standard library. You can fetch a URL pretty easily like this:

import std.net.curl;
// get returns char[], so .idup makes the immutable copy a string needs
string content = get("d-lang.appspot.com/testUrl2").idup;

http://dlang.org/phobos/std_net_curl.html#get
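For a complete program, here is a minimal sketch. It assumes a Phobos recent enough to ship std.net.curl and an installed libcurl. The `fetch` helper name and the error handling are illustrative additions, not part of the library.

```d
import std.net.curl : get, CurlException;
import std.stdio : stderr, writeln;

/// Hypothetical helper: downloads `url` and returns the body as a string.
string fetch(string url)
{
    // get returns char[]; .idup copies it into an immutable string
    return get(url).idup;
}

void main()
{
    try
    {
        string content = fetch("d-lang.appspot.com/testUrl2");
        writeln(content);
    }
    catch (CurlException e)
    {
        // network or protocol failures from std.net.curl end up here
        stderr.writeln("request failed: ", e.msg);
    }
}
```

On some Linux setups the comments below report having to link explicitly, e.g. "dmd [filename] -L-lphobos2 -L-lcurl".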

If you need to parse HTML, I wrote a DOM library that is pretty good at it: https://github.com/adamdruppe/misc-stuff-including-D-programming-language-web-stuff

Grab dom.d and characterencodings.d, then you can:

import std.stdio; // for writeln
import arsd.dom;
auto document = new Document();
document.parseGarbage(content); // content is from above, the html string

writeln(document.title); // the <title> contents
auto paragraph = document.querySelector("p");
if(paragraph is null)
     writeln("no paragraphs in this document");
else
     writeln("the first paragraph is: ", paragraph.innerText);

And so on. If you've used the JavaScript DOM API, this is pretty similar (though expanded in a lot of ways too).

Adam D. Ruppe
  • Unfortunately, for the pure "std.net.curl" code example above, I run into the problem with the linker order generated by dmd, for linux, discussed here: http://forum.dlang.org/thread/mailman.1605.1334108859.4860.digitalmars-d@puremagic.com?page=2 and here: http://forum.dlang.org/thread/cwgxdvkvsnbwvbgrdivp@forum.dlang.org ... but obviously not yet fixed in 2.0.61 :( – Samuel Lampa Jan 01 '13 at 17:48
  • Ok, so I got around this by compiling with: "dmd [filename] -L-lphobos2 -L-lcurl". (Oh, and well, I needed to add a cast as well, that was not mentioned in the docs: "string s = cast(string) get("[url]");" ... since get returns a char[] rather than a string.) – Samuel Lampa Jan 01 '13 at 18:00
  • string s = get("[url]").idup;? – 0b1100110 Jan 01 '13 at 22:08
3

I think std.net.curl bindings are your best bet, specifically its get/post methods (example is in the docs): http://dlang.org/phobos/std_net_curl.html#get

After all, curl is designed specifically for this kind of task, and the bindings are part of Phobos.
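As a rough sketch of the post side, assuming the same std.net.curl module: the URL and form fields below are placeholders, not a real endpoint.

```d
import std.net.curl : post, CurlException;
import std.stdio : stderr, writeln;

void main()
{
    try
    {
        // Placeholder URL and form data, for illustration only
        auto reply = post("http://example.com/form", "name1=value1&name2=value2");
        writeln(reply); // reply is a char[], like the result of get
    }
    catch (CurlException e)
    {
        stderr.writeln("request failed: ", e.msg);
    }
}
```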

Mihails Strasuns
  • Ah, thx! (Though, unfortunately not part of the phobos version in Ubuntu/LinuxMint repos yet :/ ) – Samuel Lampa Jan 01 '13 at 15:52
  • If the Phobos one isn't available to you, in my GitHub (link in the other answer here) you can grab my curl.d and use "string content = curl("http://example.com/foo.html");" If you don't have libcurl at all, I also have an http.d in there with a simple get() function. – Adam D. Ruppe Jan 01 '13 at 16:10
  • For some reason, I get "Error: function expected before (), not module curl of type void", with my little program: "import curl; string s = curl("example.org");", and placing the curl.d in the same folder as my other .d files ... Any hints? – Samuel Lampa Jan 01 '13 at 16:52
  • use "import arsd.curl;" rather than just import curl;. Also when compiling, put all the files on the dmd command line: "dmd yourfile.d curl.d" to avoid linker errors. – Adam D. Ruppe Jan 01 '13 at 17:06
  • Samuel, ah, then you are most likely using gdc from the base repos, the 4.4 one? It must ship with a rather old version of Phobos. Given the current pace of bug fixing in D/Phobos, I'd really recommend getting the latest releases straight from the devs unless there are specific requirements. – Mihails Strasuns Jan 01 '13 at 18:15