-module(wikipedia).
-export([main/0]).
-define(Url, "http://en.wikipedia.org/w/api.php?format=xml&action=parse&prop=sections&page=Chicago").
-define(Match, "^[A-Za-z]+[A-Za-z0-9]*$").

main() ->
    inets:start(),
    %% Start ssl application
    ssl:start(),
    {ok, {_Status, _Header, Body}} = httpc:request(?Url),
    T = re:run(Body, ?Match, [{capture, all_but_first, binary}]),
    io:format("~s~n",[T]).

I want to store the content of the Wikipedia page in "T" using the regular expression Match, and then fetch the title from it. But the code above prints nomatch. I don't understand how to fetch the title of a Wikipedia page using Erlang. Please help (I am new to Erlang). I want something like: https://stackoverflow.com/questions/13459598/how-to-get-titles-from-a-wikipedia-page

hithard
  • What line has the `nomatch` error? Can you include the stacktrace in your question? – Stratus3D Jul 29 '17 at 19:33
  • Also, that page is xml, so I'd recommend using http://erlang.org/doc/apps/xmerl/xmerl_ug.html to parse the XML and extract the content you want. – Stratus3D Jul 29 '17 at 19:35
  • The output is showing nomatch. @Stratus3D – hithard Jul 30 '17 at 03:07
  • Ah ok, so the `io:format/2` call is printing `nomatch`, which means that is the value of `T`. Which means the `re:run/3` call didn't find anything matching your regex. – Stratus3D Jul 31 '17 at 00:58
  • That would make sense, since your regex doesn't allow for anything besides letters and numbers, but the XML is going to contain many other characters. What is that regex supposed to be doing? – Stratus3D Jul 31 '17 at 00:59
  • My aim was to fetch "title" and "summary". I was testing whether the code can fetch anything at all (that is why I used that regex). Can you help me with this? It would be helpful. @Stratus3D – hithard Aug 01 '17 at 07:20
  • If you're wanting to see if the command fetched anything, you do not need the regex. All the XML should be returned if you remove the `re:run/3` call and just print the body instead. – Stratus3D Aug 01 '17 at 13:56
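
To illustrate the point made in the comments, here is a minimal offline sketch of why the anchored pattern yields `nomatch`. The module name `regex_demo` and the shortened XML string are hypothetical stand-ins, not the real API response. Without the `multiline` option, `^` and `$` anchor the pattern to the whole body, and the body starts with `<`, so the letters-and-digits pattern cannot match. An unanchored pattern with one capture group does find the attribute, although an XML parser is the more robust choice.

```erlang
-module(regex_demo).
-export([demo/0]).

%% Hypothetical, shortened stand-in for the XML body returned by the API.
-define(Body, "<api><parse title=\"Chicago\"><sections/></parse></api>").

demo() ->
    %% The anchored pattern from the question has to match the entire
    %% subject, so it fails as soon as the body contains '<'.
    nomatch = re:run(?Body, "^[A-Za-z]+[A-Za-z0-9]*$"),
    %% An unanchored pattern with a single capture group picks out the
    %% value of the title attribute as a string.
    {match, [Title]} = re:run(?Body, "title=\"([^\"]*)\"",
                              [{capture, all_but_first, list}]),
    Title.
```

Calling `regex_demo:demo()` returns the captured title as a string. Note that a successful `re:run/3` returns a `{match, ...}` tuple, which `io:format("~s~n", [T])` cannot print directly; `~p` is the safer format for inspecting `T`.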

1 Answer


First, I think the title is already in your URL: "Chicago". If that is the case, just pattern match the URL to obtain the title. If not, I suggest using an XML parsing module like xmerl:

-module(parse_title).
-include_lib("xmerl/include/xmerl.hrl").

-export([main/0]).

main() ->
  inets:start(),
  ssl:start(),
  U =  "http://en.wikipedia.org/w/api.php?format=xml&action=parse&prop=sections&page=Chicago",
  {ok, {_, _, Body}} = httpc:request(U),
  {Xml,_} = xmerl_scan:string(Body),
  [Title|_] = [Value || #xmlAttribute{value = Value} <- xmerl_xpath:string("//api/parse/@title", Xml)],
  Title.
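
Both suggestions can be tried without a network call. The following is only a sketch: the module name `title_sketch`, the helper names, and the shortened sample XML are hypothetical, and the URL helper assumes OTP 21 or later, where the `uri_string` module is available. It reads the `page` query parameter from the URL, and separately reads the `title` attribute from an XML string with the same XPath expression as above.

```erlang
-module(title_sketch).
-include_lib("xmerl/include/xmerl.hrl").
-export([title_from_url/1, title_from_xml/1, demo/0]).

%% Read the value of the "page" query parameter from the request URL.
title_from_url(Url) ->
    #{query := Query} = uri_string:parse(Url),
    proplists:get_value("page", uri_string:dissect_query(Query)).

%% Read the title attribute of <parse> from an XML body given as a string.
title_from_xml(Body) ->
    {Xml, _Rest} = xmerl_scan:string(Body),
    [Title | _] = [Value || #xmlAttribute{value = Value}
                                <- xmerl_xpath:string("//api/parse/@title", Xml)],
    Title.

demo() ->
    Url = "http://en.wikipedia.org/w/api.php?format=xml&action=parse&prop=sections&page=Chicago",
    %% Hypothetical, shortened stand-in for the real response body.
    Sample = "<api><parse title=\"Chicago\"><sections/></parse></api>",
    {title_from_url(Url), title_from_xml(Sample)}.
```

Keeping the parsing in functions that take a string makes it possible to check them in the shell with `title_sketch:demo()` before wiring them to `httpc:request/1`.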
codeadict