0

Is there a way to check whether a Wikipedia page exists? I have implemented a custom search that replaces the spaces in a search term with _, but I have no way to tell whether the resulting path actually exists.

    targetWiki = inputCustomTarget.text;
    targetWiki = [targetWiki stringByReplacingOccurrencesOfString:@" " withString:@"_"];
    targetWiki = [NSString stringWithFormat:@"http://en.m.wikipedia.org/wiki/%@", targetWiki];    

Would I have to parse the response in order to find out if a page exists?

Alex Muller

3 Answers

2

There should be no need to parse the result; just check for a 200 response code in the - (void)connection:(NSURLConnection*)connection didReceiveResponse:(NSURLResponse*)response callback. If the page does not exist you should get a 404.
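
As a minimal sketch of that check, a delegate could look something like the following. The class name WikiPageChecker and its pageExists property are made up for illustration; only the callback signature comes from the answer.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate class; the name and property are assumptions.
@interface WikiPageChecker : NSObject
@property (nonatomic, assign) BOOL pageExists;
@end

@implementation WikiPageChecker

- (void)connection:(NSURLConnection *)connection didReceiveResponse:(NSURLResponse *)response
{
    // HTTP requests deliver an NSHTTPURLResponse, which carries the status code.
    if ([response isKindOfClass:[NSHTTPURLResponse class]]) {
        NSInteger status = [(NSHTTPURLResponse *)response statusCode];
        // Treat 200 as "page exists"; anything else (e.g. 404) as "does not exist".
        self.pageExists = (status == 200);
        NSLog(@"Status: %ld", (long)status);
    }
}

@end
```

An instance of this class would be passed as the delegate when the connection is created.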

Edit:

I would like to add that pages that do not exist on the main Wikipedia site (not the mobile .m site) do return the correct 404 status code. This could change in the future, so it may not be completely reliable, but neither is parsing the content. Here is a sample I put together to demonstrate this.

    // A page that exists
    NSURLRequest *exists = [NSURLRequest requestWithURL:[NSURL URLWithString:@"http://en.wikipedia.org/wiki/Qwerty"]];
    // Redirects to Blivet
    NSURLRequest *redirects = [NSURLRequest requestWithURL:[NSURL URLWithString:@"http://en.wikipedia.org/wiki/Poiuyt"]];
    // A page that does not exist
    NSURLRequest *nonexistant = [NSURLRequest requestWithURL:[NSURL URLWithString:@"http://en.wikipedia.org/wiki/Jklfdsa"]];

    // Declared as NSURLResponse to match the returningResponse: parameter type
    NSURLResponse *resp_exists = nil;
    NSURLResponse *resp_redirects = nil;
    NSURLResponse *resp_nonexistant = nil;

    // Synchronous for brevity; each call blocks until the response arrives
    [NSURLConnection sendSynchronousRequest:exists returningResponse:&resp_exists error:NULL];
    [NSURLConnection sendSynchronousRequest:redirects returningResponse:&resp_redirects error:NULL];
    [NSURLConnection sendSynchronousRequest:nonexistant returningResponse:&resp_nonexistant error:NULL];

    // statusCode is an NSInteger, so cast to long and print with %ld
    NSLog(@"\nExists: %ld\nRedirects: %ld\nNon Existant: %ld",
          (long)[(NSHTTPURLResponse *)resp_exists statusCode],
          (long)[(NSHTTPURLResponse *)resp_redirects statusCode],
          (long)[(NSHTTPURLResponse *)resp_nonexistant statusCode]);

And here is the output

    Exists: 200
    Redirects: 200
    Non Existant: 404

So if a page exists or automatically redirects to a page that does exist, you will get a 200 status code; if it does not exist, you will get a 404. If you would like to capture the redirect, you will need to implement -connection:willSendRequest:redirectResponse: and act accordingly.
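
A minimal sketch of that redirect callback might look like this. The class name RedirectLogger is hypothetical. Note that this only observes HTTP-level redirects; whether Wikipedia issues one for a given redirect page is not something the test above establishes.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate class; the name is an assumption.
@interface RedirectLogger : NSObject
@end

@implementation RedirectLogger

- (NSURLRequest *)connection:(NSURLConnection *)connection
             willSendRequest:(NSURLRequest *)request
            redirectResponse:(NSURLResponse *)redirectResponse
{
    // redirectResponse is nil when this is called for the initial request,
    // and non-nil when the server has answered with a redirect.
    if (redirectResponse != nil) {
        NSLog(@"Redirected from %@ to %@", [redirectResponse URL], [request URL]);
    }
    // Returning the proposed request lets the connection follow the redirect.
    return request;
}

@end
```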

Note: This example code is synchronous for the sake of being compact. This is not ideal; production implementations should send asynchronous requests and use the NSURLConnectionDelegate methods.
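
For comparison, here is a rough asynchronous sketch. To stay short it uses the block-based sendAsynchronousRequest:queue:completionHandler: rather than the delegate methods mentioned in the note, and the helper name CheckPageExists and its callback shape are assumptions of mine, not part of the original sample.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper; the name and the completion block are assumptions.
static void CheckPageExists(NSURL *url, void (^completion)(BOOL exists))
{
    NSURLRequest *request = [NSURLRequest requestWithURL:url];
    [NSURLConnection sendAsynchronousRequest:request
                                       queue:[NSOperationQueue mainQueue]
                           completionHandler:^(NSURLResponse *response, NSData *data, NSError *error) {
        BOOL exists = NO;
        // Only trust the status code if the request itself succeeded.
        if (error == nil && [response isKindOfClass:[NSHTTPURLResponse class]]) {
            exists = ([(NSHTTPURLResponse *)response statusCode] == 200);
        }
        completion(exists);
    }];
}
```

The completion block runs on the main queue here, so the calling thread is not blocked while the request is in flight.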

Joe
  • This won't work for Wikipedia: pages that don't exist are returned with status code 200, because you can then go on to create them. So the -1 is for flawed advice, sorry. – Roger Aug 01 '11 at 22:22
  • I wasn't the person to -1 this, and I'm not 100% sure if I'm right, but as Wiki will return its custom 404 page, would this not return a 200 response code? – ingh.am Aug 01 '11 at 22:23
  • @Roger Did you test yourself? Because I tested it in an HTTP Client with and w/o following redirects and I would get a 404. – Joe Aug 01 '11 at 22:23
  • Then it should work then! Sorry to drop that in, I know some sites don't handle this correctly that's all. – ingh.am Aug 01 '11 at 22:25
  • @ing0 Yes, I know; that is why I tested it first. If I am wrong I am wrong, but I was getting the expected 404 code. And I did test it before posting my answer :) – Joe Aug 01 '11 at 22:27
  • @Joe I checked using wfetch before I answered and I get a 200 OK status coming back when I append a page that does not exist. Apologies if this is not always the case then. – Roger Aug 01 '11 at 22:29
  • @Roger As I stated before, I tested this before posting my answer. Now I have gone and painstakingly downloaded Xcode for Lion and written a test application, and I get the same results as in my previous testing. Please do not downvote before testing the solution yourself :) But you forced me to take my lazy answer and make it better, and for that I thank you. – Joe Aug 01 '11 at 23:08
  • @Joe. Your test uses en.wikipedia.org, the OP asked about en.m.wikipedia.org, perhaps that is why we get different results. Either that, or our locales mean we get a different response, as Wikipedia is quite well distributed. I absolutely get a 200 response for the query that you get a 404 for, and that's using YOUR code or wfetch. I've double-checked now: the problem is indeed that if you test the .m (mobile) domain, you get a 200 OK, whereas when you test the primary non-mobile site, you get a 404. The question was about the .m though ... so I stand by my answer. – Roger Aug 02 '11 at 00:21
1

You can't check the response code because it will always return a 200 response code.

I think the best way to see if a page exists is to parse the response and check if you land on the default 'search results' page.

Another option would be to make use of MediaWiki's API.

http://en.wikipedia.org/w/api.php?action=opensearch&search=term

Check if the term that was searched for exists in the returned response.
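
A rough sketch of that check, assuming the response is the JSON array that opensearch typically returns (the search term first, then an array of matching titles). The helper name OpenSearchResponseContainsTitle is hypothetical, and matching titles case-insensitively is my own choice.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns YES if `term` appears among the titles in an
// opensearch response assumed to look like ["term", ["Title", ...], ...].
static BOOL OpenSearchResponseContainsTitle(NSData *jsonData, NSString *term)
{
    if (jsonData == nil || term == nil) {
        return NO;
    }
    id root = [NSJSONSerialization JSONObjectWithData:jsonData options:0 error:NULL];
    if (![root isKindOfClass:[NSArray class]] || [(NSArray *)root count] < 2) {
        return NO;
    }
    id titles = [(NSArray *)root objectAtIndex:1];
    if (![titles isKindOfClass:[NSArray class]]) {
        return NO;
    }
    for (id title in (NSArray *)titles) {
        // Compare case-insensitively; this is an assumption about what counts as a match.
        if ([title isKindOfClass:[NSString class]] &&
            [(NSString *)title caseInsensitiveCompare:term] == NSOrderedSame) {
            return YES;
        }
    }
    return NO;
}
```

The data passed in would be the body returned by a request to the api.php URL above, with the search term percent-encoded.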

Mark
0

Yes, I'm afraid you will probably need to parse the results to know if the page exists. However, there might be an alternative if you look at the complete English Wikipedia dump files, which are made available here:

http://en.wikipedia.org/wiki/Wikipedia:Database_download#Latest_complete_dump_of_English_Wikipedia

Obviously this raw data is huge, but you could write a parser to find all the valid links and then compress that information into (say) a Core Data database, which might fit on the iPhone. Then you could run a check without having to test the page.

But to be honest, I'd probably parse the page and perhaps cache the answer so I only have to do it once.

EDIT: I'm afraid the answer given by Joe is not fully correct. When I use the domain that the original question used (i.e. en.m.wikipedia.org), Joe's sample code gives the following output:

    Exists: 200
    Redirects: 200
    Non Existant: 200

If I use en.wikipedia.org then my results concur with Joe's; however, that was not the question asked. I am based in the UK, and that might also have a bearing on the results.

Roger