2

I am trying to download an XML file from a website with C#, but I get a 404 on some URLs. This is weird because they still work in the browser. Other URLs work without a problem.

HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
request.Method = "GET";
request.Timeout = 3000;
request.UserAgent = "Test Client";
HttpWebResponse response = null;
try
{
    response = (HttpWebResponse)request.GetResponse();
}
catch (WebException e)
{
    response = (HttpWebResponse)e.Response;
}
Console.WriteLine("- " + response.StatusCode);

XmlTextReader reader = new XmlTextReader(response.GetResponseStream());

This URL is one of the said problem URLs:

http://numerique.bibliotheque.toulouse.fr/cgi-bin/oaiserver?verb=ListMetadataFormats

SOLVED... I forgot to trim the URL ;)
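For reference, the fix amounts to stripping stray whitespace from the URL before creating the request. A minimal sketch (the trailing `\n` here is an assumption about how the bad URL looked after being read from a list or file):

```csharp
using System;
using System.Net;

class TrimDemo
{
    static void Main()
    {
        // A URL with a trailing newline, as it might arrive from a file or list.
        string url = "http://numerique.bibliotheque.toulouse.fr/cgi-bin/oaiserver?verb=ListMetadataFormats\n";

        // Trim() removes leading/trailing whitespace (spaces, tabs, newlines)
        // that would otherwise end up encoded into the request and break it.
        var request = (HttpWebRequest)WebRequest.Create(url.Trim());
        request.Method = "GET";
    }
}
```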

tom
  • Some servers verify User Agents. Try using a real one. – Evan Mulawski Nov 15 '10 at 20:23
  • The server might be looking at the User-Agent header or other details about the request. – driis Nov 15 '10 at 20:25
  • @Tom, your code works fine for me. Have you got other problem URIs? – acoolaum Nov 15 '10 at 20:40
  • @acoolaum, yes, there are several URLs with this problem. @evan, it doesn't work with a real user agent either – tom Nov 15 '10 at 20:48
  • Here is another one. BTW, it worked when I copied the first line ivo posted below; is there anything special with the @ in front of the string? http://diglit.ub.uni-heidelberg.de/cgi-bindigioai.cgi?verb=ListMetadataFormats – tom Nov 15 '10 at 20:59
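Regarding the `@` in the comment above: `@"..."` is a C# verbatim string literal, which only disables backslash escape sequences (and lets the literal span lines). For a URL containing no backslashes it produces exactly the same string as a regular literal, so the real difference was almost certainly stray whitespace dropped when the URL was retyped. A quick check (`example.com` URL is illustrative only):

```csharp
using System;

class VerbatimDemo
{
    static void Main()
    {
        string plain = "http://example.com/oai?verb=ListMetadataFormats";
        string verbatim = @"http://example.com/oai?verb=ListMetadataFormats";

        // The @ prefix only changes how the literal is parsed, not its value.
        Console.WriteLine(plain == verbatim); // True
    }
}
```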

4 Answers

3

I can only speculate that the host site might not like your UserAgent and is returning a 404 response.

Mike Park
2

I solved this problem by using this:

var client = (HttpWebRequest)WebRequest.Create(uri);
client.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
client.CookieContainer = new CookieContainer();
client.UserAgent = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36";
var response = client.GetResponse() as HttpWebResponse;
Paulos02
1

To download an XML document you can use the DownloadString method:

System.Net.WebClient client = new System.Net.WebClient();
String url = "http://stackoverflow.com/feeds/question/4188449";
String xmlSource = client.DownloadString(url);
Console.WriteLine(xmlSource);
Edward83
0

Maybe:

1) Somehow you input an incorrect URL. For testing purposes, can you try

   WebRequest.Create(@"http://numerique.bibliotheque.toulouse.fr/cgi-bin/oaiserver?verb=ListMetadataFormats");

instead of

   WebRequest.Create(url);

2) You have some HTTP filtering mechanism that distinguishes between VS & browser requests.

ivo s