2

I am trying to download an XML file from a website with C#, but I get a 404 on some URLs. This is weird because they still work in the browser. Other URLs work without a problem.

HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
request.Method = "GET";
request.Timeout = 3000;
request.UserAgent = "Test Client";
HttpWebResponse response = null;
try
{
    response = (HttpWebResponse)request.GetResponse();
}
catch (WebException e)
{
    response = (HttpWebResponse)e.Response;
}
Console.WriteLine("- " + response.StatusCode);

XmlTextReader reader = new XmlTextReader(response.GetResponseStream());

This URL is one of the said problem URLs:

http://numerique.bibliotheque.toulouse.fr/cgi-bin/oaiserver?verb=ListMetadataFormats

SOLVED... I forgot to trim the URL ;)
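For reference, the fix amounts to stripping stray whitespace from the URL before creating the request. A minimal sketch (the trailing `\n` here is an assumption about how the bad URL looked after being read from a list or file):

```csharp
using System;
using System.Net;

class TrimDemo
{
    static void Main()
    {
        // A URL with a trailing newline, as it might arrive from a file or list.
        string url = "http://numerique.bibliotheque.toulouse.fr/cgi-bin/oaiserver?verb=ListMetadataFormats\n";

        // Trim() removes leading/trailing whitespace (spaces, tabs, newlines)
        // that would otherwise end up encoded into the request and break it.
        var request = (HttpWebRequest)WebRequest.Create(url.Trim());
        request.Method = "GET";
    }
}
```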

tom
  • Some servers verify User Agents. Try using a real one. – Evan Mulawski Nov 15 '10 at 20:23
  • The server might be looking at the User-Agent header or other details about the request. – driis Nov 15 '10 at 20:25
  • @Tom, your code works fine for me. Have you got other problem URIs? – acoolaum Nov 15 '10 at 20:40
  • @acoolaum, yes, there are several URLs with this problem. @evan, it doesn't work with a real user agent either – tom Nov 15 '10 at 20:48
  • Here is another one. BTW, it worked when I copied the first line ivo posted below; is there anything special with the @ in front of the string? http://diglit.ub.uni-heidelberg.de/cgi-bindigioai.cgi?verb=ListMetadataFormats – tom Nov 15 '10 at 20:59
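Regarding the `@` in the comment above: `@"..."` is a C# verbatim string literal, which only disables backslash escape sequences (and lets the literal span lines). For a URL containing no backslashes it produces exactly the same string as a regular literal, so the real difference was almost certainly stray whitespace dropped when the URL was retyped. A quick check (`example.com` URL is illustrative only):

```csharp
using System;

class VerbatimDemo
{
    static void Main()
    {
        string plain = "http://example.com/oai?verb=ListMetadataFormats";
        string verbatim = @"http://example.com/oai?verb=ListMetadataFormats";

        // The @ prefix only changes how the literal is parsed, not its value.
        Console.WriteLine(plain == verbatim); // True
    }
}
```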

4 Answers

3

I can only speculate that the host site might not like your UserAgent and is returning a 404 response.

Mike Park
2

I solved this problem by using this:

var client = (HttpWebRequest)WebRequest.Create(uri);
client.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
client.CookieContainer = new CookieContainer();
client.UserAgent = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36";
var response = client.GetResponse() as HttpWebResponse;
Paulos02
1

To download an XML document you can use the DownloadString method:

System.Net.WebClient client = new System.Net.WebClient();
String url = "http://stackoverflow.com/feeds/question/4188449";
String xmlSource = client.DownloadString(url);
Console.WriteLine(xmlSource);
Edward83
0

Maybe:

1) Somehow you input an incorrect URL. For testing purposes, can you try

   WebRequest.Create(@"http://numerique.bibliotheque.toulouse.fr/cgi-bin/oaiserver?verb=ListMetadataFormats");

instead of

   WebRequest.Create(url);

2) You have some HTTP filtering mechanism that distinguishes between VS & browser requests.

ivo s