When I use the IE 10 F12 developer tools to inspect an <a> node on my page, I see this:

<a tabindex="-1" class="level1 static" href="About.aspx">About</a>

But when I retrieve the page HTML with the following code, I get this instead:

<a class="level1" href="About.aspx">About</a>

Code:

        WebClient wc = new WebClient();
        String pageString = wc.DownloadString(url);

Why are they different?

Update:

Below are the Fiddler capture results.

IE10:

[Two Fiddler screenshots of the IE10 session]

WebClient:

[Two Fiddler screenshots of the WebClient session]

smwikipedia

2 Answers

It's common for web servers to send different output depending on which browser the request comes from, usually based on the User-Agent request header. Perhaps this "simplified" <a> tag is a result of that?

I'm not sure how WebClient works, but it may be possible to modify the request headers so that you look like an IE10 browser, and then see whether the results are different.
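As a rough sketch of that idea, the snippet below sets the User-Agent header on a WebClient before downloading. The class name, the helper names, and the exact User-Agent string are assumptions for illustration; the string should be replaced with the value Fiddler shows for the real IE10 request. Note that this only tests the server-side hypothesis: a later comment in this thread reports that matching the User-Agent did not change the result, and any attributes added by JavaScript after the page loads would not appear in the downloaded string either way.

```csharp
using System;
using System.Net;

static class BrowserLikeDownload
{
    // Assumed IE10-style User-Agent; copy the exact value from your own Fiddler capture.
    const string Ie10UserAgent =
        "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)";

    // Creates a WebClient whose next request carries the browser-like User-Agent.
    public static WebClient CreateClient()
    {
        var wc = new WebClient();
        wc.Headers[HttpRequestHeader.UserAgent] = Ie10UserAgent;
        return wc;
    }

    // Downloads the raw HTML exactly as the server sends it (no scripts are executed).
    public static string Download(string url)
    {
        using (var wc = CreateClient())
        {
            return wc.DownloadString(url);
        }
    }
}
```

Calling `BrowserLikeDownload.Download(url)` in place of the original `wc.DownloadString(url)` and comparing the two results would show whether the server varies its markup by User-Agent.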

rliu
  • If so, is there any way for me to simulate IE to get the "full" <a> tag? – smwikipedia Oct 10 '12 at 16:33
  • Right, I know for a fact that browser information is sent in the HTTP request headers, which you can inspect. I looked briefly, and `WebClient` has a `Headers` collection, which may or may not be what you want. Honestly, what I'd do at this point is use Fiddler to capture a sample HTTP request from IE10 and then see if you can send something similar from `WebClient`. The more educational way might be to read the specification for HTTP requests. – rliu Oct 10 '12 at 16:37
  • Could you update your original post with your detailed results? It's also possible that when you load the page in a browser, it comes with JavaScript that modifies the HTML. – rliu Oct 10 '12 at 17:24
  • I have updated my post. Actually, there is some WebResource.axd handling, but I don't know what it does. – smwikipedia Oct 10 '12 at 17:33
  • Your User-Agent is different in the two images... but I think it's actually likely that JavaScript is modifying the HTML. This is mostly outside my area of knowledge, but you might want to look for a more general solution to your problem. The issue (as far as I can tell) is that for more complicated website engines, the browser has to do extra steps to generate the final HTML. Furthermore, that HTML might be modified dynamically through JavaScript or other means. – rliu Oct 10 '12 at 18:11
  • I changed the User-Agent to match, but it still doesn't work. Thanks for your reply. I will continue investigating. Maybe I need to take a look at the mshtml module of IE. – smwikipedia Oct 11 '12 at 00:25

This question is a duplicate of this one: How to get the page source from an IE window?

I have solved it there.

smwikipedia