This is my code:

    var html = webBrowser1.DocumentText;

    HtmlWeb web = new HtmlWeb();

    var htmlDoc = new HtmlAgilityPack.HtmlDocument();
    htmlDoc.LoadHtml(html);

    var node = htmlDoc.DocumentNode.SelectSingleNode("/html/body/div/div/div/div/section/section/div/div/div/div").Attributes["class"].Value;

    Console.WriteLine("Node Name: " + node);

So far everything works fine, but if I append another "/div" to the XPath passed to SelectSingleNode, it stops working (error message: "Exception thrown: 'System.NullReferenceException'"), although there is another "div" at that position in the HTML.

I think this is because the HTML contains a "::before" in front of the next "div", but it only shows up when I inspect the page in the browser.

A part of the HTML code:

 <div class="un-page__body">
    <div class="container-fluid">
       ::before
       <div class="row">
          ::before
          <div class="col-sm-6">
Game Lion
  • Welcome to SO! Please review and edit your question with a [Minimal Reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) – Jawad Jan 18 '20 at 18:39
  • Any reason why you have to go down the path of using the entire xpath? Have you considered using // (to search within doc) like `//div[@class='className']` – Jawad Jan 18 '20 at 18:40
  • @Jawad Yes, //div[@class='className'] doesn't work. I think the website has a mechanism to block this – Game Lion Jan 18 '20 at 19:31
  • I get a result for every "div" before the "div" with the class "row", and for the "div" with the class "row" itself, but as soon as I search for the "div" with the class "col-sm-6" or any "div" after it, I get an error – Game Lion Jan 18 '20 at 19:48
  • Can you provide the html you are scraping or url – Jawad Jan 18 '20 at 19:50
  • The URL is: https://mese.webuntis.com/WebUntis/index.do?school=Gutenberg-schule-berlin#/basic/main I just want to log in through the login form automatically. Can you help me? – Game Lion Jan 18 '20 at 19:53
  • Does this answer your question? [login to website using HTMLAgilityPack](https://stackoverflow.com/questions/13568933/login-to-website-using-htmlagilitypack) – Jawad Jan 18 '20 at 21:20
  • No, this doesn't answer the question. Can you please try to reach the div after the "::before" ("//div[@class='col-sm-6']") and tell me how it went? You already have the URL. – Game Lion Jan 19 '20 at 16:11
  • Running scripts in Html Agility Pack: [Cant be done at this time unfortunately](https://stackoverflow.com/a/11394830/1390548) – Jawad Jan 19 '20 at 18:29

1 Answer

When you look at the HTML using F12 / Dev Tools, the HTML you see is very different from what HtmlAgilityPack, or any other web scraping tool, receives.

Reason

Your code doesn't work, and won't work, because the document the server sends contains only two div tags. /html/body/div matches because there are two of these, and that's it. The rest is just JavaScript.

When you load the URL in Chrome, Chrome parses the HTML, executes the scripts, and then presents the rendered result, which is what you are supposed to see.

The URL you provided only has scripts in its body. Those scripts run in the browser and generate the divs you are seeing in Dev Tools. At this time, HtmlAgilityPack is NOT able to execute scripts and produce the rendered HTML for you to scrape.

What you get in HTMLAgilityPack

When you look at the markup in doc.DocumentNode, you only see this:

<div id="app">
    WebUntis wird geladen ...
</div>

Chrome / IE show something else because that is the page after the scripts have run and the result has been rendered. What you are trying to do is run the scripts in HtmlAgilityPack, which is not something you can do at this time.
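To make the NullReferenceException concrete, here is a minimal sketch. It assumes the static markup is the placeholder div shown above, wrapped in a hypothetical html/body shell for illustration (the real page also contains script tags). SelectSingleNode returns null when nothing matches, so reading .Attributes["class"].Value from the result throws unless you check first.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Assumption: a stand-in for what the server sends before any script runs.
        const string html =
            "<html><body><div id=\"app\">WebUntis wird geladen ...</div></body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // The div exists only after the browser has run the scripts,
        // so this lookup has nothing to match and returns null.
        HtmlNode col = doc.DocumentNode.SelectSingleNode("//div[@class='col-sm-6']");

        // Check for null before touching attributes instead of dereferencing directly.
        string cls = col?.GetAttributeValue("class", "");
        Console.WriteLine(cls == null ? "col-sm-6: no match" : "class: " + cls);

        // The placeholder div is present in the static markup and can be read.
        HtmlNode app = doc.DocumentNode.SelectSingleNode("//div[@id='app']");
        if (app != null)
        {
            Console.WriteLine("app: " + app.InnerText.Trim());
        }
    }
}
```

The null check does not make the missing divs appear. It only turns the crash into a condition you can test for, which confirms that the content is generated client-side.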

What you see in Chrome / Browser

<div id="app">
    <div style="height: 100%;">
        <div class="un-app">
            <nav class="un-app-header navbar navbar-default">
                <div class="container-fluid">
...
Jawad
  • Thank you so much for your answer. Do you know another way to log in on this page? – Game Lion Jan 19 '20 at 19:15
  • Unless that site offers an API to sign in with, I am not sure it would be possible – Jawad Jan 19 '20 at 19:34
  • @GameLion, does this answer your question on why you are not able to see the data? Please do mark the post as answered if it did – Jawad Jan 24 '20 at 02:07