  1. Using CefSharp, I loaded a page that lists 25 links.
  2. Using FrameLoadEnd, I got the HTML content and loaded it into an HtmlAgilityPack document.
  3. I got the titles from the nodes for the 25 links.

Problem: when I click to show 50 links on the page and try to get the titles, I still get 25 links, which come from the old page. I could not figure out why FrameLoadEnd does not pick up the changed HTML when I navigate to another link within the page.

Screenshot when the page is loading 25 titles

When I click 50 titles, I cannot get the HTML content for the 50 titles

Here is the code:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using CefSharp;
using CefSharp.WinForms;
using System.Windows.Forms;
using HtmlAgilityPack;

namespace Hummingbird_HAP_E_Scraper
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            chromiumWebBrowser1.Load("https://www.sciencedirect.com/search?qs=nursing");
        }
        string html;
        private void ChromiumWebBrowser1_FrameLoadEnd(object sender, CefSharp.FrameLoadEndEventArgs e)
        {
            BeginInvoke((Action)(async () =>
            {
                html = await chromiumWebBrowser1.GetSourceAsync();
            }));
        }
        HtmlAgilityPack.HtmlDocument xdoc = new HtmlAgilityPack.HtmlDocument();
        private void Button1_Click(object sender, EventArgs e)
        {
            xdoc.LoadHtml(html);
            System.Threading.Thread.Sleep(5000);

            HtmlNodeCollection links;
            links = xdoc.DocumentNode.SelectNodes("//h2/span/a");
            if (links == null)
                return;
            foreach (HtmlNode link in links)
            {
                listBox1.Items.Add(link.InnerText);
            }
        }
    }
}
Anup Raj
  • You'll either need to query the browser directly to get the inner text or get the page source every time you click the button. Your HtmlDocument loads from a string; it's never updated. The assumption that performing an action in the browser automatically updates the HtmlDocument is incorrect. – amaitland May 17 '21 at 20:15
  • Thank you for your update. I still face the problem that ViewSource opens the source in a temporary Notepad window. Moreover, I have no idea how to query the browser directly using XPath. – Anup Raj May 19 '21 at 11:10
  • ViewSource() opens the source in the default text editor by design. You can query the DOM via JavaScript; see https://github.com/cefsharp/CefSharp/wiki/General-Usage#2-how-do-you-call-a-javascript-method-that-returns-a-result – amaitland May 19 '21 at 19:04
  • I don't know if I can get the HTML in my button click event. Still, I'll give it a shot... Thanks for the help, pal. – Anup Raj May 20 '21 at 05:40
  • Yes you can. https://developer.mozilla.org/en-US/docs/Web/API/Document/querySelectorAll with a css selector. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/forEach to select just the innerhtml – amaitland May 20 '21 at 06:17
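
The suggestion in the last comment could look roughly like the sketch below. It is a minimal illustration and has not been tested against the site: the function name collectTitles is hypothetical, and the selector h2 > span > a is only a CSS translation of the //h2/span/a XPath from the question. The document is passed in as a parameter so the function can also be tried outside a browser.

```javascript
// Hypothetical helper: collects the text of every title link that is
// currently in the DOM. The selector is an assumption carried over from
// the question's XPath (//h2/span/a), not verified against the live page.
function collectTitles(doc) {
  const titles = [];
  // querySelectorAll returns all matching elements; forEach visits each one.
  doc.querySelectorAll("h2 > span > a").forEach(function (link) {
    // Mirror HtmlAgilityPack's link.InnerText, trimming stray whitespace.
    titles.push(link.innerText.trim());
  });
  return titles;
}
```

Inside the page the call would be collectTitles(document). From C#, the script text could be passed to CefSharp's EvaluateScriptAsync as described on the wiki page linked above, so the titles are read from the live DOM on every button click rather than from the HTML string saved in FrameLoadEnd.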

0 Answers