I am new to C# programming. I am trying to scrape data from a div (I want to display the temperature from a web page in a Windows Forms application). This is my code:

private void btnOnet_Click(object sender, EventArgs e)
{
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    HtmlWeb web = new HtmlWeb();
    doc = web.Load("https://pogoda.onet.pl/");
    var temperatura = doc.DocumentNode.SelectSingleNode("/html/body/div[1]/div[3]/div/section/div/div[1]/div[2]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]");
    onet.Text = temperatura.InnerText;
}

This is the exception:

System.NullReferenceException: temperatura was null.

Palle Due
  • Maybe try searching by CSS class? Something like doc.DocumentNode.SelectNodes("//div[@class='class on div']"). This should be easier, but if it is not possible, double-check the div hierarchy. Also look at how to search using LINQ: [Parsing html with the HTML Agility Pack and Linq](https://stackoverflow.com/questions/4616790/parsing-html-with-the-html-agility-pack-and-linq) – Герман Матисов Apr 22 '22 at 11:10
  • You can select a single node with doc.DocumentNode.SelectSingleNode("//*[@class='temp']").InnerText, but also make sure that what comes back in the response is what you expect. Remember that on some websites this content is driven by JavaScript. – Netferret Apr 22 '22 at 14:11

1 Answer

You can use this:

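public static bool TryGetTemperature(HtmlAgilityPack.HtmlDocument doc, out int temperature)
{
    temperature = 0;

    // Find a div with class "temp" directly inside any div with class "temperature".
    var temp = doc.DocumentNode.SelectSingleNode(
        "//div[contains(@class, 'temperature')]/div[contains(@class, 'temp')]");
    if (temp == null)
    {
        return false;
    }

    // InnerText is not entity-decoded, so the degree sign arrives as "&deg;" (5 characters).
    const string degree = "&deg;";
    var raw = temp.InnerText.Trim();
    var text = raw.EndsWith(degree, System.StringComparison.Ordinal) ?
        raw.Substring(0, raw.Length - degree.Length) :
        raw;

    return int.TryParse(text, out temperature);
}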

With a relative XPath you can select your target more precisely. Your query is an absolute path, so even a small change in the HTML structure will make your application fail. Some points:

  • // searches anywhere in the document
  • The query looks for any div whose class contains "temperature" and, inside that node:
  • a child div whose class contains "temp"
  • If that node is found (!= null), the degrees are converted to a number (after removing the degree sign)
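The last step can also be done by decoding the entity instead of cutting a fixed number of characters. This is only a minimal sketch, assuming the node text looks like "21&deg;" or "21°". TemperatureText and TryParseDegrees are hypothetical names, not part of HtmlAgilityPack; WebUtility.HtmlDecode comes from System.Net.

```csharp
using System;
using System.Globalization;
using System.Net;

public static class TemperatureText
{
    // Hypothetical helper (not from HtmlAgilityPack): decodes HTML entities,
    // drops a trailing degree sign, then parses what is left as an integer.
    public static bool TryParseDegrees(string innerText, out int temperature)
    {
        temperature = 0;
        if (string.IsNullOrWhiteSpace(innerText))
        {
            return false;
        }

        // "&deg;" and "&#176;" both decode to the degree sign (U+00B0).
        string decoded = WebUtility.HtmlDecode(innerText).Trim();
        string number = decoded.TrimEnd('°').Trim();

        return int.TryParse(number, NumberStyles.Integer,
            CultureInfo.InvariantCulture, out temperature);
    }
}
```

Inside TryGetTemperature, the final lines could then become return TemperatureText.TryParseDegrees(temp.InnerText, out temperature);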

Then call it from your click handler:

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
HtmlWeb web = new HtmlWeb();
doc = web.Load("https://pogoda.onet.pl/");
if (TryGetTemperature(doc, out int temperature))
{
   onet.Text = temperature.ToString();
}

UPDATE

I updated TryGetTemperature a bit because the degree sign is encoded. The main problem is the HTML. When you request the source code, you get HTML that the browser later updates dynamically. So the HTML you receive is not usable: it doesn't contain the temperature.
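One way to confirm this is to look at the raw response before parsing it. The following is a rough sketch, assuming .NET's HttpClient is available; RawHtmlCheck and its methods are hypothetical names, and the marker "temperature" is simply the class fragment used in the XPath above. A result of false means the class name is not in the downloaded source at all; true only means the text occurs somewhere, so it is still worth inspecting by hand.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class RawHtmlCheck
{
    private static readonly HttpClient Client = new HttpClient();

    // True when the downloaded source mentions the marker at all.
    public static bool ContainsMarker(string html, string marker)
    {
        return !string.IsNullOrEmpty(html)
            && html.IndexOf(marker, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    // Downloads the page source exactly as the server sends it (no JavaScript runs).
    public static async Task<bool> PageContainsMarkerAsync(string url, string marker)
    {
        string html = await Client.GetStringAsync(url);
        return ContainsMarker(html, marker);
    }
}
```

From an async click handler this could be called as bool found = await RawHtmlCheck.PageContainsMarkerAsync("https://pogoda.onet.pl/", "temperature");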

So, I see two alternatives:

  • You can use a browser control (Common Controls -> WebBrowser, in the form toolbox next to Button, Label...): add it to your form and navigate to the page. It's not difficult, but you need to learn a few things: wait for the event that signals the page has been downloaded, then get the source code from the control. I suppose you'll also want to hide the browser control. Be careful: sometimes the browser doesn't work correctly when it is hidden. In that case, you can use a visible form positioned outside the desktop and handle the activation events so that this window never gets activated. Also hide it from the task switcher (Alt+Tab). Things get harder this way, but sometimes it is the only way.
  • The simpler way is to search for the location you want (e.g. Madryt) and look in DevTools at the request that is made (e.g. https://pogoda.onet.pl/prognoza-pogody/madryt-396099). Use this URL and you get HTML that contains the temperature.
Victor
  • Thanks, did you do this in Windows Forms? I want to call this method after a button click and display the result in a TextBox. – Weronika Przymuszała Apr 22 '22 at 12:00
  • Yes, I updated my answer to set your onet TextBox value. – Victor Apr 22 '22 at 13:49
  • Thanks. Anyway, I don't see the temperature in the text box. I don't know what I am doing wrong :( Can I send you my code privately? – Weronika Przymuszała Apr 22 '22 at 15:05
  • Yes, no problem. Have you debugged those lines? Are you sure that TryGetTemperature is returning true? My test with web.Load failed. I navigated to the page and took the source code from there to build the XPath. Maybe that is the problem. – Victor Apr 22 '22 at 15:17
  • How can I contact you? I really need help with this project :(( – Weronika Przymuszała Apr 22 '22 at 15:22
  • I don't know if it's possible to share contact info here. You can zip your project and share it with any free service, like wetransfer.com, or post your source code. I'm almost sure that if you add an else statement with a MessageBox.Show("Error") you'll see that message. – Victor Apr 22 '22 at 15:49
  • Yes, exactly, I see this message. This is my project: https : // we. tl / t-vxvEwz2tWe (I can't send the whole URL) – Weronika Przymuszała Apr 22 '22 at 15:59