
The site I am trying to scrape text from is secure and only accessible to machines on what I assume is the organization's VPN.

When I tested the tool today, while connected to the organization's network on a computer that can open the site in a browser, the tool still could not access it. I am wondering if there is something I'm missing that someone can point out.

Below I have attached my entire source code. For context, the "Demo Results" fields are just for testing against any site: at the moment I can paste whatever XPath and URL I please into a textbox, and this will later be locked to a specific URL and XPath. I am using HtmlAgilityPack, as you will see. The code works on any publicly accessible site, but on this specific site I get an error that the object reference is not set to an instance of an object. The site works 100% fine in the web browser.

using System;
using System.Windows.Forms;

namespace ToolConcept
{
    public partial class Form1 : Form
    {
        public string Results1;
        public string Results2;
        public string DemoResults;

        public Form1()
        {
            InitializeComponent();

            DemoResults = "";
            Results1 = "YOUR RESULTS HERE";
            Results2 = "RESULTS SHOWN HERE!";
        }

        // Downloads the page in urlField and stores the InnerText of the last node
        // matching the XPath in pathField. Note that SelectNodes() returns null when
        // nothing matches, which is where the NullReferenceException comes from.
        public void Scrape()
        {
            HtmlAgilityPack.HtmlWeb web = new HtmlAgilityPack.HtmlWeb();
            HtmlAgilityPack.HtmlDocument doc = web.Load(urlField.Text);
            foreach (var item in doc.DocumentNode.SelectNodes(pathField.Text))
            {
                DemoResults = item.InnerText;
            }
        }

        private void button1_Click(object sender, EventArgs e)
        {
            if (textBox1.Text == "R1")
            {
                textBox2.Text = Results1;
            }
            else if (textBox1.Text == "R2")
            {
                textBox2.Text = Results1 + Results2;
            }
            else if (textBox1.Text == "0")
            {
                // Demo mode: scrape whatever URL/XPath is currently in the textboxes.
                Scrape();
                textBox2.Text = DemoResults;
            }
            else if (string.IsNullOrEmpty(textBox1.Text)) // TextBox.Text is never null, so check for empty instead
            {
                MessageBox.Show("Not Found!");
            }
        }

        private void button2_Click(object sender, EventArgs e)
        {
            textBox2.Clear();
        }
    }
}
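For reference, SelectNodes() returns null when the XPath matches nothing in the downloaded document, which is how a failed or redirected request surfaces later as the "object reference not set to an instance of an object" error. A rough diagnostic variant of the Scrape() method above (same urlField/pathField controls, not the final fix) that shows the underlying failure instead might look like this:

        // Diagnostic variant of Scrape(): drop into Form1 alongside the original.
        public void ScrapeWithDiagnostics()
        {
            try
            {
                var web = new HtmlAgilityPack.HtmlWeb();
                HtmlAgilityPack.HtmlDocument doc = web.Load(urlField.Text);

                var nodes = doc.DocumentNode.SelectNodes(pathField.Text);
                if (nodes == null)
                {
                    // Nothing matched the XPath; the server may have returned an
                    // error or redirect page instead of the real site.
                    MessageBox.Show("XPath matched nothing in the downloaded page.");
                    return;
                }

                foreach (var item in nodes)
                {
                    DemoResults = item.InnerText;
                }
            }
            catch (System.Net.WebException ex)
            {
                // Connection, TLS, and authentication failures land here instead
                // of turning into a NullReferenceException further down.
                MessageBox.Show("Request failed: " + ex.Message);
            }
        }

Running this against the internal site should at least reveal whether the request fails outright (a WebException) or succeeds but returns a different page than the browser sees.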
  • Does this answer your question? [HtmlAgilityPack and Authentication](https://stackoverflow.com/questions/23298532/htmlagilitypack-and-authentication) – quaabaam Mar 03 '22 at 01:40
  • @quaabaam, Unfortunately no, as the site has no login page where I could enter credentials. – Decodorant Mar 03 '22 at 01:51
  • Is the web server performing Windows authentication transparently? (https://learn.microsoft.com/en-us/dotnet/api/system.net.http.httpclienthandler.usedefaultcredentials?view=net-6.0) – Jeremy Lakeman Mar 03 '22 at 02:07
  • @JeremyLakeman I will attempt this. I do not have access to this computer at the moment to test but tomorrow I will. So, I will try that. Really hoping this works. If not, I will update with another comment. Thank you for the link either way! – Decodorant Mar 03 '22 at 02:26
  • @JeremyLakeman I have tested this and it did not work. I am thinking it has to do with SSL certs maybe? Any idea on how I can put the proper SSL cert in my scraper? – Decodorant Mar 03 '22 at 15:56
  • To add to what @JeremyLakeman was getting at: even though you don't have anywhere to enter credentials, that doesn't necessarily mean you aren't implicitly logging in when you connect to their network. You may want to try using the same credentials you use to connect to their network in your application. To test this easily, open the browser in incognito mode and see if you are prompted for login credentials. – quaabaam Mar 03 '22 at 17:41
  • @quaabaam Ah, okay. I have tried that and I do not get a prompt. When we are not on the local network, the site will simply not connect. It's as if I was attempting to go to a URL that doesn't have a web page up. I do think that this is something browser related. – Decodorant Mar 03 '22 at 17:46
  • Update: To be more clear, I think I have narrowed this down to being an issue with certificates. The computer has the proper certificate, as well as the browser, but I just need to figure out how to get the program to somehow get access to that cert as well. – Decodorant Mar 04 '22 at 02:14
  • I have found I'm one step closer to the solution. I made a toggle for switching between TLS, TLS 1.1, and TLS 1.2. With TLS and TLS 1.1 I now get an error stating that it is unable to establish an SSL connection. I think I only need to figure out what credentials are needed to connect properly. Thanks for everyone's direction so far. If anyone knows how I go about including the needed certificate credentials, I am still looking for help! – Decodorant Mar 04 '22 at 16:45
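A rough sketch of the direction suggested in the comments above: force TLS 1.2, send the logged-in Windows credentials, and attach a client certificate from the current user's store if the server asks for one. The subject name "MyClientCert" is a placeholder; the real certificate name depends on what the organization installed on the machine, and this is only one possible way to wire it up.

using System.Net;
using System.Security.Cryptography.X509Certificates;

static class SecuredLoader
{
    public static HtmlAgilityPack.HtmlDocument Load(string url)
    {
        // Only allow TLS 1.2, matching the toggle described in the comments.
        ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;

        var web = new HtmlAgilityPack.HtmlWeb();
        web.PreRequest = request =>
        {
            // Transparent Windows authentication, as suggested above.
            request.UseDefaultCredentials = true;

            // Attach a client certificate from the current user's store, if present.
            var store = new X509Store(StoreName.My, StoreLocation.CurrentUser);
            store.Open(OpenFlags.ReadOnly);
            var certs = store.Certificates.Find(
                X509FindType.FindBySubjectName, "MyClientCert", false); // placeholder subject name
            if (certs.Count > 0)
            {
                request.ClientCertificates.Add(certs[0]);
            }
            store.Close();

            return true; // continue with the request
        };

        return web.Load(url);
    }
}

To try it, replace web.Load(urlField.Text) in Scrape() with SecuredLoader.Load(urlField.Text) and see whether the SSL error changes or goes away.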

0 Answers