I am trying to scrape some information from a website for my organization. The information sits behind a login page, which for now I am bypassing by logging in through the browser with my organization credentials. The website stores the session details in cookies, so on subsequent visits I am still logged in. (I know this is a hit-and-miss solution, but for my purposes it's fine. If I am logged out, I will just log back in manually through a browser session.)
Within this site there are two sections I need to access:
/Memberships, in order to retrieve a list of URLs
/Organisation?orgid=XXXXXX, the individual organization pages, whose URLs are retrieved from the /Memberships page
Problem
For some strange reason, the call to /Memberships returns perfectly fine HTML, and I am able to get a list of all the child URLs:
    // Fetch the memberships page; .Result blocks until the request completes.
    string url = "https://www.ACME.com/Memberships";
    var response = CallUrl(url).Result;

    // Downloads the page at fullUrl and returns its body as a string.
    private static async Task<string> CallUrl(string fullUrl)
    {
        HttpClient client = new HttpClient();
        ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls13;
        client.DefaultRequestHeaders.Accept.Clear();
        return await client.GetStringAsync(fullUrl);
    }
When I then query any of the child URLs, I don't get the HTML response I am expecting, which would be the organization details. Instead, I am presented with the website's login page (or rather, the HTML of the login page).
The code used is the same as above, except that the url variable is swapped out for:
    string url = "https://www.ACME.com/Organisation?orgid=XXXX";
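To make the symptom easier to pin down, the response itself can be inspected rather than just the body. The following is only a minimal sketch, assuming the site answers unauthenticated requests by redirecting to a login URL (I have not confirmed that this is what happens). The class `RedirectCheck` and the helpers `WasRedirected` and `FetchWithDiagnostics` are hypothetical names introduced for illustration.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class RedirectCheck
{
    // Hypothetical helper: true when the final URI differs from the requested one,
    // i.e. the handler followed at least one redirect to a different address.
    public static bool WasRedirected(Uri requested, Uri final)
    {
        return requested != final;
    }

    // Hypothetical helper: fetches a page and reports where the request ended up.
    public static async Task<string> FetchWithDiagnostics(HttpClient client, string fullUrl)
    {
        var requested = new Uri(fullUrl);
        using (HttpResponseMessage response = await client.GetAsync(requested))
        {
            // RequestMessage.RequestUri holds the URI of the last request sent,
            // so it differs from the original when redirects were followed.
            Uri final = response.RequestMessage.RequestUri;
            Console.WriteLine($"Status: {(int)response.StatusCode}, final URI: {final}");
            if (WasRedirected(requested, final))
            {
                Console.WriteLine("Redirected away from the requested page.");
            }
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

Calling `FetchWithDiagnostics` for both the /Memberships URL and one of the /Organisation URLs should show whether the two requests are treated differently by the server.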
Keep in mind that one must be logged in to access both the /Memberships page and the individual /Organisation?orgid=XXXXXX pages.
So what's stumping me is this: why can I access /Memberships but not the other pages?