How to screen scrape HTTPS using C#?
I am using the Firefox add-on 'Tamper Data'. It shows all posted data. – Jignesh Dec 05 '09 at 17:16
5 Answers
You can use System.Net.WebClient to start an HTTPS connection, and pull down the page to scrape with that.
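As a rough sketch of what that could look like (the class name `PageFetcher`, the helper `DownloadPage`, and the example URL are placeholders, not part of the original answer). Enabling TLS 1.2 explicitly is an assumption that may only matter on older .NET Framework versions:

```csharp
using System;
using System.Net;
using System.Text;

static class PageFetcher
{
    // Hypothetical helper: downloads a page and returns its HTML as a string.
    public static string DownloadPage(string url)
    {
        // Assumption: older .NET Framework versions may need TLS 1.2 enabled explicitly.
        ServicePointManager.SecurityProtocol |= SecurityProtocolType.Tls12;

        using (var client = new WebClient())
        {
            client.Encoding = Encoding.UTF8; // assumes the page is served as UTF-8
            return client.DownloadString(url);
        }
    }

    static void Main()
    {
        // Placeholder URL; substitute the page you want to scrape.
        string html = DownloadPage("https://example.com/");
        Console.WriteLine(html.Length);
    }
}
```

The returned string can then be searched or handed to an HTML parser.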

You'll need to make sure a CookieContainer is attached to the requests the WebClient makes, so that cookies are passed across multiple requests (e.g. the login page and then the content page). – Danny Tuppeny Dec 04 '09 at 15:38
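WebClient does not expose a CookieContainer property itself, so one common way to do this is to subclass it and attach the container to each request. The following is only a sketch: `CookieAwareWebClient`, the URLs, and the form field names are all hypothetical, and the real field names have to come from the site's login form (a tool that shows the posted data will reveal them).

```csharp
using System;
using System.Collections.Specialized;
using System.Net;

// Attaches one shared CookieContainer to every HTTP request this client makes.
class CookieAwareWebClient : WebClient
{
    public CookieContainer Cookies { get; } = new CookieContainer();

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        var httpRequest = request as HttpWebRequest;
        if (httpRequest != null)
        {
            httpRequest.CookieContainer = Cookies; // cookies persist across requests
        }
        return request;
    }
}

static class LoginScrape
{
    static void Main()
    {
        using (var client = new CookieAwareWebClient())
        {
            // Hypothetical field names and URLs, for illustration only.
            var form = new NameValueCollection
            {
                { "username", "me" },
                { "password", "secret" }
            };
            client.UploadValues("https://example.com/login", "POST", form);

            // The session cookie set by the login response is sent with this request.
            string html = client.DownloadString("https://example.com/members");
            Console.WriteLine(html.Length);
        }
    }
}
```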
If you're talking about server side URL rewriting, no idea. But if you're talking about javascript, simply parse it in code. – Brett Allen Dec 04 '09 at 16:24
You can use System.Net.WebClient to grab web pages. Here is an example: http://www.codersource.net/csharp_screen_scraping.html

Link dead: I think this may be the updated link - http://www.codersource.net/microsoft-net/c-advanced/html-screen-scraping-in-c.aspx – Simon_Weaver Oct 20 '10 at 22:03
If for some reason you're having trouble with accessing the page as a web-client or you want to make it seem like the request is from a browser, you could use the web-browser control in an app, load the page in it and use the source of the loaded content from the web-browser control.
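A minimal sketch of that idea, assuming a Windows Forms project (the WebBrowser control needs an STA thread and a message loop). The class name and URL are placeholders, and closing the form after the first load is just one way to end the example:

```csharp
using System;
using System.Windows.Forms;

static class BrowserScrape
{
    [STAThread] // the WebBrowser control requires a single-threaded apartment
    static void Main()
    {
        string url = "https://example.com/"; // placeholder URL

        using (var form = new Form())
        using (var browser = new WebBrowser())
        {
            browser.Dock = DockStyle.Fill;
            browser.ScriptErrorsSuppressed = true; // avoid script error dialogs
            form.Controls.Add(browser);

            browser.DocumentCompleted += (sender, e) =>
            {
                // DocumentCompleted also fires for frames; wait for the top-level page.
                if (e.Url != browser.Url) return;

                string html = browser.DocumentText; // source of the loaded page
                Console.WriteLine(html.Length);
                form.Close(); // ends the message loop started below
            };

            browser.Navigate(url);
            Application.Run(form);
        }
    }
}
```

Because the control drives a real browser engine, the site sees a browser-like request, at the cost of needing a UI thread.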

Here's a concrete (albeit trivial) example. You can pass a ship name to VesselFinder in the querystring, but even if it only finds one ship with that name it still shows you the search results screen with one ship. This example detects that case and takes the user straight to the tracking map for the ship.
using System.Net;  // WebClient
using System.Text; // UTF8Encoding
using System.Web;  // HttpUtility

string strName = "SAFMARINE MAFADI";
string strURL = "https://www.vesselfinder.com/vessels?name=" + HttpUtility.UrlEncode(strName);
string strReturnURL = strURL; //default: stay on the name search results
string strToSearch = "/?imo=";
string strPage = string.Empty;
byte[] aReqtHTML;
WebClient objWebClient = new WebClient();
objWebClient.Headers.Add("User-Agent: Other"); //You must do this or HTTPS won't work
aReqtHTML = objWebClient.DownloadData(strURL); //Do the name search
UTF8Encoding utf8 = new UTF8Encoding();
strPage = utf8.GetString(aReqtHTML); // get the string from the bytes
if (strPage.IndexOf(strToSearch) != strPage.LastIndexOf(strToSearch))
{
    //more than one instance found, so leave return URL as name search
}
else if (strPage.Contains(strToSearch))
{
    //exactly one instance found, so find the ship's IMO link
    strPage = strPage.Substring(strPage.IndexOf(strToSearch)); //cut off the stuff before
    int intEnd = strPage.IndexOf("\""); //closing quote of the link
    if (intEnd > 0)
    {
        strPage = strPage.Substring(0, intEnd); //cut off the stuff after
        //only build the tracking map URL when a single ship was found
        strReturnURL = "https://www.vesselfinder.com" + strPage;
    }
}
