For those who are in the same situation as me:
It turned out not to be easy to persist the right cached session state between PhantomJS/Selenium and my own HTTP client, so I took an alternative route, which ended up working.
When PhantomJS accesses a website that is locked behind a JS wall (such as CloudFlare's DDoS protection), it will most likely store a cookie with an auth token of sorts, saying that your browser passed the test.
At first this didn't work for me, because it seems CloudFlare also logs which user agent the token was issued to, and any mismatch invalidates the token.
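Because of that, the PhantomJS driver has to be created with the same user agent string you plan to reuse later. A minimal sketch of how that might look (CreateDriver and the UserAgent constant are my own illustrative names, not part of my original code):

using OpenQA.Selenium.PhantomJS;

// The exact string must match the one sent later in the WebClient headers.
private const string UserAgent =
    "Mozilla/5.0 (iPhone; CPU iPhone OS 10_0_1 like Mac OS X) AppleWebKit/601.1 (KHTML, like Gecko) CriOS/53.0.2785.109 Mobile/14A403 Safari/601.1.46";

private PhantomJSDriver CreateDriver()
{
    PhantomJSOptions options = new PhantomJSOptions();
    // GhostDriver exposes PhantomJS page settings as capabilities;
    // this overrides the default user agent for every request the driver makes.
    options.AddAdditionalCapability("phantomjs.page.settings.userAgent", UserAgent);
    return new PhantomJSDriver(options);
}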
I managed to solve this using the following piece of code:
// Requires a reference to System.Drawing.
private Image GetImage(string ImageLocation)
{
    byte[] data;
    using (CustomWebClient WC = new CustomWebClient())
    {
        // Must match the user agent the PhantomJS driver used to pass the challenge.
        WC.Headers.Add(System.Net.HttpRequestHeader.UserAgent, "Mozilla/5.0 (iPhone; CPU iPhone OS 10_0_1 like Mac OS X) AppleWebKit/601.1 (KHTML, like Gecko) CriOS/53.0.2785.109 Mobile/14A403 Safari/601.1.46");
        // Reuse the clearance cookie PhantomJS earned by solving the JS challenge.
        WC.Headers.Add(System.Net.HttpRequestHeader.Cookie, "cf_clearance=" + PhantomObject.Manage().Cookies.GetCookieNamed("cf_clearance").Value);
        data = WC.DownloadData(ImageLocation);
    }
    // The stream must stay open for the Bitmap's lifetime, so it isn't disposed here.
    Bitmap MP = new Bitmap(new System.IO.MemoryStream(data));
    return MP;
}
In this code, PhantomObject is my PhantomJS driver object, and CustomWebClient is just a normal WebClient with a bit of adjusting for the website I was using.
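If you're wondering what that adjusting amounts to, a CustomWebClient can be as small as a WebClient subclass that tweaks the underlying HttpWebRequest. This is only a sketch of the idea; the decompression and timeout settings below are plausible examples, not necessarily what my actual class does:

using System;
using System.Net;

public class CustomWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        HttpWebRequest request = (HttpWebRequest)base.GetWebRequest(address);
        // Put site-specific tweaks here; these two are just examples.
        request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
        request.Timeout = 30000;
        return request;
    }
}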
I essentially use the same faked user agent that my PhantomJS driver was using and pass the CloudFlare clearance cookie along in the headers. From there, my WebClient was able to access the site successfully and download the image data, which I then turned into a bitmap and returned.
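Put together, the flow looks roughly like this. The URLs are placeholders, and the fixed sleep is a crude stand-in for properly waiting until the cf_clearance cookie appears:

PhantomObject = CreateDriver();
PhantomObject.Navigate().GoToUrl("https://example.com/protected-page");

// Give CloudFlare's JS challenge a few seconds to run and set cf_clearance.
System.Threading.Thread.Sleep(6000);

Image image = GetImage("https://example.com/protected-image.png");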