2

I have been tasked with finding a solution to a rather novel issue. I have a series of HttpClient calls that I have to make in order to authenticate against a 3rd party vendor. However, part of this process involves dynamically generated values being created in JavaScript and passed to a form, which is then posted to the 3rd party. As I'm using the HttpClient class, I obviously cannot run the JavaScript, and thus the process comes to a halt right here (the posting of these values creates an important authentication cookie for an intermediate step).

So, I'd like to be able to take this simple HTML, which contains a form and some JavaScript, have my C# code evaluate it, and then retrieve the values that the JavaScript has assigned to the form. I'd then use these values and continue with the workflow.

I could take a clunky route and use the WebBrowser control. However, as this is being used in a non-visual environment, I'd like to be able to pass the HTML string into some sort of emulator and receive the parsed HTML back as a return value. Below is an example of the simple HTML that I'd be dealing with:

<html>
<head>
    <script type="text/javascript">
        function testLoad() {
            document.forms[0].elements[0].value = "some guid id plus the date:" + new Date().getDate() + "some random js value";
            document.forms[0].elements[1].value = decodeURIComponent(document.forms[0].elements[1].value);
            document.forms[0].elements[2].value = decodeURIComponent(document.forms[0].elements[2].value);
            // optionally submit, or just get the returned form values and post from HttpClient
            document.forms[0].submit();
        }</script>
    <noscript>Please enable JavaScript to view the page content.</noscript>
</head>
<body onload="testLoad()">
    <form method="POST" action="/">
        <input type="hidden" name="test_id" value="idstuff" />
        <input type="hidden" name="test_123" value="encoded value" />
        <input type="hidden" name="test_another" value="1.01" />
    </form>
</body>
</html>

Once the html has been returned from the emulated process, I'd then use HtmlAgilityPack to grab the form values that have been populated by the javascript function (testLoad()) and progress to the next steps.

Am I aiming too high here, or has this bridge been crossed a few times? I've looked at http://wiki.awesomium.com, csExWB, Jint and a few others, but none seem to take the really simple approach that I'm hoping for here. Think of my quest as being able to pass the initial HTML as a parameter and have the emulator return the patched HTML.

Hope the above is clear: I want to evaluate the HTML/JS from a server-side process and then move on to the next step within my C# workflow.

[edit] - this looks VERY promising: http://www.tomdupont.net/2013/08/phantomjs-headless-browser-for-net-webdriver.html. I've taken the tips here and am using PhantomJS with Selenium... so far, so good!!

[oh and just to point out, this is not for any sinister use, the 3rd party in question just doesn't yet have a b2b api in place to permit the interop that we require between us]

jim tollan
  • How complex is the Javascript? Is that an accurate example you've provided? Because you could just rewrite that in C# and use a simple HTTP post using the values gained via the HtmlAgilityPack – CodingIntrigue May 22 '14 at 13:34
  • hi there, unfortunately, this is of course a simplified version of the javascript. the *real* javascript runs some meaty validations, ^'s as well as calling some core js functions against looped data – jim tollan May 22 '14 at 13:37
  • In that case, [IronJS](https://github.com/fholm/IronJS) – CodingIntrigue May 22 '14 at 13:38
  • i did see that earlier when looking, however, it appears to only evaluate ecma script and not the full web stack (i.e. html plus script in a single unit)...thanks tho – jim tollan May 22 '14 at 13:41
  • It really does sound like you want the functionality of the WebBrowser control, without using WebBrowser control. Why do you feel it would not fit the requirement? – CodingIntrigue May 22 '14 at 13:43
  • it doesn't fit the requirement mainly because I will be baking this into our b2b api (which I will replace once the 3rd party has changed their platform). our side of the api will live on an azure server and thus the process needs to be encapsulated (*preferably*) inside a non visual process that has a low startup and memory footprint. that said, don't think i haven't tried with old mr wb!! ;) – jim tollan May 22 '14 at 13:50
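
As a rough illustration of the point raised in the comments above (a bare ECMAScript engine can run the script, but only if something supplies the `document` it expects), here is a minimal Node.js sketch. Node's built-in `vm` module stands in for an embedded engine such as Jint or IronJS; the stub DOM, the `runPageScript` helper, the simplified page script and the sample field values are all hypothetical and cover only the members this example script touches. It is not a substitute for a headless browser when the real script uses more of the DOM.

```javascript
const vm = require('node:vm');

// Hypothetical, simplified stand-in for the page's inline script.
const pageScript = `
function testLoad() {
    var fields = document.forms[0].elements;
    fields[0].value = 'id-' + fields[0].value;
    fields[1].value = decodeURIComponent(fields[1].value);
    fields[2].value = decodeURIComponent(fields[2].value);
    document.forms[0].submit();
}`;

// Minimal fake DOM: only the members the script above touches.
function makeDocumentStub(fields) {
    const form = {
        elements: fields.map(f => ({ name: f.name, value: f.value })),
        submitted: false,
        // Record the submit instead of posting anything.
        submit() { this.submitted = true; }
    };
    return { forms: [form] };
}

// Runs the script in an isolated context and returns the mutated form.
function runPageScript(script, fields) {
    const document = makeDocumentStub(fields);
    const context = vm.createContext({ document });
    vm.runInContext(script, context);        // defines testLoad in the context
    vm.runInContext('testLoad()', context);  // emulates <body onload="testLoad()">
    return document.forms[0];
}

const form = runPageScript(pageScript, [
    { name: 'test_id', value: 'idstuff' },
    { name: 'test_123', value: 'encoded%20value' },
    { name: 'test_another', value: '1.01' }
]);

console.log(form.elements.map(e => e.name + '=' + e.value).join('&'));
// prints "test_id=id-idstuff&test_123=encoded value&test_another=1.01"
```

Recording `submit()` rather than performing it leaves the populated values available, so the actual POST can still be made from HttpClient.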

3 Answers

3

AngleSharp also contains a short demo (project) that connects Jint (a JavaScript interpreter, completely written in .NET) to it. Both are PCL projects and they work together without problems. That should provide everything that is usually used in JavaScript / the DOM.

A very simple example looks like:

static void SimpleScriptingSample()
{
    //We require a custom configuration
    var config = new Configuration();

    //Including a script engine
    config.Register(new JavaScriptEngine());

    //And enabling scripting
    config.IsScripting = true;

    //This is our sample source, we will set the title and write on the document
    var source = @"<!doctype html>
        <html>
        <head><title>Sample</title></head>
        <body>
        <script>
        document.title = 'Simple manipulation...';
        document.write('<span class=greeting>Hello World!</span>');
        </script>
        </body>";
    var document = DocumentBuilder.Html(source, config);

    //Modified HTML will be output
    Console.WriteLine(document.DocumentElement.OuterHtml);
}

This will print the (serialized) DOM, which already contains the modifications (such as a new title and the inserted span element).

Florian Rappl
  • thanks for this florian, i'll take a look over that as it looks like a nice simple implementation - plus i imagine it can mix 'n match with htmlagilitypack, `post` documentbuilder – jim tollan Aug 13 '14 at 08:07
  • Thanks for your great work @Florian. But it seems your whole documentation on AngleSharp is outdated. Can you please update the documentation? For example, I couldn't figure out how to register the JavaScript engine. The config.register() gets requester as argument. There is no code example... – Manoochehr Dadashi Apr 19 '15 at 23:52
  • It is true, a lot of things changed just recently. I'll update the documentation as soon as I can (hopefully still this week). In the meantime I can recommend the samples at https://github.com/AngleSharp/AngleSharp.Samples -- they work with the current version and also showcase JavaScript integration. – Florian Rappl Apr 20 '15 at 12:46
  • What's the replacement for DocumentBuilder? – Gavin Williams Oct 05 '20 at 23:33
  • Hi @GavinWilliams use `BrowsingContext`. It has methods like `OpenAsync`. Ideally, you can just stream a source. If you have an HTML `string` then use the response builder pattern like `context.OpenAsync(res => res.Content(myHtmlString))` – Florian Rappl Oct 06 '20 at 00:44
  • @Florian Rappl Thank you for your help on here and also on GitHub recently. You've been a great help with usage of your library! – Gavin Williams Oct 10 '20 at 05:55
1

There's PhantomJS, which can be scripted via JavaScript and run as an external process from C#.

adv12
  • +1 looks promising. do you have any 1st hand experience with this?? Altho i love link juice, nothing whacks me more than 1st hand use and anecdotal evidence of its efficacy... as said tho - looks like a contender – jim tollan May 22 '14 at 13:39
  • @jimtollan: no firsthand experience here, sorry – adv12 May 22 '14 at 13:40
  • no worries, i will of course try this out. i'll then drop an update. however, a range of other answers/comments will hopefully narrow my quest... – jim tollan May 22 '14 at 13:42
  • hey, i updated the original post to point to a setup page for .net and phantomjs -works really well!! – jim tollan May 22 '14 at 15:13
1

It sounds like you'll need a headless browser to execute the html/javascript. Take a look here.

I would prefer AngleSharp over HtmlAgilityPack though.

Shelakel