I need to upload a file to a customer's site automatically, and the site is protected by login credentials. My problem is that the login page (and probably the rest of the site) has malformed HTML, and CasperJS does not seem to be able to handle it. How can I handle such pages?

Malformed HTML example (this is the site's page, cleaned up a bit but with the original problems, such as tr or td tags that are not closed):

<html>
    <head>
        <title>TEST Login Page</title>
    </head>
<body>
    <div>
        <table>
            <tbody>
                <tr>
                    <td>
                        <table>
                            <tbody>
                                <tr>
                                    <td>
                                        <table>
                                            <tbody>
                                            <tr>
                                                <td>
                                                    <table> 
                                                        <tbody>                                                     
                                                        <form name="loginForm" method="post" action="test.do">
                                                            <tr>                                    
                                                                <input type="username" name="username" size="12" value=""></td>
                                                                <input type="password" name="password" size="12" value=""></td>
                                                            <input type="submit" value="Login" class="submit"></td>
                                                        </tr>
                                                        </form>
                                                    <tr>
                                </tr>                               
                            </tbody>
                        </table>
                    </td>
                    </tr>
                </tbody>
                </table>
                </td>
            </tr>
        </tbody>
        </table>
        </td>
    </tr>
</tbody>
</table>
</div>
</body>
</html>

Cleaned HTML:

<!DOCTYPE html>
<html>
<head>
  <title>TEST Login Page</title>
</head>
<body>
  <div>
    <table>
      <tbody>
        <tr>
          <td>
            <table>
              <tbody>
                <tr>
                  <td>
                    <table>
                      <tbody>
                        <tr>
                          <td>
                            <form name="loginForm" method="post" action="test.do" id="loginForm">
                                <input type="username" name="username" size="12" value="" />
                                <input type="password" name="password" size="12" value="" />
                                <input type="submit" value="Login" class="submit" />
                            </form>
                          </td>
                        </tr>
                      </tbody>
                    </table>
                  </td>
                </tr>
              </tbody>
            </table>
          </td>
        </tr>
      </tbody>
    </table>
  </div>
</body>
</html>

CasperJS example:

casper.start(serverName, function(){ 
  this.echo(this.getHTML('form[name="loginForm"]'));
});

casper.run();

With the malformed code, nothing is returned, but with the cleaned version everything works fine.

Is there a way to handle this problem?

Marco
  • Can you run the page code through an html tidying script before you pass it to casperJS? Some possibilities here: http://stackoverflow.com/questions/3913355/html-formatter-tidy-beautifier-for-javascript See also this question: http://stackoverflow.com/questions/21381549/testing-broken-html-with-casperjs – i alarmed alien Aug 12 '14 at 09:06
  • Thank you for your reply. I have already tried this approach, but I can't manage to tidy the HTML before passing it to the CasperJS engine. Is there an example? I can't find anything by googling :( – Marco Aug 12 '14 at 09:21
  • To clarify, did you try the method in the second link? – i alarmed alien Aug 12 '14 at 09:32
  • OMG!! Sorry, the second example works exactly as expected! Thank you very much! – Marco Aug 12 '14 at 09:51

1 Answer

If the HTML is malformed, it is undefined how PhantomJS will parse and handle it. In this case PhantomJS breaks the page completely (sample of the resulting markup):

<table>
    <tbody>
        <tr>
            <td>
                <table>
                    <tbody>
                    <tr>
                        <td>
                            <input type="username" name="username" size="12" value=""><input type="password" name="password" size="12" value=""><input type="submit" value="Login" class="submit"><table> 
                                <tbody>                                                     
                                <form name="loginForm" method="post" action="test.do"></form>
                                    <tr>                                    



                                </tr>

                            <tr>
        </tr>                               
    </tbody>
</table>
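
To see for yourself what the engine made of the page, you can print the form as it was parsed and count the inputs that ended up inside it. This is only a minimal sketch: dumpParsedForm is a hypothetical helper name, and it assumes a casper instance created with require('casper').create() and the form selector from the question.

```javascript
// Hypothetical diagnostic helper (not part of CasperJS).
// `casper` is assumed to be an instance from require('casper').create().
function dumpParsedForm(casper, url, formSelector) {
  casper.start(url, function () {
    if (!this.exists(formSelector)) {
      this.echo('form not found: ' + formSelector);
      return;
    }
    // Second argument true asks getHTML for the outer HTML of the element.
    this.echo(this.getHTML(formSelector, true));
    // Count the inputs that the parser actually kept inside the form.
    var count = this.evaluate(function (sel) {
      return document.querySelectorAll(sel + ' input').length;
    }, formSelector);
    this.echo('inputs inside form: ' + count);
  });
}
```

Called as dumpParsedForm(casper, serverName, 'form[name="loginForm"]') followed by casper.run(), this should show whether the inputs were moved out of the form, as in the sample above.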

It may still be salvageable by

  1. downloading the page in question with __utils__.sendAJAX,
  2. fixing it with plain JavaScript and string operations (this is the tricky part), and
  3. assigning the fixed string to casper.page.content.

The result is essentially an about:blank page with your markup, so you will need to start CasperJS with the --local-to-remote-url-access=true flag.
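
The three steps could be sketched roughly as follows. Treat this as an illustration under assumptions, not a general HTML tidier: rebuildForm and loadRepairedLogin are hypothetical helper names, the "fix" simply pulls the opening form tag and its input tags out of the raw source and rebuilds a minimal page around them, and the form name is assumed to contain no regex special characters.

```javascript
// Step 2 (hypothetical helper): rebuild a clean form from the raw page source.
// Keeps only the opening <form> tag and the <input> tags found before </form>.
function rebuildForm(html, formName) {
  var formRe = new RegExp(
    '<form\\b[^>]*\\bname="' + formName + '"[^>]*>([\\s\\S]*?)</form>', 'i');
  var m = formRe.exec(html);
  if (!m) {
    return null; // no matching form in the source
  }
  var openTag = m[0].slice(0, m[0].indexOf('>') + 1);
  var inputs = m[1].match(/<input\b[^>]*>/gi) || [];
  return openTag + inputs.join('') + '</form>';
}

// Steps 1 and 3 (hypothetical helper): `casper` is assumed to be an instance
// created with require('casper').create().
function loadRepairedLogin(casper, url, formName) {
  casper.start(url, function () {
    // Step 1: fetch the unparsed source with a synchronous GET in page context.
    var raw = this.evaluate(function (u) {
      return __utils__.sendAJAX(u, 'GET', null, false);
    }, url);
    var form = rebuildForm(raw, formName);
    if (form) {
      // Step 3: replace the page with the repaired markup.
      this.page.content =
        '<!DOCTYPE html><html><body>' + form + '</body></html>';
    }
  });
}
```

You would then call loadRepairedLogin(casper, serverName, 'loginForm') and continue with casper.then(...) and casper.run() as usual. Because the replaced page no longer lives at the original URL, a relative action such as test.do will probably not resolve against the site any more, so you may have to rewrite it to an absolute URL in step 2.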


If you are not bound to PhantomJS, you could try SlimerJS (http://slimerjs.org/) as the engine for CasperJS. It uses the Gecko engine of the installed Firefox, which might handle broken HTML better. It can be run headless through xvfb.

Artjom B.