So, what I want to do is make a script that will automatically add my login info (which I will have in my database) to whatever form I want.

To do this, I fetch the HTML source of the website (using cURL), then use DOMDocument to fill the username and password inputs (matched by their name attributes) with my username and password values. Then I output the result and click login.

All should be alright, right? Yeah, in theory, but it isn't.

This is the code that does just that:

$dom = new DOMDocument();
$dom->formatOutput = true;
@$dom->loadHTML( mb_convert_encoding($html, 'HTML-ENTITIES', $encoding) );

$inputs = $dom->getElementsByTagName('input');
foreach ($inputs as $input)
{
    if ($input->getAttribute('name') == $id_nameValue)
    {
        $new_input = $dom->createElement('input');

        $new_input->setAttribute('name', $id_nameValue);
        $new_input->setAttribute('value', $id_value);

        $input->parentNode->replaceChild($new_input, $input);
    }

    if ($input->getAttribute('name') == $password_nameValue)
    {
        $new_input = $dom->createElement('input');

        $new_input->setAttribute('name', $password_nameValue);
        $new_input->setAttribute('value', $password_value);
        $new_input->setAttribute('type', 'password');

        $input->parentNode->replaceChild($new_input, $input);
    }
}

echo $dom->saveHTML();

The problem I'm having is that the JavaScript or CSS doesn't load, or the form doesn't redirect correctly...

Let's take reddit for example (https://ssl.reddit.com/login). They have this for the CSS:

<link rel="stylesheet" href="/static/reddit.cYdhnJIJSZ0.css" type="text/css" />

instead of the full https://ssl.reddit.com/static/reddit.cYdhnJIJSZ0.css, so I can't load it correctly, because the browser resolves it against my URL, like

MY_URL.com/static/reddit.cYdhnJIJSZ0.css to find it...

The same applies to JavaScript, like

<script type="text/javascript" src="/static/jquery.js">

Or with

<form id="login_login" method="post" action="/post/login" class="user-form login-form">

this would redirect me to MY_URL.com/post/login

My question is: how can I make this work? How can I edit the links to include the website's URL? Since this is the first time I'm using DOMDocument, I don't know how I would go about editing the form action or the script src...

So my end result would be

<link rel="stylesheet" href="https://ssl.reddit.com/static/reddit.cYdhnJIJSZ0.css" type="text/css" />
<script type="text/javascript" src="https://ssl.reddit.com/static/jquery.js">
<form id="login_login" method="post" action="https://ssl.reddit.com/post/login" class="user-form login-form">
alex2005
  • Am I paranoid or does this come across as suspicious? In any case, you should not be hot linking javascript, CSS, or images from other web sites to put on your own. Definitely you should not be setting up what looks like a phishing scam. – erisco Oct 19 '11 at 06:07
  • uh, lol? How is this a phishing scam? I want this for personal use, when I'm on a computer that's not my own, and want to store the usernames and passwords of the websites I log in to, so I don't get keylogged or hit by other tricks to get my password... and it's not like I'm linking the CSS or JavaScript for my personal use, it's for the website it's from... did you even read what I said I want it to do? – alex2005 Oct 19 '11 at 06:21
  • If you want to see it for yourself, try it here http://www.auto-complete.info/ ---- username:user, password:password ---- just don't use a real password when adding a new page, it's stored as is in the db (for now)... – alex2005 Oct 19 '11 at 06:27
  • Why don't you just use the browser's form completion? – Gordon Oct 19 '11 at 06:42
  • Because when I'm at an internet cafe or on a university computer, I can't really use that feature, can I? I wish more people were trying to help me instead of questioning my little project... – alex2005 Oct 19 '11 at 06:54

1 Answer

I think the easiest way to do this is to inject a base tag whose href attribute is set to the base URL of the last effective URL (the URL that cURL ultimately fetched, in case of redirects). This last effective URL can be retrieved with cURL by using:

$url = curl_getinfo( $ch, CURLINFO_EFFECTIVE_URL );

I've explained how to set the base tag with DOMDocument in this answer. It also accounts for situations where there is already a base tag. Admittedly, my example doesn't yet check whether that base tag has an href attribute. It should be trivial to add this check, though, by using DOMElement::hasAttribute().

edit
In response to alex2005's comment:

You could alter it a bit, and do this:

$baseElement = $doc->createElement( 'base' );
$baseElement->setAttribute( 'href', $url );
$headElement = $doc->getElementsByTagName( 'head' )->item( 0 );

// it will automatically append, if $headElement has no firstChild (i.e. is null)
$headElement->insertBefore( $baseElement, $headElement->firstChild );
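As a rough sketch of how the pieces above could fit together, the following wraps the base tag injection in a function and adds the hasAttribute() check mentioned earlier. The function name injectBaseTag and the sample markup are made up for illustration, and the choice to leave an existing href untouched is an assumption, not something the original snippet does.

```php
<?php
// Hypothetical helper: insert a <base> tag (or complete an existing one)
// so that relative URLs in the fetched page resolve against $baseUrl.
function injectBaseTag(DOMDocument $doc, string $baseUrl): void
{
    $head = $doc->getElementsByTagName('head')->item(0);
    if ($head === null) {
        // Assumption: the document has a root element but no <head>;
        // create one as the first child of the root.
        $head = $doc->createElement('head');
        $root = $doc->documentElement;
        $root->insertBefore($head, $root->firstChild);
    }

    $base = $doc->getElementsByTagName('base')->item(0);
    if ($base !== null) {
        // A base tag already exists: only fill in href if it is missing.
        if (!$base->hasAttribute('href')) {
            $base->setAttribute('href', $baseUrl);
        }
        return;
    }

    $base = $doc->createElement('base');
    $base->setAttribute('href', $baseUrl);
    // insertBefore() appends when the reference node is null (empty head).
    $head->insertBefore($base, $head->firstChild);
}

// Usage with made-up markup:
$doc = new DOMDocument();
@$doc->loadHTML('<html><head><link rel="stylesheet" href="/static/reddit.css"></head><body></body></html>');
injectBaseTag($doc, 'https://ssl.reddit.com/');
$result = $doc->saveHTML();
```

Placing the base tag as the first child of head means every link and script after it is covered. An existing base tag with a relative href would still resolve against your own URL, which this sketch does not handle.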

edit 2
A little warning though. I've overlooked something.

$url = curl_getinfo( $ch, CURLINFO_EFFECTIVE_URL );

... could effectively return a URL like:

http://example.com/some/path/to/a/file.html

I'm not sure how browsers deal with file names in a base tag. I'd assume they extract the directory path, but I'm not sure about this.

But apart from that possible caveat, in most cases you probably only want the scheme and domain name of the last redirected URL in the base tag.

At least this is true for resolving root-relative URIs (absolute paths) such as

/css/some.css
/js/some.js
/some/file.html

For resolving relative URIs such as:

css/some.css
js/some.js
some/file.html

... you'd probably want to extract the directory part of the url as well:

http://example.com/some/path/to/a/

So, after giving it a little more thought, it's probably not so trivial to account for all possible scenarios. Be aware of this.
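One way to extract the directory part is sketched below with parse_url(). The function name baseHrefFromUrl is hypothetical, and the sketch assumes the effective URL is absolute (has a scheme and a host). It treats the last path segment as a file name, which matches how standard URL resolution (RFC 3986) handles a base URL, so a URL like /login is reduced to /.

```php
<?php
// Hypothetical helper: reduce an effective URL to scheme + host (+ port)
// + directory, dropping the file name, query string and fragment.
function baseHrefFromUrl(string $url): ?string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return null; // not an absolute URL this sketch can handle
    }

    $origin = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $origin .= ':' . $parts['port'];
    }

    // With a host present, the path (if any) starts with a slash.
    $path = $parts['path'] ?? '/';
    // Keep everything up to and including the last slash.
    $dir = substr($path, 0, strrpos($path, '/') + 1);

    return $origin . $dir;
}

// Usage:
$href = baseHrefFromUrl('http://example.com/some/path/to/a/file.html');
```

The result can be passed straight to setAttribute('href', ...) on the base element. Root-relative references such as /css/some.css ignore the directory part anyway, so the same base value serves both cases.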

Decent Dabbler
  • This is actually helpful, thanks. The problem I have with this is that things before the base tag won't have the base URL applied, so it won't help for CSS and JavaScript included before it. Any way to put it at the very top? – alex2005 Oct 19 '11 at 08:43
  • @alex2005: I'm a bit surprised by this. Are you sure? I would think browsers would parse a possible `base` tag first before doing anything else... but OK, I've given a hint in my answer on how to account for this. HTH. – Decent Dabbler Oct 19 '11 at 08:57
  • @alex2005: I've simplified it a bit. It was rather verbose at first. – Decent Dabbler Oct 19 '11 at 09:01
  • God among men, thank you very much!!!! Been using HTML for some years now, and this is the first time I've heard of the tag – alex2005 Oct 19 '11 at 09:03
  • @alex2005: You're very welcome. Be sure to read my additional edit as well, because there might be scenarios for which my trivial example won't give the expected outcome. Admittedly, relative paths aren't used that often in websites anymore, I believe, but you never know. – Decent Dabbler Oct 19 '11 at 09:11
  • I already know the url of the web site I am going to access, so I'm putting that into "$baseElement->setAttribute( 'href', $url );" and not using $url = curl_getinfo( $ch, CURLINFO_EFFECTIVE_URL ); So no worries there(I hope :D) – alex2005 Oct 19 '11 at 09:22