
I want to transform a webpage into leet (1337) speak with XPath and PHP.

It can be done with plain PHP string replacement, but then the HTML tags themselves are also converted to leet speak.

Example ($html is the webpage):

$find = array("a","b","c","d","e","f","g","h","i","j"."k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z");
$repl = array("4","b","c","d","3","f","g","h","1","j","k","1","m","n","0","p","9","r","5","7","u","v","w","x","y","2");
$html = str_replace($find, $repl, $html);

That also replaces the HTML tags and attributes, not just the visible text.

Can this be done with XPath and PHP with the XPath selector text()? Example ($html is the webpage):

$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
$xpath->query('//text()');
// HERE THE REPLACING IN XPATH
j0k

1 Answer


Try this:

$dom = new DOMDocument;
$dom->loadHTML( $html );
$xpath = new DOMXPath( $dom );
$nodes = $xpath->query( '//text()' );
foreach( $nodes as $node )
{
    $node->nodeValue = str_replace( $find, $repl, $node->nodeValue );
}
echo $dom->saveHTML();

Note that this is probably a more useful XPath query for your needs:

$nodes = $xpath->query( '//head/title/text() | //body//text()' );

... as this will only replace text in <head><title> and text nodes that are descendants of <body>. You probably don't want to replace styles, JavaScript and what have you. ;-)
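
One caveat: //body//text() still matches the contents of any <script> or <style> elements that sit inside <body>. Below is a minimal sketch of a stricter variant, assuming PHP 7+ with the DOM extension. The function name leetify() and the $map array are placeholders of mine, not part of the question. strtr() is swapped in for str_replace() so that each character is translated exactly once.

```php
<?php
// Hypothetical helper (not from the original post): converts only visible
// text to leet speak and leaves tags, scripts and styles alone.
function leetify(string $html): string
{
    // Only the letters that actually change; everything else maps to itself.
    $map = ['a' => '4', 'e' => '3', 'i' => '1', 'l' => '1', 'o' => '0',
            'q' => '9', 's' => '5', 't' => '7', 'z' => '2'];

    // Collect parser warnings internally instead of printing them,
    // since real-world HTML is rarely perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Title text, plus body text that is not inside <script> or <style>.
    $nodes = $xpath->query(
        '//head/title/text()'
        . ' | //body//text()[not(ancestor::script) and not(ancestor::style)]'
    );

    foreach ($nodes as $node) {
        $node->nodeValue = strtr($node->nodeValue, $map);
    }

    return (string) $dom->saveHTML();
}
```

Calling leetify($html) returns the converted page as a string, so it can be echoed the same way as $dom->saveHTML() above.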


On a side note: I've tested this with your arrays of find and replace characters, but there's something fishy going on with them, that I can't figure out. The replacement characters don't seem to line up with the found characters all the time. I have no idea why that is.

I've recreated the arrays, and these work for me:

$find = array('a','b','c','d','e','f','g','h','i','j'.'k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z');
$repl = array('4','b','c','d','3','f','g','h','1','j'.'k','1','m','n','0','p','9','r','5','7','u','v','w','x','y','2');

I just can't figure out why your arrays are not working for me. :-/ Perhaps an encoding issue? If anybody wants to chime in and venture a guess, please do.

Edit: As rxdazn noticed, "j"."k" was the problem in the first array, which I totally overlooked as you can see from my recreated arrays (I copied $find over to $repl, replaced quotes, and filled in the leet characters).
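
To make the effect of the stray dot concrete: "j"."k" is string concatenation, so the array holds the single element "jk" and comes out one entry short. From "l" onward, every search character is then paired with the wrong replacement. Below is a small sketch with commas throughout; the variable names $brokenFind and $leet are mine, for illustration only.

```php
<?php
// As posted in the question: "j"."k" concatenates into one element, "jk".
$brokenFind = array("a","b","c","d","e","f","g","h","i","j"."k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z");

// Corrected: a comma between "j" and "k", so both arrays have 26 entries.
$find = array("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z");
$repl = array("4","b","c","d","3","f","g","h","1","j","k","1","m","n","0","p","9","r","5","7","u","v","w","x","y","2");

// The length mismatch is what shifts the replacements.
echo count($brokenFind), ' vs ', count($repl), "\n"; // prints "25 vs 26"

$leet = str_replace($find, $repl, 'hello world');
echo $leet, "\n"; // prints "h3110 w0r1d"
```

Comparing count($find) with count($repl) before calling str_replace() is a cheap way to catch this kind of typo.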

Decent Dabbler