I am trying to create a web application that converts any selected web page into a form of simple English. I have a word-for-word translation stored in a MySQL database. This is my code so far. It works, but it only seems to do what I want in a few tags and not across the whole page. I think this may be due to a regex error?

<?php
    $English = array();
    $Simple = array();
    $con = mysqli_connect("localhost","root","root","Words");
    $getmodels = mysqli_query($con, "SELECT * FROM Wordsweb");
    while($res = mysqli_fetch_assoc($getmodels)) {
        $English[] = $res['English'];
        $Simple[] = $res['Simple'];
    }
    $url = $_GET['url'];
    $string = file_get_contents($url);
    $text_to_echo =  preg_replace_callback(
        "/(<([^.]+)>)([^<]+)(<\\/\\2>)/s", 
        function($matches) use ($English, $Simple) {
            /*
             * Indexes of array:
             *    0 - full tag
             *    1 - open tag, for example <h1>
             *    2 - tag name h1
             *    3 - content
             *    4 - closing tag
             */
            $matches[3] = strtolower($matches[3]);
            $text = str_replace($English, $Simple, $matches[3]);
            return $matches[1].$text.$matches[4];
        }, 
        $string
    );
    echo "<base href=\"" . $url . "/\" />";
    echo $text_to_echo;
    ?>
davidkonrad
PHP9274745382389
  • This part of your regex, `(<([^.]+)>)`, which is supposed to be the opening tag, is not going to give you what you expect. `[^.]+` in particular will match one or more of anything that is not a dot, so it's going to match a lot more than the single tag's contents. Also, in general, you shouldn't use regex to parse HTML. Use DOM instead: http://www.php.net/DOM – CrayonViolent Nov 24 '13 at 19:31
  • I would, but I have no idea how to implement DOM in this! – PHP9274745382389 Nov 24 '13 at 19:33
  • Alternatively, you can use http://simplehtmldom.sourceforge.net/ – CrayonViolent Nov 24 '13 at 19:35
  • You should change this `/(<([^.]+)>)([^<]+)(<\\/\\2>)/s` to this `/(<\\s*([^<>]+?)\\s*>)([^<]+)(<\\/\\2\\s*>)/`, and you don't need the `//s` modifier in your regex since the dot in the class is just a literal dot. –  Nov 24 '13 at 19:36
  • I will try using http://simplehtmldom.sourceforge.net/. Would I be correct in assuming that * would mean all tags? – PHP9274745382389 Nov 24 '13 at 19:40
  • http://stackoverflow.com/questions/20180554/simple-html-dom-str-replace-on-plain-text Please help with this; it is as far as I could get. – PHP9274745382389 Nov 24 '13 at 20:31
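
To make the regex problem discussed in the comments concrete, here is a small sketch that runs the pattern from the question against three made-up sample strings (the strings and variable names are assumptions for illustration, not taken from the page being converted). Because the closing tag must repeat everything captured between `<` and `>`, an opening tag with attributes never finds its partner, and text that sits next to a child element is skipped entirely.

```php
<?php
// The pattern from the question, unchanged.
$pattern = "/(<([^.]+)>)([^<]+)(<\\/\\2>)/s";

// Hypothetical sample inputs, chosen only to illustrate the behaviour.
$plain      = '<p>Hello</p>';
$attributed = '<p class="x">Hello</p>';
$mixed      = '<p>Hello <b>big</b> world</p>';

// A bare tag matches: group 2 is "p" and the closing tag is "</p>".
$plainCount = preg_match($pattern, $plain);

// With an attribute, group 2 would have to be 'p class="x"', so the
// backreference looks for '</p class="x">' and nothing matches.
$attributedCount = preg_match($pattern, $attributed);

// Only the innermost element without child tags matches, so the
// callback would see "big" but never "Hello " or " world".
preg_match_all($pattern, $mixed, $mixedMatches);

var_dump($plainCount, $attributedCount, $mixedMatches[3]);
```

This would explain why the replacement only appears to work "in a few tags": on a real page, most elements carry attributes or contain other elements.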

1 Answer

You can use DOM+Xpath to fetch and change the text nodes inside an HTML document:

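// Sample document standing in for the fetched page.
$html = <<<'HTML'
  <html>
    <body>
      <h1>Hello World!</h1>
      <div>
        <p>Lorem Ipsum...</p>
      </div>
    </body>
  </html>
HTML;

// Parse the HTML into a DOM tree.
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Select every text node in the document, at any depth.
$nodes = $xpath->evaluate("//text()");
foreach ($nodes as $node) {
  // Change only the text; tags and attributes are left untouched.
  $node->nodeValue = strtoupper($node->nodeValue);
}

// Serialize the modified tree back to HTML.
echo $dom->saveHTML();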
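
As a rough sketch of how the same approach might be applied to the word replacement from the question: the two word lists below are hypothetical stand-ins for the rows of the `Wordsweb` table, and `simplifyHtml` is a made-up helper name. It swaps in `str_ireplace` in place of the question's `strtolower` plus `str_replace`, so the rest of the text keeps its original case, and it skips `<script>` and `<style>` contents, which is an assumption about what you would want.

```php
<?php
// Hypothetical word lists standing in for the rows of the Wordsweb table.
$english = ['purchase', 'utilize'];
$simple  = ['buy', 'use'];

/**
 * Replace words in every text node of an HTML document.
 * (Illustrative helper; the name and signature are assumptions.)
 */
function simplifyHtml(string $html, array $english, array $simple): string
{
    $dom = new DOMDocument();
    // Collect parser warnings internally; real pages often have imperfect markup.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // Skip text inside <script> and <style> so code and CSS stay untouched.
    $nodes = $xpath->query('//text()[not(ancestor::script) and not(ancestor::style)]');
    foreach ($nodes as $node) {
        // Case-insensitive substring replacement within this text node only.
        $node->nodeValue = str_ireplace($english, $simple, $node->nodeValue);
    }

    return $dom->saveHTML();
}

$page = '<html><body><h1 class="title">Purchase now</h1>'
      . '<div><p>Please utilize the form.</p></div></body></html>';

$result = simplifyHtml($page, $english, $simple);
echo $result;
```

Note that `str_ireplace` works on substrings, so a word such as "purchaser" would also be altered; matching on word boundaries would need something like `preg_replace` with `\b`, which is left out here to keep the sketch short.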
ThW