
I need to replace spaces with &nbsp; inside HTML elements. Example:

<table atrr="zxzx"><tr>
<td>adfa a   adfadfaf></td><td><br /> dfa  dfa</td>
</tr></table>

should become

<table atrr="zxzx"><tr>
<td>adfa&nbsp;a&nbsp;&nbsp;&nbsp;adfadfaf></td><td><br />&nbsp;dfa&nbsp;&nbsp;dfa</td>
</tr></table>
Gumbo
Pro85
  • What is the server-side language? Or do you mean JavaScript? – krtek Mar 06 '11 at 11:52
  • @Krtek - searching for `preg_replace` finds PHP results, so one can assume this is a PHP question. – Oded Mar 06 '11 at 11:56
  • Have you ever thought of using CSS like [`white-space: pre`](http://www.w3.org/TR/CSS2/text.html#white-space-prop)? – Gumbo Mar 06 '11 at 12:11
  • possible duplicate of [Can you provide some examples of why it is hard to parse XML and HTML with a regex?](http://stackoverflow.com/questions/701166/can-you-provide-some-examples-of-why-it-is-hard-to-parse-xml-and-html-with-a-rege) – Brad Mace Jul 09 '11 at 20:57
  • possible duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Paŭlo Ebermann Sep 15 '11 at 14:12

3 Answers


If you're working with PHP, you can do:

$content = str_replace(' ', '&nbsp;', $content);
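Applied to the whole string, the `str_replace` call above also rewrites spaces inside tags, so `<table atrr="zxzx">` becomes `<table&nbsp;atrr="zxzx">`. A minimal sketch that limits the replacement to the text between tags, assuming no `<` or `>` ever appears inside an attribute value (`replaceSpacesOutsideTags` is a hypothetical helper name):

```php
<?php
// Hypothetical helper: replace spaces only in the text between tags.
// Assumption: attribute values never contain '<' or '>'.
function replaceSpacesOutsideTags($html)
{
    // PREG_SPLIT_DELIM_CAPTURE keeps the captured tags in the result, so the
    // array alternates text, tag, text, tag, ... starting with a text piece.
    $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($parts as $i => $part) {
        // Even indexes hold text, odd indexes hold tags (left untouched).
        if ($i % 2 === 0) {
            $parts[$i] = str_replace(' ', '&nbsp;', $part);
        }
    }
    return implode('', $parts);
}

// Sample markup from the question.
$html = '<table atrr="zxzx"><tr>' . "\n"
      . '<td>adfa a   adfadfaf></td><td><br /> dfa  dfa</td>' . "\n"
      . '</tr></table>';

// Whole-string replacement: also touches the spaces inside the tags.
$naive = str_replace(' ', '&nbsp;', $html);
// Scoped replacement: tags are kept exactly as they were.
$scoped = replaceSpacesOutsideTags($html);

echo $naive, "\n\n", $scoped, "\n";
```

The stray `>` in `adfadfaf>` stays in the text piece, because the split pattern only treats a `<...>` run as a tag.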
krtek

Use a regex to capture the data between tags:

(?:<\/?\w+)(?:\s+\w+(?:\s*=\s*(?:\".*?\"|'.*?'|[^'\">\s]+)?)+\s*|\s*)\/?>([^<]*)?

then replace ' ' with '&nbsp;' in the captured text.

To also catch the text before and after the HTML:

^([^<>]*)<?

>([^<>]*)$

Edit: here you go:

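<?php
$data="dasdad asd a  <table atrr=\"zxzx\"><tr><td>adfa a   adfadfaf></td><td><br /> dfa  dfa</td></tr></table>  asdasd s ";
// Group 1: an opening, closing or self-closing tag; group 2: the text after it, up to the next "<"
$exp="/((?:<\\/?\\w+)(?:\\s+\\w+(?:\\s*=\\s*(?:\\\".*?\\\"|'.*?'|[^'\\\">\\s]+)?)+\\s*|\\s*)\\/?>)([^<]*)?/";

// Text at the start of the string (up to the first "<" or ">") and text after the last ">"
$ex1="/^([^<>]*)(<?)/i";
$ex2="/(>)([^<>]*)$/i";

// Keep each tag as-is and replace spaces only in the text that follows it
// (anonymous functions require PHP 5.3 or later)
$data = preg_replace_callback($exp, function ($matches) {
    return $matches[1] . str_replace(" ", "&nbsp;", $matches[2]);
}, $data);
// Replace spaces in the leading text, before the first tag
$data = preg_replace_callback($ex1, function ($matches) {
    return str_replace(" ", "&nbsp;", $matches[1]) . $matches[2];
}, $data);
// Replace spaces in the trailing text, after the last tag
$data = preg_replace_callback($ex2, function ($matches) {
    return $matches[1] . str_replace(" ", "&nbsp;", $matches[2]);
}, $data);

echo $data;
?>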

It works... I modified it slightly, but it would work without modifications too (but I don't think you'd understand the code ;) )
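The three passes above can be condensed into one `preg_replace_callback` call. In this sketch `nbspTextRuns` is a hypothetical name, and the pattern assumes `>` never occurs inside an attribute value or comment; a stray `>` in the text itself (as in `adfadfaf>`) is fine, because only `<` ends a text run:

```php
<?php
// Hypothetical one-pass variant of the three regexes above.
// "(^|>)" matches the start of the string or the ">" that closes a tag;
// "([^<]*)" captures the text that follows, up to the next "<".
// Assumption: ">" never appears inside attribute values or comments.
function nbspTextRuns($html)
{
    return preg_replace_callback('/(^|>)([^<]*)/', function ($m) {
        // Keep the anchor, rewrite spaces only in the captured text run.
        return $m[1] . str_replace(' ', '&nbsp;', $m[2]);
    }, $html);
}

// Same input as in the answer, including text before and after the table.
$data = 'dasdad asd a  <table atrr="zxzx"><tr><td>adfa a   adfadfaf></td>'
      . '<td><br /> dfa  dfa</td></tr></table>  asdasd s ';
echo nbspTextRuns($data), "\n";
```

The `^` alternative covers the leading text handled by `$ex1`, and the text after the last tag is just another run that follows a `>`, so `$ex2` is not needed.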

RafaelQm
n00b
  • How is it not working for you? -.-'''' What function do you use? – n00b Mar 06 '11 at 12:48
  • I provided working PHP code. Tell me if you have further problems. – n00b Mar 06 '11 at 13:36
  • n00b32, I ran your code. Result: it changed the data outside the HTML tags, but the table still contains spaces. I use PHP 5.3.5. – Pro85 Mar 06 '11 at 13:39
  • If you remove the second and third lines beginning with `$data =`, the code does exactly what you posted in your question... I use PHP 5.2.6-1+lenny9, so I think it's consistent with your version. – n00b Mar 06 '11 at 13:45
  • It's very strange, but $exp is not working; $ex1 and $ex2 are OK. preg_match returns 0. – Pro85 Mar 06 '11 at 13:54
  • ks****:~# php lol.php `dasdad asd a  
    adfa a   adfadfaf>
     dfa  dfa
      asdasd s `
    – n00b Mar 06 '11 at 13:59
  • If the code doesn't work without modifications, I would have to guess something is wrong with your editor/configuration/server... – n00b Mar 06 '11 at 14:01
  • Put `((?:<\/?\w+)(?:\s+\w+(?:\s*=\s*(?:\".*?\"|'.*?'|[^'\">\s]+)?)+\s*|\s*)\/?>)([^<]*)?` as the regex and `
    adfa a adfadfaf>
    dfa dfa
    ` as teststring1 ... it works...
    – n00b Mar 06 '11 at 14:13
  • Yes, it works. But on my PHP 5.3.5 (FreeBSD and Windows) it still doesn't work. Thanks, n00b32. – Pro85 Mar 06 '11 at 14:22
  • adfa a adfadfaf>
    dfa dfa
    – Pro85 Mar 06 '11 at 14:27
  • After adding /x. Result: `
    adfa a adfadfaf>
     dfa dfa
    `
    – Pro85 Mar 06 '11 at 14:34
  • Then I'm sorry, I don't know why it doesn't work on your server... AFAIK the problem doesn't lie in my code, so you're on your own. The regex should work, so try experimenting with it. – n00b Mar 06 '11 at 14:35

Since tokenizing HTML with regular expressions can be quite complicated (especially when allowing SGML quirks), you should use an HTML DOM parser like the one in PHP’s DOM library. Then you can query the DOM, get all text nodes and apply your replacement function to each of them:

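$doc = new DOMDocument();
$doc->loadHTML($str);
$body = $doc->getElementsByTagName('body')->item(0);
// nodeValue holds plain text, so a literal '&nbsp;' would be serialized as '&amp;nbsp;';
// insert the U+00A0 no-break space character (UTF-8 bytes C2 A0) instead
mapOntoTextNodes($body, function(DOMText $node) { $node->nodeValue = str_replace(' ', "\xC2\xA0", $node->nodeValue); });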

The mapOntoTextNodes function is a custom function I defined in "How to replace text URLs and exclude URLs in HTML tags?".
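The linked `mapOntoTextNodes` definition is not reproduced here, so the following is a hypothetical stand-in that may differ from the original: it walks the subtree depth-first and calls the callback on every `DOMText` node. Skipping `<script>` and `<style>` elements is a design choice of this sketch, not part of the answer. It writes U+00A0 rather than the string `'&nbsp;'`, since text assigned to `nodeValue` is escaped on output:

```php
<?php
// Hypothetical stand-in for mapOntoTextNodes: depth-first walk that calls
// $fn on every DOMText node below $node.
function mapOntoTextNodes(DOMNode $node, $fn)
{
    if ($node instanceof DOMText) {
        $fn($node);
        return;
    }
    if (!$node->hasChildNodes()) {
        return;
    }
    foreach ($node->childNodes as $child) {
        // Design choice: leave script/style contents alone.
        if ($child instanceof DOMElement
            && in_array(strtolower($child->nodeName), array('script', 'style'), true)) {
            continue;
        }
        mapOntoTextNodes($child, $fn);
    }
}

$str = '<table atrr="zxzx"><tr>' . "\n"
     . '<td>adfa a   adfadfaf></td><td><br /> dfa  dfa</td>' . "\n"
     . '</tr></table>';

libxml_use_internal_errors(true); // collect libxml parse errors instead of emitting warnings
$doc = new DOMDocument();
$doc->loadHTML($str);
$body = $doc->getElementsByTagName('body')->item(0);

// "\xC2\xA0" is U+00A0 (no-break space) encoded as UTF-8.
mapOntoTextNodes($body, function (DOMText $node) {
    $node->nodeValue = str_replace(' ', "\xC2\xA0", $node->nodeValue);
});
```

Only text nodes are visited, so the space inside `<table atrr="zxzx">` and inside `<br />` is never touched.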

Community
Gumbo
  • Isn't this a bit of overkill? And if we are going to overkill it, we need to detect style/script etc. tags... – n00b Mar 06 '11 at 12:07
  • I don't need to parse a whole HTML file. I only need to parse some parts, like the one I posted in the question. – Pro85 Mar 06 '11 at 12:36
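For fragments like the one in the question, one option is to let libxml wrap the fragment in its implied `<html><body>` elements and then serialize only the children of `<body>` with `saveHTML($node)` (available since PHP 5.3.6). A sketch under these assumptions; `nbspFragment` is a hypothetical name, and the `<meta>` charset declaration is prepended because `loadHTML` otherwise reads input as ISO-8859-1:

```php
<?php
// Hypothetical helper: replace spaces in the text of an HTML fragment and
// return only the fragment, without the implied <html>/<body> wrapper.
function nbspFragment($fragment)
{
    $prev = libxml_use_internal_errors(true); // collect parse errors quietly
    $doc = new DOMDocument();
    // The charset declaration keeps UTF-8 input from being read as ISO-8859-1;
    // the <meta> element ends up in the implied <head>, not in <body>.
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $fragment);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    // Text nodes under <body>, skipping anything inside <script> or <style>.
    $xpath = new DOMXPath($doc);
    $texts = $xpath->query('//body//text()[not(ancestor::script) and not(ancestor::style)]');
    foreach ($texts as $text) {
        // U+00A0 as UTF-8; a literal '&nbsp;' would be escaped on output.
        $text->nodeValue = str_replace(' ', "\xC2\xA0", $text->nodeValue);
    }

    // Serialize each child of <body> separately to drop the wrapper elements.
    $out = '';
    $body = $doc->getElementsByTagName('body')->item(0);
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo nbspFragment('<table atrr="zxzx"><tr><td>adfa a   adfadfaf></td><td><br /> dfa  dfa</td></tr></table>'), "\n";
```

Whether U+00A0 is written back as an entity or as the raw UTF-8 character is up to libxml's HTML serializer; a browser renders either form as a non-breaking space.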