40

I know obfuscation is a bad idea, but I want all of my HTML to come out as one long single line. All the HTML tags are generated through PHP, so I think it's possible. I know how to replace \n and \r with a regular expression, but I have no idea how to do this. In case I am unclear, here is an example:

$output = '<p>
              <div class="title">Hello</div>
           </p>';
echo $output;

This should appear in the source viewer as <p><div class="title">Hello</div></p>

mrN
  • 3
    I would like to know if this is possible. Please don't tell me this is a waste of time, a bad method, or pointless, because I already know it is, but I really want to try this. – mrN Mar 10 '11 at 10:46
  • 4
    In that case I guess you need to replace all `\n`, `\r`, `\t` and spaces – acm Mar 10 '11 at 10:46
  • 1
    Are you doing this in order to obscure your html source, or for compression? – Blorgbeard Mar 10 '11 at 10:50
  • 3
    What if the markup contains elements that expect their content to be `white-space: pre`? – Gordon Mar 10 '11 at 10:58
  • I guess I will ask another question then, ha ha. @Gordon, can you suggest a better way to keep the whitespace within pre? – mrN Mar 10 '11 at 12:07
  • 1
    @mrNepal given that `white-space: pre` is a CSS declaration, in addition to being the default rendering mode for `<pre>` and `<textarea>` (?), I'd say there is no good solution. If you want to save on bandwidth and are not serving millions of pages a day, you'll likely save enough by gzipping on the webserver.
    – Gordon Mar 10 '11 at 13:48
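
To get a feel for the gzip suggestion, here is a rough sketch that measures compression in PHP itself. This is not how a web server would be configured; it only compares sizes. It assumes the zlib extension is loaded, and the sample markup is made up for the illustration.

```php
<?php
// Made-up sample page: one indented table row repeated 200 times.
$row  = "<tr>\n    <td class=\"name\">Hello</td>\n    <td class=\"value\">World</td>\n</tr>\n";
$html = "<table>\n" . str_repeat($row, 200) . "</table>\n";

// gzencode() comes from the zlib extension and returns gzip-format data.
$gzipped = gzencode($html);

printf("original: %d bytes\n", strlen($html));
printf("gzipped:  %d bytes\n", strlen($gzipped));
```

Repetitive indentation is exactly what gzip handles well, which is why stripping whitespace by hand tends to add little once compression is switched on.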

9 Answers

47

Maybe this?

$output = str_replace(array("\r\n", "\r"), "\n", $output);
$lines = explode("\n", $output);
$new_lines = array();

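foreach ($lines as $line) {
    $line = trim($line);
    // Compare against '' instead of using empty(), which would also drop a line that is just "0".
    if ($line !== '')
        $new_lines[] = $line;
}
echo implode('', $new_lines);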
JasonMArcher
seriousdev
37

You can try this perhaps.

// Before any output
ob_start();

// End of file
$output = ob_get_clean();
echo preg_replace('/^\s+|\n|\r|\s+$/m', '', $output);

This should, unless I messed up the regex, catch all output, and then replace all new line characters as well as all whitespace at the end and beginning of lines.

If you already have all output collected in a variable, you can of course just use the last line directly and skip the output buffering stuff :)
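
If you would rather not call ob_get_clean() yourself, ob_start() also accepts a callback that receives the buffer when it is flushed. A minimal sketch, assuming a helper I have named strip_line_whitespace (the name is made up; the pattern is the one above):

```php
<?php
// Hypothetical helper name; applies the pattern from this answer to the buffered page.
function strip_line_whitespace(string $buffer): string
{
    return preg_replace('/^\s+|\n|\r|\s+$/m', '', $buffer);
}

// Start buffering before any output; PHP passes the buffer to the callback on flush.
ob_start('strip_line_whitespace');

echo "<p>\n";
echo "    <span class=\"title\">Hello</span>\n";
echo "</p>\n";

// Sends <p><span class="title">Hello</span></p>
ob_end_flush();
```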

Svish
  • If he uses an output variable like in his code, then capturing the output is not needed – Flo Mar 10 '11 at 11:10
  • Wow, this is very clean solution. Thanks – mrN Mar 10 '11 at 12:02
  • The problem for me is that this removes all whitespace around line breaks. That may be unintended: `\r\n` becomes `` and then renders differently than a normal space would. A simple rule like `preg_replace('/\s+/', ' ', $str)` collapses every run of whitespace to a single space and can't cause that conflict. – Ciantic Sep 24 '14 at 15:37
  • 1
    I think this will be better preg_replace('/^\s+|\n|\r|\t|\s+$/m', '', $output); to support some CJK pages – Soyoes Dec 30 '14 at 01:31
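
The difference between the two patterns discussed in these comments shows up with inline elements separated only by a line break. A small sketch with made-up markup:

```php
<?php
$html = "<span>Hello</span>\n<span>world</span>";

// Pattern from the answer: the line break disappears entirely.
$stripped = preg_replace('/^\s+|\n|\r|\s+$/m', '', $html);

// Pattern from the comment: the line break becomes one space.
$collapsed = preg_replace('/\s+/', ' ', $html);

echo $stripped, "\n";   // <span>Hello</span><span>world</span>
echo $collapsed, "\n";  // <span>Hello</span> <span>world</span>
```

A browser would show the first as "Helloworld" and the second as "Hello world", so the collapsing rule is the safer choice when inline elements sit on separate lines.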
15

Worked for me:

$output = str_replace(array("\r\n", "\r", "\n"), "", $output);
RayLoveless
  • 3
    Why strip \r\n before stripping both \r and \n? Is there some sort of speed benefit to getting all the \r\n combo style breaks first? – Jimbo Jonny Nov 21 '12 at 04:51
  • 3
    @jimbo, I'm not sure if there's speed benefit. – RayLoveless Jan 17 '13 at 23:55
  • ( \r\n first -> https://3v4l.org/7clqm/perf#tabs ) // ( \r\n last -> https://3v4l.org/EDr2T/perf#tabs ) // – Nolwennig Nov 05 '15 at 09:41
  • I had to implement this for the HTML returned by form_input() and other form helper functions in CodeIgniter. This worked perfectly. In fact, replacing only \n works fine. – Zeeshan Feb 11 '16 at 10:41
5

You can do:

$output = '<p>'.
              '<div class="title">Hello</div>'.
           '</p>';

This way, $output won't contain any line breaks.

This should also work:

$output = preg_replace(array('/\r/', '/\n/'), '', $output);
krtek
3
$output = preg_replace('!\s+!m', ' ', $output);
ling
  • I would not use this as it replaces ' \n\n\n' by three whitespaces. And it could cause unwanted results if you need to insert whitespaces or tabs through javascript that is part of your html code. – mgutt Mar 18 '15 at 22:41
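
A quick check of what this pattern does, as far as I can tell: `\s+` is greedy, so a run such as ' \n\n\n' collapses to one space rather than three. The second concern does hold, though, since whitespace inside <pre> or inline JavaScript is collapsed too. The sample markup is made up:

```php
<?php
$pattern = '!\s+!m';

// The whole run of whitespace is matched at once and replaced by one space.
$run = preg_replace($pattern, ' ', " \n\n\n");
var_dump($run === ' ');  // bool(true)

// Whitespace inside <pre> and inside a JavaScript string literal is altered as well.
$html = "<pre>a\n    b</pre><script>var s = 'a   b';</script>";
$out  = preg_replace($pattern, ' ', $html);
echo $out, "\n";  // <pre>a b</pre><script>var s = 'a b';</script>
```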
1

This is already well answered, but you may be able to do more than just trim spaces at both ends of each line:

  1. First extract all text within quotes (you don't want to touch those), replace with a marker with a sequence number, store the sequence number with the text
  2. Extract all text within <script></script> tags and do the same as step #1
  3. Replace all white-space (including \n, \r) with spaces
  4. Replace all >1 space sequences with 1 space
  5. Replace all >_< with >< (_ = space)
  6. Replace all _>, <_ and </_ with >, < and </ (_ = space)
  7. Replace markers with the actual texts

This procedure can potentially compact the entire HTML file. It takes advantage of the fact that a run of white-space in HTML is interpreted as a single space.
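
The steps above can be sketched with preg_replace_callback. This is a simplified take, not a full implementation: it protects whole <pre>, <textarea> and <script> blocks instead of quoted text, skips step 6, and uses a NUL-delimited marker on the assumption that the page contains no NUL bytes. The function name is made up.

```php
<?php
// Hypothetical helper: collapse whitespace but leave <pre>, <textarea> and <script> blocks alone.
function minify_keep_blocks(string $html): string
{
    $stash = [];

    // Steps 1-2 (simplified): swap each protected block for a numbered marker.
    $html = preg_replace_callback(
        '#<(pre|textarea|script)\b[^>]*>.*?</\1>#is',
        function (array $m) use (&$stash): string {
            $stash[] = $m[0];
            return "\x00" . (count($stash) - 1) . "\x00";
        },
        $html
    );

    // Steps 3-4: every run of whitespace becomes a single space.
    $html = preg_replace('/\s+/', ' ', $html);
    // Step 5: drop the space between adjacent tags.
    $html = str_replace('> <', '><', $html);

    // Step 7: put the protected blocks back.
    return preg_replace_callback(
        '/\x00(\d+)\x00/',
        function (array $m) use ($stash): string {
            return $stash[(int) $m[1]];
        },
        $html
    );
}

$page = "<div>\n  <p>Hi   there</p>\n  <pre>a\n  b</pre>\n</div>";
echo minify_keep_blocks($page);
```

The line break and indentation inside <pre> survive, while everything outside it is collapsed.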

Stephen Chung
0

You can't have <div> inside <p> - it is not spec-valid.

If you don't need to store it in a variable you can use this:

?><div><?php
    ?><div class="title">Hello</div><?php
?></div><?php
happy_marmoset
0

This is a (as far as I have tested) working implementation of Stephen Chung's instructions. I'm not entirely convinced by number five, but have included it anyway.

Put the things you want to protect in the protected_parts array, in the order you want them protected. If the starting and ending delimiters are different (as they would be for HTML tags), separate them with a comma.

Also, I've no idea if this is the most optimised way of doing this, but it works for me and seems reasonably fast. Feel free to improve, etc. (Let me know if you do too!)

function MinifyHTML($str) {
    // Delimiters whose contents are left untouched; "start,end" when they differ.
    $protected_parts = array("<pre>,</pre>", "\"", "'");
    $extracted_values = array();
    $i = 0;

    foreach ($protected_parts as $part) {
        $search_offset = 0;
        $startend = explode(",", $part);
        if (count($startend) == 1) { $startend[1] = $startend[0]; }
        $start_len = strlen($startend[0]);

        while (true) {
            $first_offset = strpos($str, $startend[0], $search_offset);
            if ($first_offset === false) { break; }

            $end_offset = strpos($str, $startend[1], $first_offset + $start_len);
            // No closing delimiter: stop rather than loop forever.
            if ($end_offset === false) { break; }

            // Swap the protected content for a numbered marker.
            $marker = '$#'.$i.'$';
            $extracted_values[$i] = substr($str, $first_offset + $start_len, $end_offset - $first_offset - $start_len);
            $str = substr($str, 0, $first_offset + $start_len).$marker.substr($str, $end_offset);
            // Continue just past the closing delimiter.
            $search_offset = $first_offset + $start_len + strlen($marker) + strlen($startend[1]);
            $i++;
        }
    }

    $str = preg_replace("/\s/", " ", $str);
    $str = preg_replace("/\s{2,}/", " ", $str);
    $str = str_replace("> <", "><", $str);
    $str = str_replace(" >", ">", $str);
    $str = str_replace("< ", "<", $str);
    $str = str_replace("</ ", "</", $str);

    // Restore newest first: a later extraction may contain markers from an earlier pass.
    for ($i = count($extracted_values) - 1; $i >= 0; $i--) {
        $str = str_replace('$#'.$i.'$', $extracted_values[$i], $str);
    }

    return $str;
}
James Billingham
0

This is an improved version of the function above. It adds textarea protection and also leaves the inside of every tag untouched.

I also moved strlen out of the loop (the lengths don't change).

This might run faster as a one pass filter to check for any of the protected parts. For such a small protected_parts array it's going to be more efficient than looping through the $str four times.

Also, this doesn't fix: class = " " (the extra spaces between = and "), because that is inside a tag.

function MinifyHTML($str) {
    // Delimiters whose contents are left untouched; "start,end" when they differ.
    $protected_parts = array('<pre>,</pre>', '<textarea>,</textarea>', '<,>');
    $extracted_values = array();
    $i = 0;
    foreach ($protected_parts as $part) {
        $search_offset = 0;
        $startend = explode(',', $part);
        if (count($startend) === 1) $startend[1] = $startend[0];
        $len0 = strlen($startend[0]); $len1 = strlen($startend[1]);
        while (true) {
            $first_offset = strpos($str, $startend[0], $search_offset);
            if ($first_offset === false) break;

            $end_offset = strpos($str, $startend[1], $first_offset + $len0);
            // No closing delimiter: stop rather than loop forever.
            if ($end_offset === false) break;

            // Swap the protected content for a numbered marker.
            $marker = '$$#'.$i.'$$';
            $extracted_values[$i] = substr($str, $first_offset + $len0, $end_offset - $first_offset - $len0);
            $str = substr($str, 0, $first_offset + $len0).$marker.substr($str, $end_offset);
            // Continue just past the closing delimiter.
            $search_offset = $first_offset + $len0 + strlen($marker) + $len1;
            ++$i;
        }
    }
    $str = preg_replace("/\s/", " ", $str);
    $str = preg_replace("/\s{2,}/", " ", $str);
    $replace = array('> <'=>'><', ' >'=>'>','< '=>'<','</ '=>'</');
    $str = str_replace(array_keys($replace), array_values($replace), $str);

    // Restore newest first: a later extraction may contain markers from an earlier pass.
    for ($d = $i - 1; $d >= 0; --$d)
        $str = str_replace('$$#'.$d.'$$', $extracted_values[$d], $str);

    return $str;
}
Em1
piranxa