Remove all the line breaks from the html source

Question

Well, I know obfuscation is a bad idea, but I want all of my HTML code to come out on one long single line. All the HTML tags are generated through PHP, so I think it's possible. I know how to replace \n and \r with a regular expression, but have no idea how to do this one. In case I am unclear, here is an example:

$output = '<p>
              <div class="title">Hello</div>
           </p>';
echo $output;

I want this to appear in the source viewer as <p><div class="title">Hello</div></p>

I would like to know if this is possible. Please don't tell me this is a waste of time, a bad method, or pointless, because I already know it is, but I really want to try this. — mrN, Mar 10 '11 at 10:46
In that case I guess you need to replace all `\n`, `\r`, `\t` and spaces — acm, Mar 10 '11 at 10:46
Are you doing this in order to obscure your html source, or for compression? — Blorgbeard, Mar 10 '11 at 10:50
What if the markup contains elements that expect their content to be `white-space: pre`? — Gordon, Mar 10 '11 at 10:58
I guess I will ask another question then, ha ha... @Gordon, could you suggest a better way to keep the whitespace inside <pre>? — mrN, Mar 10 '11 at 12:07
@mrNepal given that `white-space: pre` is a CSS declaration, in addition to being the default rendering mode for `<pre>` and possibly other elements, I'd say there is no good solution. If you want to save on bandwidth and are not serving millions of pages a day, you'll likely save enough by gzipping on the webserver. — Gordon, Mar 10 '11 at 13:48

score 47 · Accepted Answer · edited Mar 19 '15 at 01:20

Maybe this?

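$output = str_replace(array("\r\n", "\r"), "\n", $output);
$lines = explode("\n", $output);
$new_lines = array();

foreach ($lines as $line) {
    $line = trim($line);
    // Compare with '' rather than using empty(): empty("0") is true and would drop a line containing just "0"
    if ($line !== '') {
        $new_lines[] = $line;
    }
}
echo implode('', $new_lines);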

edited Mar 19 '15 at 01:20

JasonMArcher

answered Mar 10 '11 at 10:52

seriousdev

trim() only deletes whitespace at the start and end; any whitespace inside the line remains – Flo Mar 10 '11 at 10:56
Thanks, this works very well too. Should I go with the `preg` solution or this one? – mrN Mar 10 '11 at 12:04
I'd say this one's faster and more reliable. – seriousdev Mar 10 '11 at 12:19
You're not removing the '\n' character. It needs to be str_replace(array("\r\n", "\r", "\n"), "", $output) – RayLoveless Sep 28 '12 at 23:07
@RayL You misunderstood the code. The `\n` is necessary for `explode()` – mgutt Mar 19 '15 at 00:54
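To make the role of `\n` explicit: it survives only as the separator that `explode()` splits on, and `implode()` never puts it back. As a hedged sketch (`join_trimmed_lines` is a hypothetical helper name, not from the answer), the normalise/split pair can be folded into one `preg_split()` call whose pattern `\r\n|\r|\n` treats every newline style as a separator:

```php
<?php
// Hypothetical helper: split on any newline style, trim each line,
// and glue the pieces back together with nothing in between.
function join_trimmed_lines(string $html): string
{
    // The alternation tries "\r\n" first, so a Windows line break is one separator
    $lines = preg_split('/\r\n|\r|\n/', $html);
    // Blank lines become '' after trim() and simply vanish in implode()
    return implode('', array_map('trim', $lines));
}
```

For the question's `$output`, `echo join_trimmed_lines($output);` prints `<p><div class="title">Hello</div></p>`. Note that the line break itself is removed rather than turned into a space, so text that continues on the next line is glued directly to the previous word.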

Svish · Answer 2 · 2014-10-28T08:11:36.930

You can try this perhaps.

// Before any output
ob_start();

// End of file
$output = ob_get_clean();
echo preg_replace('/^\s+|\n|\r|\s+$/m', '', $output);

This should, unless I messed up the regex, catch all output and then remove all newline characters as well as all whitespace at the beginning and end of lines.

If you already have all output collected in a variable, you can of course just use the last line directly and skip the output buffering stuff :)
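A variant of the same idea, assuming you would rather have PHP apply the regex automatically: `ob_start()` accepts a callback that receives the buffered output when the buffer is flushed. The pattern below is the one from this answer; `strip_line_whitespace` is a hypothetical name.

```php
<?php
// Hypothetical callback: remove leading/trailing whitespace on every line (/m)
// and every \r and \n, using the regex from this answer.
function strip_line_whitespace(string $buffer): string
{
    return preg_replace('/^\s+|\n|\r|\s+$/m', '', $buffer);
}

// Call this before any output; the buffer is passed through the callback
// when it is flushed, e.g. automatically at the end of the script.
ob_start('strip_line_whitespace');
```

With the callback registered this way, the `ob_get_clean()`/`echo` pair at the end of the file is no longer needed.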

edited Oct 28 '14 at 08:11

answered Mar 10 '11 at 11:05

Svish

If he uses an output variable like in his code, then capturing the output is not needed – Flo Mar 10 '11 at 11:10
Wow, this is a very clean solution. Thanks – mrN Mar 10 '11 at 12:02
The problem for me is that this removes all whitespace around line breaks, which may be unintended: a `\r\n` between two words or inline elements becomes nothing, which renders differently than a normal space. The simple rule `preg_replace('/\s+/', ' ', $str)` collapses all whitespace to a single space and can't cause that conflict. – Ciantic Sep 24 '14 at 15:37
I think this is better, to support some CJK pages: preg_replace('/^\s+|\n|\r|\t|\s+$/m', '', $output); – Soyoes Dec 30 '14 at 01:31

score 15 · Answer 3 · answered Sep 28 '12 at 23:06

Worked for me:

$output = str_replace(array("\r\n", "\r", "\n"), "", $output);

RayLoveless

Why strip \r\n before stripping both \r and \n? Is there some sort of speed benefit to getting all the \r\n combo-style breaks first? – Jimbo Jonny Nov 21 '12 at 04:51
@jimbo, I'm not sure if there's a speed benefit. – RayLoveless Jan 17 '13 at 23:55
Benchmarks with `\r\n` first: https://3v4l.org/7clqm/perf#tabs; with `\r\n` last: https://3v4l.org/EDr2T/perf#tabs – Nolwennig Nov 05 '15 at 09:41
I had to implement this for the HTML returned by form_input() and other form helper functions in CodeIgniter. This worked perfectly. Actually, replacing only \n works great too. – Zeeshan Feb 11 '16 at 10:41
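On the question of why `"\r\n"` comes first: `str_replace()` with an array of searches applies them one after another, each to the result of the previous one, so the order only changes the result when the replacement is not empty. A small sketch with a made-up `$html` value:

```php
<?php
// Made-up input containing one Windows line break.
$html = "<b>one</b>\r\n<b>two</b>";

// With an empty replacement the order makes no difference:
$a = str_replace(array("\r\n", "\r", "\n"), "", $html); // "<b>one</b><b>two</b>"
$b = str_replace(array("\r", "\n"), "", $html);         // "<b>one</b><b>two</b>"

// With a space as the replacement, matching "\r\n" first yields one space
// per Windows line break instead of two:
$c = str_replace(array("\r\n", "\r", "\n"), " ", $html); // "<b>one</b> <b>two</b>"
$d = str_replace(array("\r", "\n"), " ", $html);         // "<b>one</b>  <b>two</b>"
```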

krtek · Answer 4 · 2011-03-10T10:53:11.547

You can do:

$output = '<p>'.
              '<div class="title">Hello</div>'.
           '</p>';

This way, $output won't contain any line breaks.

This should also work:

$output = preg_replace(array('/\r/', '/\n/'), '', $output);

edited Mar 10 '11 at 10:53

answered Mar 10 '11 at 10:46

krtek

I am currently using the first way, but want something more efficient. – mrN Mar 10 '11 at 10:49
BTW, your code gives error `preg_replace() [function.preg-replace]: Delimiter must not be alphanumeric or backslash` – mrN Mar 10 '11 at 10:50
right, sorry, I fixed this... But you will have problems with spaces with the preg_replace method. – krtek Mar 10 '11 at 10:53
For such a simple replacement, use str_replace, not preg_replace – Flo Mar 10 '11 at 10:54
@Flo, str_replace gave no change. – mrN Mar 10 '11 at 10:58
Without the slashes, of course, as in the answer from sexyprout – Flo Mar 10 '11 at 11:07

score 3 · Answer 5 · answered Sep 26 '12 at 12:13

$output = preg_replace('!\s+!m', ' ', $output);

ling

I would not use this as it replaces ' \n\n\n' by three whitespaces. And it could cause unwanted results if you need to insert whitespaces or tabs through javascript that is part of your html code. – mgutt Mar 18 '15 at 22:41
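For reference, `\s+` matches a whole run of whitespace at once, so a sequence such as `' \n\n\n'` collapses to a single space rather than one space per character. A minimal sketch of this answer's approach (`collapse_whitespace` is a hypothetical helper name):

```php
<?php
// Hypothetical helper: every run of whitespace (spaces, tabs, \r, \n),
// however long, becomes exactly one space.
function collapse_whitespace(string $html): string
{
    return preg_replace('/\s+/', ' ', $html);
}
```

The `!`-delimited pattern in the answer behaves the same way; its `m` modifier has no effect because the pattern contains no `^` or `$`. As the comment points out, it still rewrites whitespace inside `<pre>`, `<textarea>` and inline scripts.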

score 1 · Answer 6 · answered Mar 11 '11 at 05:32

This is already well answered, but you may be able to do more than just trim spaces at both ends of each line:

1. Extract all text within quotes (you don't want to touch it), replace each piece with a marker containing a sequence number, and store the text under that sequence number.
2. Extract all text within <script></script> tags and do the same as in step 1.
3. Replace all whitespace (including \n and \r) with spaces.
4. Replace every run of more than one space with a single space.
5. Replace all >_< with >< (_ = space).
6. Replace all _>, <_ and </_ with >, < and </ (_ = space).
7. Replace the markers with the original texts.

This procedure can potentially compact the entire HTML file. It takes advantage of the fact that browsers render a run of whitespace in ordinary HTML text as a single space.
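The steps above can be sketched in PHP roughly as follows. This is a hedged sketch, not Stephen Chung's code: it swaps in `preg_replace_callback()` with NUL-delimited markers (an assumption, chosen because NUL should not occur in HTML and is not matched by `\s`), extends step 2 to `<pre>` and `<textarea>`, and assumes PHP 7.4+ for the arrow function. `minify_html_sketch` is a hypothetical name, and the regexes are a simplification rather than an HTML parser.

```php
<?php
// Hypothetical helper sketching the marker-based steps; assumes PHP 7.4+.
function minify_html_sketch(string $html): string
{
    $saved = [];

    // Steps 1-2: swap quoted strings and <pre>/<textarea>/<script> blocks
    // for numbered markers (NUL, number, NUL) and remember the originals.
    $html = preg_replace_callback(
        '#<(pre|textarea|script)\b[^>]*>.*?</\1>|"[^"]*"|\'[^\']*\'#is',
        function (array $m) use (&$saved): string {
            $saved[] = $m[0];
            return "\x00" . (count($saved) - 1) . "\x00";
        },
        $html
    );

    // Steps 3-4: every run of whitespace becomes a single space.
    $html = preg_replace('/\s+/', ' ', $html);

    // Steps 5-6: drop spaces between tags and just inside angle brackets.
    $html = str_replace(['> <', ' >', '< ', '</ '], ['><', '>', '<', '</'], $html);

    // Step 7: put the protected text back.
    return preg_replace_callback(
        '/\x00(\d+)\x00/',
        fn (array $m): string => $saved[(int) $m[1]],
        $html
    );
}
```

Like the procedure it follows, the quote pattern also matches apostrophes in ordinary text, so anything between two apostrophes (as in "don't ... it's") is left uncompressed rather than broken.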

score 0 · Answer 7 · answered Aug 16 '13 at 06:14

You can't have <div> inside <p> - it is not spec-valid.

If you don't need to store it in a variable you can use this:

?><div><?php
    ?><div class="title">Hello</div><?php
?></div><?php

happy_marmoset

score 0 · Answer 8 · answered May 08 '11 at 14:59

This is a working implementation (as far as I have tested) of Stephen Chung's instructions. I'm not entirely convinced by step five, but have included it anyway.

Put the things you want to protect in the $protected_parts array, in the order you want them protected. If the start and end delimiters differ (as they do for HTML tags), separate them with a comma.

Also, I've no idea if this is the most optimised way of doing this, but it works for me and seems reasonably fast. Feel free to improve, etc. (Let me know if you do too!)

function MinifyHTML($str) {
    // Delimiters to protect, in extraction order: "start,end" pairs,
    // or a single string used for both ends
    $protected_parts = array("<pre>,</pre>", "\"", "'");
    $extracted_values = array();
    $i = 0;

    foreach ($protected_parts as $part) {
        $search_offset = 0;
        $startend = explode(",", $part);
        if (count($startend) == 1) { $startend[1] = $startend[0]; }
        $len0 = strlen($startend[0]);
        $len1 = strlen($startend[1]);

        while (($first_offset = strpos($str, $startend[0], $search_offset)) !== false) {
            $end_offset = strpos($str, $startend[1], $first_offset + $len0);
            if ($end_offset === false) { break; } // unmatched opener: stop protecting this part
            // Store the protected text and put the marker $#<n>$ in its place
            $extracted_values[$i] = substr($str, $first_offset + $len0, $end_offset - $first_offset - $len0);
            $marker = '$#' . $i . '$';
            $str = substr($str, 0, $first_offset + $len0) . $marker . substr($str, $end_offset);
            // Resume searching right after the closing delimiter
            $search_offset = $first_offset + $len0 + strlen($marker) + $len1;
            $i++;
        }
    }

    $str = preg_replace("/\s/", " ", $str);
    $str = preg_replace("/\s{2,}/", " ", $str);
    $str = str_replace("> <", "><", $str);
    $str = str_replace(" >", ">", $str);
    $str = str_replace("< ", "<", $str);
    $str = str_replace("</ ", "</", $str);

    // Restore in reverse, starting at the last valid index, so markers saved
    // inside later extractions are put back as well
    for ($i = count($extracted_values) - 1; $i >= 0; $i--) {
        $str = str_replace('$#' . $i . '$', $extracted_values[$i], $str);
    }

    return $str;
}

score 0 · Answer 9 · edited Oct 26 '12 at 07:44

This is an improved version of the function above. It adds <textarea> protection and also leaves anything inside a tag untouched.

I also moved the strlen calls out of the loop (their values don't change).

This might run faster as a one-pass filter that checks for any of the protected parts. For such a small protected_parts array it's going to be more efficient than looping through the $str four times.

Also, this doesn't fix `class = " "` (the extra spaces around the `=`), since that is inside the tags.

function MinifyHTML($str) {
    $protected_parts = array('<pre>,</pre>', '<textarea>,</textarea>', '<,>');
    $extracted_values = array();
    $i = 0;
    foreach ($protected_parts as $part) {
        $finished = false;
        $search_offset = $first_offset = 0;
        $startend = explode(',', $part);
        if (count($startend) === 1) $startend[1] = $startend[0];
        $len0 = strlen($startend[0]); $len1 = strlen($startend[1]);
        while ($finished === false) {
            $first_offset = strpos($str, $startend[0], $search_offset);

            if ($first_offset === false) $finished = true;
            else {
                $search_offset = strpos($str, $startend[1], $first_offset + $len0);
                // Unmatched opener: nothing more to protect for this part
                if ($search_offset === false) { $finished = true; continue; }
                $extracted_values[$i] = substr($str, $first_offset + $len0, $search_offset - $first_offset - $len0);
                $str = substr($str, 0, $first_offset + $len0).'$$#'.$i.'$$'.substr($str, $search_offset);
                // The marker '$$#<n>$$' is strlen($i) + 5 characters long
                $search_offset += $len1 + strlen((string)$i) + 5 - strlen($extracted_values[$i]);
                ++$i;
            }
        }
    }
    $str = preg_replace("/\s/", " ", $str);
    $str = preg_replace("/\s{2,}/", " ", $str);
    $replace = array('> <'=>'><', ' >'=>'>','< '=>'<','</ '=>'</');
    $str = str_replace(array_keys($replace), array_values($replace), $str);

    // Restore the newest markers first, so a marker saved inside a later
    // extraction (e.g. a <pre> nested in a <textarea>) is also put back
    for ($d = $i - 1; $d >= 0; --$d)
        $str = str_replace('$$#'.$d.'$$', $extracted_values[$d], $str);

    return $str;
}
