42
$string = "My    text       has so    much   whitespace    




Plenty of    spaces  and            tabs";

echo preg_replace("/\s\s+/", " ", $string);

I read the PHP documentation and followed the preg_replace() tutorial; however, this code produces:

My text has so much whitespace Plenty of spaces and tabs

How can I turn it into:

My text has so much whitespace    
Plenty of spaces and tabs
mickmackusa
Teiv
  • Same as http://stackoverflow.com/questions/6360566/replace-multiple-newline-tab-space – Sourav Jun 18 '11 at 07:22
  • While that is true and that page is 3 days older than this page, this question is more completely asked because it provides a [mcve]. For that reason, that page should be closed with this page or merged into this page. – mickmackusa Mar 28 '22 at 11:17

11 Answers

66

First, I'd like to point out that new lines can be either \r, \n, or \r\n depending on the operating system.

My solution:

echo preg_replace('/[ \t]+/', ' ', preg_replace('/[\r\n]+/', "\n", $string));

Which could be separated into 2 lines if necessary:

$string = preg_replace('/[\r\n]+/', "\n", $string);
echo preg_replace('/[ \t]+/', ' ', $string);

Update:

An even better solution would be this one:

echo preg_replace('/[ \t]+/', ' ', preg_replace('/\s*$^\s*/m', "\n", $string));

Or:

$string = preg_replace('/\s*$^\s*/m', "\n", $string);
echo preg_replace('/[ \t]+/', ' ', $string);

I've changed the regular expression that collapses multiple line breaks into a single one. It uses the "m" modifier (which makes ^ and $ match at the start and end of each line) and removes any \s characters (space, tab, new line, line break) that are at the end of one line and the beginning of the next. This solves the problem of empty lines that have nothing but spaces. With my previous example, a line filled with spaces would have produced an extra line break.
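The two-step replacement described above can be sketched as a small function. This is only an illustration: the helper name collapseWhitespace() and the sample string are made up here, and the line breaks are written as explicit "\n" escapes. As far as I can tell, the `$^` pair only matches at a position where a line is completely empty, so the sketch is checked only against input shaped like the one in the question.

```php
<?php
// Hypothetical helper name; it wraps the two preg_replace() calls from the answer.
function collapseWhitespace(string $text): string
{
    // Step 1: a whitespace run that contains an empty line becomes one "\n".
    $text = preg_replace('/\s*$^\s*/m', "\n", $text);

    // Step 2: every run of spaces and tabs becomes one space.
    return preg_replace('/[ \t]+/', ' ', $text);
}

// Assumed sample input, modelled on the string in the question.
$sample = "My    text   has so    much   whitespace    \n\n\n\n\nPlenty of    spaces  and \t\t tabs";

echo collapseWhitespace($sample);
```

For this sample, the result is two lines with single spaces between the words.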

Francois Deschenes
  • You forgot to replace the tabs – Tudor Constantin Jun 18 '11 at 07:14
  • @Tudor Constantin - His example didn't have any tabs (at least as far as I can tell, so I didn't include them), but I've updated my answer to include them. Thanks! – Francois Deschenes Jun 18 '11 at 07:15
  • @Francois Deschenes I removed my answer as this one is nearly complete. But let me ask, wouldn't your second replacement produce the same problem my code did? (merging spaces and tabs to a single space) – Yoshi Jun 18 '11 at 07:43
  • @Yoshi - Yes, but it does it in 2 steps. First it takes care of the \r\n and then the spaces and the tabs, not all at the same time. My newest example is somewhat similar to yours in that it uses \s, but because I'm using the "m" modifier with ^ and $, it matches at the beginning and end of lines. – Francois Deschenes Jun 18 '11 at 07:46
  • @Francois Deschenes I meant something different. :) Wouldn't `... '/[ \t]+/', ' ' ...` merge something like ` \t\t \t ` to a single space? – Yoshi Jun 18 '11 at 07:49
  • @Francois Deschenes : I have tried your solution and it works great, however the problem of empty lines that have nothing but spaces is still unsolved. I have done a quick test over there : http://www.heypasteit.com/clip/ZJW Do you have any solution for this? – Teiv Jun 18 '11 at 11:43
  • @user433531 - The example you gave (on heypasteit.com) uses my first solution. I've since posted another one that handles lines with nothing but spaces. – Francois Deschenes Jun 18 '11 at 17:18
  • @Yoshi - Yes, it would. Depending on the intended use, that may be satisfactory. For instance, if the intent is to collapse all "spaces", whether they are spaces or tabs, this solution will work well. – Francois Deschenes Jun 18 '11 at 17:26
  • Missing something here: `My text has so much whitespace Plenty of spaces and tabs and empty space after line break` So to not get the " and empty ..." on a new line it should catch `/\n /` and replace with `\n` – cottton Aug 14 '16 at 00:47
14

Edited the right answer. From PHP 5.2.4 or so, the following code will do:

echo preg_replace('/\v(?:[\v\h]+)/', '', $string);
J0e3gan
Harikrishnan Hr
  • 1
    This will not perform as required because only whitespace characters that immediately follow a vertical whitespace character (and the 1st vertical whitespace) are matched/replaced. This answer is incorrect. Proof: https://3v4l.org/02ROK – mickmackusa Jan 03 '20 at 06:20
  • wrong answer, probably to the wrong question. It will not remove tabs – Karue Benson Karue Mar 17 '22 at 14:16
6
//Newline and tab space to single space

$from_mysql = str_replace(array("\r\n", "\r", "\n", "\t"), ' ', $from_mysql);


// Multiple spaces to single space ( using regular expression)

$from_mysql = ereg_replace(" {2,}", ' ',$from_mysql);

// Replaces 2 or more spaces with a single space, {2,} indicates that you are looking for 2 or more than 2 spaces in a string.
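Because ereg_replace() was removed in PHP 7.0.0, the "two or more spaces" step needs a different function on current PHP versions. One possible substitute is preg_replace() with the same {2,} quantifier. A minimal sketch, where collapseSpaces() is a made-up helper name:

```php
<?php
// Hypothetical helper: collapses runs of 2 or more spaces into a single space.
function collapseSpaces(string $text): string
{
    // " {2,}" means "a space, repeated two or more times".
    return preg_replace('/ {2,}/', ' ', $text);
}

echo collapseSpaces("a   b  c");
```

Single spaces are left untouched, since the pattern only matches runs of at least two.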
display-name-is-missing
Hoàng Vũ Tgtt
  • The OP requires that there are two lines in the output string, but your snippet converts all newlines to single spaces. Not only does this not produce the desired result, `ereg_replace()`'s docs page says: **Warning This function was DEPRECATED in PHP 5.3.0, and REMOVED in PHP 7.0.0.** – mickmackusa Jan 03 '20 at 07:13
6

Replace Multiple Newline, Tab, Space

$text = preg_replace("/[\r\n]+/", "\n", $text);
$text = preg_replace("/\s+/", ' ', $text);

Tested :)

Community
Sourav
  • This won't work in all cases either. If there's some additional spaces before the end of the line (i.e. `"This is \r\n\r\na test"`), the second `preg_replace()` will convert \s (spaces, tabs, new lines, line breaks) into spaces thus removing the line break in between the line (the example above will become "This is a test" in other words without any line breaks). – Francois Deschenes Jun 18 '11 at 07:21
  • The second `preg_replace()` will consume all of the `\n`s produced in the first `preg_replace()`. This answer is provably incorrect. – mickmackusa Mar 28 '22 at 09:19
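The behaviour described in these comments can be checked with a short sketch. The function name twoStep() is made up for illustration; the two patterns are the ones from the answer, and the input is the example string from the first comment.

```php
<?php
// Hypothetical wrapper around the two calls from the answer.
function twoStep(string $text): string
{
    // First call: runs of \r and \n become a single "\n".
    $text = preg_replace("/[\r\n]+/", "\n", $text);

    // Second call: \s also matches "\n", so the line break is replaced too.
    return preg_replace("/\s+/", ' ', $text);
}

var_dump(twoStep("This is \r\n\r\na test"));
```

The result no longer contains a line break, which is why the comments call the answer incorrect for this task.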
3

This would completely minify the entire string (such as a large blog article) while preserving all HTML tags in place.

$email_body = str_replace(PHP_EOL, ' ', $email_body);
    //PHP_EOL = PHP_End_Of_Line - would remove new lines too
$email_body = preg_replace('/[\r\n]+/', "\n", $email_body);
$email_body = preg_replace('/[ \t]+/', ' ', $email_body);
Rakib
  • 1
    The find and replacement parameters of `preg_replace()` can be arrays. This can allow `preg_replace()` to be called just once in your snippet. – mickmackusa Jan 30 '22 at 22:13
2

Alternative approach:

echo preg_replace_callback("/\s+/", function ($match) {
    $result = array();
    $prev = null;
    foreach (str_split($match[0], 1) as $char) {
        if ($prev === null || $char != $prev) {
            $result[] = $char;
        }

        $prev = $char;
    }

    return implode('', $result);
}, $string);

Output:

My text has so much whitespace
Plenty of spaces and tabs

Edit: Re-added this because it is a different approach. It's probably not what's asked for, but it will at least not merge groups of different whitespace (e.g. space, tab, tab, space, nl, nl, space, space would become space, tab, space, nl, space).

Yoshi
  • 1
    what happens if the order of whitespaces is something like "\n\n\t\t\t" - wouldn't the newlines here be replaced with tabs? – Tudor Constantin Jun 18 '11 at 07:11
  • 1
    @Tudor Constantin - I was just about to write the same thing. – Francois Deschenes Jun 18 '11 at 07:13
  • 1
    @Tudor Constantin @Francois Deschenes Mhm, you're probably right. Still too early in the morning. I'll revise my answer. – Yoshi Jun 18 '11 at 07:15
  • This is way too convoluted. It makes iterated function calls upon every instance of one or more whitespace characters. I would not entertain this technique. p.s. the default glue for implode is an empty string, so you can safely omit that parameter. – mickmackusa Jan 03 '20 at 07:09
1

I had the same problem when passing echoed data from PHP to JavaScript (formatted as JSON). The string was peppered with useless \r\n and \t characters that are neither required nor displayed on the page.

The solution I ended up using is another way of echoing. That saves a lot of server resources compared to preg_replace (as suggested by other people here).


Here is the before and after in comparison:

Before:

echo '
<div>

    Example
    Example

</div>
';

Output:

<div>\r\n\r\n\tExample\r\n\tExample\r\n\r\n</div>


After:

echo 
'<div>',

    'Example',
    'Example',

'</div>';

Output:

<div>ExampleExample</div>


(Yes, echo accepts several comma-separated arguments, so you can join the pieces with commas instead of dots.)
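To check what the comma form actually emits, the output can be captured with output buffering. This is a minimal sketch; renderDiv() is a made-up function name used only to make the result easy to inspect.

```php
<?php
// Hypothetical helper: captures what the comma-separated echo prints.
function renderDiv(): string
{
    ob_start();

    // The whitespace between the arguments is source formatting only;
    // it is not part of any string, so it never reaches the output.
    echo
    '<div>',
        'Example',
        'Example',
    '</div>';

    return ob_get_clean();
}

echo renderDiv();
```

Only the contents of the quoted strings are printed, so no \r\n or \t characters end up in the response.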

0

try with:

$string = "My    text       has so    much   whitespace    




Plenty of    spaces  and            tabs";
//Remove duplicate newlines
$string = preg_replace("/[\n]*/", "\n", $string); 
//Preserves newlines while replacing the other whitespaces with single space
echo preg_replace("/[ \t]*/", " ", $string); 
Kranu
Tudor Constantin
  • Does anyone check these incorrect answers before giving UVs?!? Watch this snippet make the input worse: https://3v4l.org/7UXTJ – mickmackusa Jan 03 '20 at 07:28
0

Why are you doing this?
HTML displays only one space even if you use more than one space...

For example:

<i>test               content 1       2 3 4            5</i>

The output will be:
test content 1 2 3 4 5

If you need more than a single space in HTML, you have to use &nbsp;

Jagadeesan
  • it's not for displaying...... it's for using that string and passing it to some other APIs etc for post/pre processing – Rakib Apr 20 '13 at 10:35
0

This task requires that consecutive spaces and tabs ("horizontal whitespace" -- \h) be replaced with a single literal space and that consecutive carriage returns and newlines ("vertical whitespace" -- \v) be replaced with a newline. To ensure that the appropriate newline character sequences are used within your own system, use PHP_EOL.

It makes no sense to match as few as zero of an occurrence (with *) because you would potentially be adding a whitespace character where there was previously no whitespace character. For this reason, patterns for this task should only be using the + (one or more) quantifier.
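The difference between the two quantifiers is easy to see on a string that contains no whitespace at all. A minimal sketch, with the variable names $star and $plus chosen just for this illustration:

```php
<?php
// With *, the pattern also matches the empty string at every position,
// so a space is inserted before, between, and after the letters.
$star = preg_replace('/[ \t]*/', ' ', 'ab');

// With +, at least one space or tab is required, so nothing matches here.
$plus = preg_replace('/[ \t]+/', ' ', 'ab');

var_dump($star); // string(5) " a b "
var_dump($plus); // string(2) "ab"
```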

If there is any chance of any kind of whitespaces occurring at the start or end of the string, then don't bother removing them with regex, just use trim().

In this context, \R would provide the same outcome as \v, but \R goes a bit farther and is more complex (perhaps unnecessarily). This is an informative read: https://www.npopov.com/2011/12/10/PCRE-and-newlines.html#meet-r

Code:

$string = "
My    text       has so    much   whitespace    




Plenty of    spaces  and            tabs  ";
var_export(
    preg_replace(
        ['/\h+/', '/\v+/'],
        [' ',     PHP_EOL],
        trim($string)
    )
);

Output:

'My text has so much whitespace 
Plenty of spaces and tabs'
mickmackusa
-1

I'm not sure whether this will be useful, nor am I absolutely positive it works as it should, but it seems to be working for me.

A function that clears multiple spaces and anything else you want or don't want and produces either a single line string or a multi-line string (dependent on passed arguments/options). Can also remove or keep characters for other languages and convert newline tabs to spaces.

/** ¯\_(ツ)_/¯ Hope it's useful to someone. **/
// If $multiLine is null this removes spaces too. <options>'[:emoji:]' with $l = true allows only known emoji.
// <options>'[:print:]' with $l = true allows all utf8 printable chars (including emoji).
// **** TODO: If a unicode emoji or language char is used in $options while $l = false; we get an odd � symbol replacement for any non-matching char. $options char seems to get through, regardless of $l = false ? (bug (?)interesting)
function alphaNumericMagic($value, $options = '', $l = false, $multiLine = false, $tabSpaces = "    ") {
    $utf8Emojis = '';
    $patterns = [];
    $replacements = [];
    if ($l && preg_match("~(\[\:emoji\:\])~", $options)) {
        $utf8Emojis = [
            '\x{1F600}-\x{1F64F}', /* Emoticons */
            '\x{1F9D0}-\x{1F9E6}',
            '\x{1F300}-\x{1F5FF}', /* Misc Characters */ // \x{1F9D0}-\x{1F9E6}
            '\x{1F680}-\x{1F6FF}', /* Transport and Map */
            '\x{1F1E0}-\x{1F1FF}' /* Flags (iOS) */
        ];
        $utf8Emojis = implode('', $utf8Emojis);
    }
    $options = str_replace("[:emoji:]", $utf8Emojis, $options);
    if (!preg_match("~(\[\:graph\:\]|\[\:print\:\]|\[\:punct\:\]|\\\-)~", $options)) {
        $value = str_replace("-", ' ', $value);
    }
    if ($l) {
        $l = 'u';
        $options = $options . '\p{L}\p{N}\p{Pd}';
    } else { $l = ''; }
    if (preg_match("~(\[\:print\:\])~", $options)) {
        $patterns[] = "/[ ]+/m";
        $replacements[] = " ";
    }
    if ($multiLine) {
        $patterns[] = "/(?<!^)(?:[^\r\na-z0-9][\t]+)/m";
        $patterns[] = "/[ ]+(?![a-z0-9$options])|[^a-z0-9$options\s]/im$l";
        $patterns[] = "/\t/m";
        $patterns[] = "/(?<!^)$tabSpaces/m";
        $replacements[] = " ";
        $replacements[] = "";
        $replacements[] = $tabSpaces;
        $replacements[] = " ";
    } else if ($multiLine === null) {
        $patterns[] = "/[\r\n\t]+/m";
        $patterns[] = "/[^a-z0-9$options]/im$l";
        $replacements = "";
    } else {
        $patterns[] = "/[\r\n\t]+/m";
        $patterns[] = "/[ ]+(?![a-z0-9$options\t])|[^a-z0-9$options ]/im$l";
        $replacements[] = " ";
        $replacements[] = "";
    }
    return preg_replace($patterns, $replacements, $value);
}

Example usage:

header('Content-Type: text/html; charset=utf-8', true);
$string = "fjl!sj\nfl _  sfjs-lkjf\r\n\tskj 婦女與環境健康 fsl \tklkj\thl jhj ⚧ lkj ⸀ skjfl gwo lsjowgtfls s";
echo "<textarea style='width:100%; height:100%;'>";
echo alphaNumericMagic($string, '⚧', true, null);
echo "\n\nAND\n\n";
echo alphaNumericMagic($string, '[:print:]', true, true);
echo "</textarea>";

Results in:

fjlsjflsfjslkjfskj婦女與環境健康fslklkjhljhj⚧lkjskjflgwolsjowgtflss

AND

fjl!sj
fl _ sfjs-lkjf
    skj 婦女與環境健康 fsl klkj hl jhj ⚧ lkj ⸀ skjfl gwo lsjowgtfls s
Eric Shoberg
  • The m pattern modifier is a useless flag to declare if there are no starting or ending anchors in the pattern (^, $). Colons do not require escaping. `~(\[\:emoji\:\])~` can be `~\[:emoji:]~`, but the truth is that all characters in the pattern are static, so a `preg_` call is needless overhead -- just use `strpos()` to search for the existence of a string. – mickmackusa Jan 03 '20 at 07:23