regex – Replace excess whitespaces and line-breaks with PHP?

Posted by: admin April 23, 2020

Question:
$string = "My    text       has so    much   whitespace    




Plenty of    spaces  and            tabs";

echo preg_replace("/\s\s+/", " ", $string);

I read the PHP documentation and followed the preg_replace tutorial, but this code produces:

My text has so much whitespace Plenty of spaces and tabs

How can I turn it into:

My text has so much whitespace
Plenty of spaces and tabs

Answer:

First, I’d like to point out that new lines can be either \r, \n, or \r\n depending on the operating system.

My solution:

echo preg_replace('/[ \t]+/', ' ', preg_replace('/[\r\n]+/', "\n", $string));

Which could be separated into 2 lines if necessary:

$string = preg_replace('/[\r\n]+/', "\n", $string);
echo preg_replace('/[ \t]+/', ' ', $string);
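
To check the point about line endings, the two steps can be wrapped in a helper and fed all three newline styles. This is only a minimal sketch; the function name collapse_runs is made up for illustration.

```php
<?php
// Hypothetical helper wrapping the two preg_replace() calls shown above.
function collapse_runs(string $text): string
{
    // Step 1: any run of \r and/or \n becomes a single \n.
    $text = preg_replace('/[\r\n]+/', "\n", $text);
    // Step 2: any run of spaces and tabs becomes a single space.
    return preg_replace('/[ \t]+/', ' ', $text);
}

// \n, \r and \r\n line endings are all reduced to a single "\n".
var_dump(collapse_runs("a  b\n\nc\r\rd\r\n\r\ne"));
```

Note that a space left at the end of a line is reduced to one space rather than removed, so trailing whitespace survives this version.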

Update:

An even better solution would be this one:

echo preg_replace('/[ \t]+/', ' ', preg_replace('/\s*$^\s*/m', "\n", $string));

Or:

$string = preg_replace('/\s*$^\s*/m', "\n", $string);
echo preg_replace('/[ \t]+/', ' ', $string);

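I’ve changed the regular expression that collapses multiple line breaks into a single one. It uses the “m” modifier (which makes ^ and $ match at the start and end of each line) and removes any \s (space, tab, new line, line break) characters at the end of one line and the beginning of the next wherever an empty line lies between them. This solves the problem of blank lines that contain nothing but spaces: with my previous example, if a line in a run of blank lines was filled with spaces, an extra line break would have been left in the output. Note that $^ can only match at a truly empty line, so a whitespace-only line is removed only when the same run of blank lines contains at least one truly empty line.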
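
If a whitespace-only line can also sit directly between two non-empty lines, one possible variation is to consume the optional trailing spaces, one line break, and then all following whitespace in a single match. The sketch below is not the pattern from the answer; squeeze_blank_lines is a hypothetical name, and the pattern also strips indentation at the start of the next line.

```php
<?php
// Hypothetical helper: collapses blank and whitespace-only lines, then inner runs.
function squeeze_blank_lines(string $text): string
{
    // Trailing spaces/tabs, one line break, then any further whitespace -> one "\n".
    $text = preg_replace('/[ \t]*[\r\n]\s*/', "\n", $text);
    // Remaining runs of spaces and tabs inside a line -> one space.
    return preg_replace('/[ \t]+/', ' ', $text);
}

// The string from the question, with one blank line filled with spaces.
$string = "My    text       has so    much   whitespace    \n\n   \n\nPlenty of    spaces  and            tabs";
echo squeeze_blank_lines($string);
```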

Answer:

Edited the right answer. From PHP 5.2.4 or so, the following code will do:

echo preg_replace('/\v(?:[\v\h]+)/', '', $string);
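
The \h (horizontal whitespace) and \v (vertical whitespace) escapes can also be combined to get the two-line output asked for in the question. The following is a minimal sketch under some assumptions: the input is valid UTF-8, the function name collapse_hv is invented for illustration, and the "u" modifier is an addition here, because without it \h and \v work byte by byte and may match bytes such as 0xA0 or 0x85 inside multi-byte characters.

```php
<?php
// Hypothetical helper; assumes $text is valid UTF-8 (preg_replace() returns null otherwise).
function collapse_hv(string $text): string
{
    // Horizontal whitespace around a run of line breaks collapses, with the run, to one "\n".
    $text = preg_replace('/\h*\v+[\h\v]*/u', "\n", $text);
    // Any remaining horizontal whitespace (including U+00A0) becomes one space.
    return preg_replace('/\h+/u', ' ', $text);
}

// Two non-breaking spaces, a trailing space, Windows line breaks and a leading tab.
var_dump(collapse_hv("a\u{00A0}\u{00A0}b \r\n\r\n\tc"));
```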

Answer:

Replace Multiple Newline, Tab, Space

$text = preg_replace("/[\r\n]+/", "\n", $text);
$text = preg_replace("/\s+/", ' ', $text);

Tested 🙂

Answer:

//Newline and tab space to single space

$from_mysql = str_replace(array("\r\n", "\r", "\n", "\t"), ' ', $from_mysql);


// Multiple spaces to single space (using a regular expression).
// ereg_replace() was removed in PHP 7.0, so preg_replace() is used instead.

$from_mysql = preg_replace('/ {2,}/', ' ', $from_mysql);

// Replaces 2 or more spaces with a single space; {2,} matches 2 or more consecutive spaces.

Answer:

Alternative approach:

echo preg_replace_callback("/\s+/", function ($match) {
    $result = array();
    $prev = null;
    foreach (str_split($match[0], 1) as $char) {
        if ($prev === null || $char != $prev) {
            $result[] = $char;
        }

        $prev = $char;
    }

    return implode('', $result);
}, $string);

Output:

My text has so much whitespace
Plenty of spaces and tabs

Edit: Re-added this because it is a different approach. It’s probably not what was asked for, but at least it does not merge groups of different whitespace (e.g. space, tab, tab, space, nl, nl, space, space would become space, tab, space, nl, space).
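
The same "drop repeated identical characters" idea can probably be written without a callback by using a backreference. A minimal sketch, with dedupe_whitespace as a hypothetical name:

```php
<?php
// Hypothetical helper: removes consecutive repeats of the same whitespace character.
function dedupe_whitespace(string $text): string
{
    // (\s) captures one whitespace character; \1+ matches one or more repeats of that exact character.
    return preg_replace('/(\s)\1+/', '$1', $text);
}

// space, tab, tab, space, nl, nl, space, space -> space, tab, space, nl, space
var_dump(dedupe_whitespace(" \t\t \n\n  "));
```

As with the callback version, alternating sequences such as \r\n\r\n are left alone, because no character is repeated directly.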

Answer:

This would completely minify the entire string (such as a large blog article) while preserving all HTML tags in place.

// PHP_EOL is the platform's end-of-line sequence, so this turns line breaks into spaces too
$email_body = str_replace(PHP_EOL, ' ', $email_body);
$email_body = preg_replace('/[\r\n]+/', "\n", $email_body);
$email_body = preg_replace('/[ \t]+/', ' ', $email_body);
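
If the goal is a single-line string, one pass with \s+ may be enough, since \s already covers spaces, tabs, \r and \n. A minimal sketch, assuming the markup has no <pre> or <textarea> content whose whitespace must be kept; to_single_line is a hypothetical name.

```php
<?php
// Hypothetical helper: flattens a block of markup to one line.
function to_single_line(string $html): string
{
    // Every run of whitespace becomes one space; trim() drops leading and trailing ones.
    return trim(preg_replace('/\s+/', ' ', $html));
}

echo to_single_line("<p>Hello\r\n\t  <b>world</b>\n</p>\n");
```

Tags are untouched because the pattern only ever matches whitespace characters.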

Answer:

Why are you doing it like this?
HTML displays only one space even if you use more than one…

For example:

<i>test               content 1       2 3 4            5</i>

The output will be:
test content 1 2 3 4 5

If you need more than a single space in HTML, you have to use &nbsp;

Answer:

try with:

$string = "My    text       has so    much   whitespace    




Plenty of    spaces  and            tabs";
// Remove duplicate newlines ("+" rather than "*", which would also match the empty string)
$string = preg_replace("/[\r\n]+/", "\n", $string);
// Preserve newlines while replacing runs of other whitespace with a single space
echo preg_replace("/[ \t]+/", " ", $string);
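
The choice of quantifier matters in patterns like these: "+" requires at least one character, whereas "*" also matches the empty string at every position. A small sketch of the difference (variable names are illustrative only):

```php
<?php
// "+" needs at least one character, so only real runs are replaced.
$plus = preg_replace('/[ \t]+/', ' ', "a \t b");
// "*" also matches the empty string, so a replacement lands between characters too.
$star = preg_replace('/[ \t]*/', ' ', 'ab');

var_dump($plus); // string(3) "a b"
var_dump($star); // string(5) " a b "
```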

Answer:

Not sure if this will be useful nor am I absolutely positive it works like it should but it seems to be working for me.

A function that clears multiple spaces and anything else you want or don’t want and produces either a single line string or a multi-line string (dependent on passed arguments/options). Can also remove or keep characters for other languages and convert newline tabs to spaces.

/** ¯\_(ツ)_/¯ Hope it's useful to someone. **/
// If $multiLine is null this removes spaces too. <options>'[:emoji:]' with $l = true allows only known emoji.
// <options>'[:print:]' with $l = true allows all utf8 printable chars (including emoji).
// **** TODO: If a unicode emoji or language char is used in $options while $l = false; we get an odd � symbol replacement for any non-matching char. $options char seems to get through, regardless of $l = false ? (bug (?)interesting)
function alphaNumericMagic($value, $options = '', $l = false, $multiLine = false, $tabSpaces = "    ") {
    $utf8Emojis = '';
    $patterns = [];
    $replacements = [];
    if ($l && preg_match("~(\[\:emoji\:\])~", $options)) {
        $utf8Emojis = [
            '\x{1F600}-\x{1F64F}', /* Emoticons */
            '\x{1F9D0}-\x{1F9E6}',
            '\x{1F300}-\x{1F5FF}', /* Misc Characters */ // \x{1F9D0}-\x{1F9E6}
            '\x{1F680}-\x{1F6FF}', /* Transport and Map */
            '\x{1F1E0}-\x{1F1FF}' /* Flags (iOS) */
        ];
        $utf8Emojis = implode('', $utf8Emojis);
    }
    $options = str_replace("[:emoji:]", $utf8Emojis, $options);
    if (!preg_match("~(\[\:graph\:\]|\[\:print\:\]|\[\:punct\:\]|\\-)~", $options)) {
        $value = str_replace("-", ' ', $value);
    }
    if ($l) {
        $l = 'u';
        $options = $options . '\p{L}\p{N}\p{Pd}';
    } else { $l = ''; }
    if (preg_match("~(\[\:print\:\])~", $options)) {
        $patterns[] = "/[ ]+/m";
        $replacements[] = " ";
    }
    if ($multiLine) {
        $patterns[] = "/(?<!^)(?:[^\r\na-z0-9][\t]+)/m";
        $patterns[] = "/[ ]+(?![a-z0-9$options])|[^a-z0-9$options\s]/im$l";
        $patterns[] = "/\t/m";
        $patterns[] = "/(?<!^)$tabSpaces/m";
        $replacements[] = " ";
        $replacements[] = "";
        $replacements[] = $tabSpaces;
        $replacements[] = " ";
    } else if ($multiLine === null) {
        $patterns[] = "/[\r\n\t]+/m";
        $patterns[] = "/[^a-z0-9$options]/im$l";
        $replacements = "";
    } else {
        $patterns[] = "/[\r\n\t]+/m";
        $patterns[] = "/[ ]+(?![a-z0-9$options\t])|[^a-z0-9$options ]/im$l";
        $replacements[] = " ";
        $replacements[] = "";
    }
    return preg_replace($patterns, $replacements, $value);
}

Example usage:

header('Content-Type: text/html; charset=utf-8', true);
$string = "fjl!sj\nfl _  sfjs-lkjf\r\n\tskj 婦女與環境健康 fsl \tklkj\thl jhj ⚧😄 lkj ⸀ skjfl gwo lsjowgtfls s";
echo "<textarea style='width:100%; height:100%;'>";
echo alphaNumericMagic($string, '⚧', true, null);
echo "\n\nAND\n\n";
echo alphaNumericMagic($string, '[:print:]', true, true);
echo "</textarea>";

Results in:

fjlsjflsfjslkjfskj婦女與環境健康fslklkjhljhj⚧lkjskjflgwolsjowgtflss

AND

fjl!sj
fl _ sfjs-lkjf
    skj 婦女與環境健康 fsl klkj hl jhj ⚧😄 lkj ⸀ skjfl gwo lsjowgtfls s

Answer:

Had the same problem when passing echoed data from PHP to Javascript (formatted as JSON). The string was peppered with useless \r\n and \t characters that are neither required nor displayed on the page.

The solution I ended up using is a different way of echoing. It avoids calling preg_replace at all (as suggested by other people here), because the extra whitespace is never output in the first place.

Here is the before and after in comparison:

Before:

echo '
<div>

    Example
    Example

</div>
';

Output:

<div>\r\n\r\n\tExample\r\n\tExample\r\n\r\n</div>


After:

echo 
'<div>',

    'Example',
    'Example',

'</div>';

Output:

<div>ExampleExample</div>


(Yes, echo accepts several comma-separated arguments, so you are not limited to joining strings with dots.)