regex – How do I remove blank lines from text in PHP?

Posted by: admin April 23, 2020

Questions:

I need to remove blank lines (either completely empty or containing only whitespace) from text in PHP. I tried these regular expressions, but neither works:

$str = ereg_replace('^[ \t]*$\r?\n', '', $str);
$str = preg_replace('^[ \t]*$\r?\n', '', $str);

For example, I want this text:

blahblah

blahblah

   adsa 


sad asdasd

to become:

blahblah
blahblah
   adsa 
sad asdasd

How to & Answers:

// New line is required to split non-blank lines
$string = preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $string);

The above regular expression says:

/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/
    1st Capturing group (^[\r\n]*|[\r\n]+)
        1st Alternative: ^[\r\n]*
            ^ asserts position at the start of the string
            [\r\n]* matches a single character present in the list below
                Quantifier: between zero and unlimited times, as many times as possible, giving back as needed [greedy]
                \r matches a carriage return (ASCII 13)
                \n matches a line-feed (newline) character (ASCII 10)
        2nd Alternative: [\r\n]+
            [\r\n]+ matches a single character present in the list below
                Quantifier: between one and unlimited times, as many times as possible, giving back as needed [greedy]
                \r matches a carriage return (ASCII 13)
                \n matches a line-feed (newline) character (ASCII 10)
    [\s\t]* matches a single character present in the list below
        Quantifier: between zero and unlimited times, as many times as possible, giving back as needed [greedy]
        \s matches any whitespace character [\r\n\t\f ]
        \t matches a tab (ASCII 9)
    [\r\n]+ matches a single character present in the list below
        Quantifier: between one and unlimited times, as many times as possible, giving back as needed [greedy]
        \r matches a carriage return (ASCII 13)
        \n matches a line-feed (newline) character (ASCII 10)

Answer:

Your ereg_replace() solution is wrong because the ereg/eregi functions are deprecated (deprecated in PHP 5.3 and removed in PHP 7.0). Your preg_replace() won’t even compile because the pattern has no delimiters, but if you add delimiters and set multiline mode, it will work fine:

$str = preg_replace('/^[ \t]*[\r\n]+/m', '', $str);

The m modifier allows ^ to match the beginning of a logical line rather than just the beginning of the whole string. The start-of-line anchor is necessary because without it the regex would match the newline at the end of every line, not just the blank ones. You don’t need the end-of-line anchor ($) because you’re actively matching the newline characters, but it doesn’t hurt.
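To make the difference concrete, here is a minimal sketch that runs both the question's pattern and the corrected one on the sample text. The variable names ($sample, $broken, $fixed) are illustrative and not from the original post, and the @ operator is only there to silence the warning that the delimiter-less pattern triggers.

```php
<?php
// Sample text from the question (hypothetical variable name).
$sample = "blahblah\n\nblahblah\n\n   adsa \n\n\nsad asdasd";

// Without delimiters PCRE rejects the pattern: preg_replace() emits a
// warning and returns null instead of a string.
$broken = @preg_replace('^[ \t]*$\r?\n', '', $sample);

// With delimiters and the m modifier, ^ matches at the start of every line,
// so each blank or whitespace-only line is removed together with its newline.
$fixed = preg_replace('/^[ \t]*[\r\n]+/m', '', $sample);

var_dump($broken === null);
echo $fixed, "\n";
```

Checking the return value for null is a cheap way to notice a pattern that failed to compile.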

The accepted answer gets the job done, but it’s more complicated than it needs to be. The regex has to match either the beginning of the string (^[\r\n]*, multiline mode not set) or at least one newline ([\r\n]+), followed by at least one newline ([\r\n]+). So, in the special case of a string that starts with one or more blank lines, they’ll be replaced with one blank line. I’m pretty sure that’s not the desired outcome.

But most of the time it replaces two or more consecutive newlines, along with any horizontal whitespace (spaces or tabs) that lies between them, with one linefeed. That’s the intent, anyway. The author seems to expect \s to match just the space character (\x20), when in fact it matches any whitespace character. That’s a very common mistake. The actual list varies from one regex flavor to the next, but at minimum you can expect \s to match whatever [ \t\f\r\n] matches.
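Both points can be checked with a short script. This is only a sketch: the input string and all variable names are made up for illustration, and the multiline pattern is the one suggested earlier in this answer.

```php
<?php
// Pattern from the accepted answer, written as posted.
$accepted = "/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/";

// Hypothetical input that starts with two blank lines.
$input = "\n\nfoo\n\nbar";

// The leading run of newlines is replaced by "\n", so an empty first line survives.
$viaAccepted = preg_replace($accepted, "\n", $input);

// The multiline pattern removes the leading blank lines entirely.
$viaMultiline = preg_replace('/^[ \t]*[\r\n]+/m', '', $input);

// Which single characters does each class match?
$chars = ['space' => ' ', 'tab' => "\t", 'LF' => "\n", 'CR' => "\r", 'FF' => "\f"];
$matchedByS = [];
$matchedBySpaceTab = [];
foreach ($chars as $name => $char) {
    if (preg_match('/\A\s\z/', $char) === 1) {
        $matchedByS[] = $name;        // \s accepts newlines as well
    }
    if (preg_match('/\A[ \t]\z/', $char) === 1) {
        $matchedBySpaceTab[] = $name; // [ \t] accepts only space and tab
    }
}

var_dump($viaAccepted, $viaMultiline);
echo implode(', ', $matchedByS), "\n";
echo implode(', ', $matchedBySpaceTab), "\n";
```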

Actually, in PHP you have a better option:

$str = preg_replace('/^\h*\v+/m', '', $str);

\h matches any horizontal whitespace character, and \v matches vertical whitespace.
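As a quick illustration, the sketch below applies this pattern to a made-up string with Windows line endings and a line that contains only a space and tabs. The input and variable names are assumptions for the example, and it assumes PCRE's default line-feed newline convention.

```php
<?php
// Hypothetical input: CRLF line endings, one whitespace-only line, one empty line.
$input = "first\r\n \t \r\n\r\nsecond\r\n";

// \h* consumes spaces and tabs at the start of a line; \v+ consumes the
// CR and LF characters that end it (and any further empty lines).
$result = preg_replace('/^\h*\v+/m', '', $input);

var_dump($result);
```

The line endings of the non-blank lines are left untouched, so the CRLF style of the input is preserved.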

Answer:

Just explode the lines of the text to an array, remove empty lines using array_filter and implode the array again.

$tmp = explode("\n", $str);
$tmp = array_filter($tmp);
$str = implode("\n", $tmp);

Or in one line:

$str = implode("\n", array_filter(explode("\n", $str)));

I haven’t measured it, but this may be faster than preg_replace().
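One caveat with this approach: array_filter() without a callback drops every falsy element, so a line containing only "0" is removed, while a line containing only spaces is kept. A possible variant, sketched here with a hypothetical function name (remove_blank_lines_by_split), filters on trim() instead and splits on \R so that \r\n input is handled too. Note that it rejoins the lines with "\n".

```php
<?php
// Hypothetical helper, not part of the original answer.
function remove_blank_lines_by_split(string $text): string
{
    // \R matches any line-break sequence, including "\r\n" as a single unit.
    $lines = preg_split('/\R/', $text);

    // Keep a line only if something remains after trimming whitespace.
    $kept = array_filter($lines, function (string $line): bool {
        return trim($line) !== '';
    });

    return implode("\n", $kept);
}

$text = "a\n0\n  \nb";

// The plain array_filter() version loses the "0" line and keeps the spaces.
$naive = implode("\n", array_filter(explode("\n", $text)));

// The trim()-based version keeps "0" and drops the whitespace-only line.
$careful = remove_blank_lines_by_split($text);

var_dump($naive, $careful);
```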

Answer:

The comment from Bythos from Jamie’s link above worked for me:

/^\n+|^[\t\s]*\n+/m

I didn’t want to strip all of the new lines, just the empty/whitespace ones. This does the trick!

Answer:

Use this:

$str = preg_replace('/^\s*\r?\n/m', '', $str);

Answer:

There is no need to overcomplicate things. This can be achieved with a simple short regular expression:

$text = preg_replace("/(\R){2,}/", "$1", $text);

The (\R) matches any newline sequence.
The {2,} matches two or more consecutive occurrences.
The $1 uses the first backreference (the platform-specific EOL that was captured) as the replacement.
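A small sketch of how this behaves, using made-up inputs and variable names: runs of consecutive newlines collapse to one, but a line that contains spaces is not treated as blank, because the newlines around it are not adjacent.

```php
<?php
$pattern = '/(\R){2,}/';

// Three consecutive line feeds collapse to a single one.
$collapsed = preg_replace($pattern, '$1', "a\n\n\nb");

// Windows line endings are kept as "\r\n" because $1 holds the captured EOL.
$collapsedCrlf = preg_replace($pattern, '$1', "a\r\n\r\nb");

// A whitespace-only line separates the newlines, so nothing matches here.
$untouched = preg_replace($pattern, '$1', "a\n  \nb");

var_dump($collapsed, $collapsedCrlf, $untouched);
```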

Answer:

Try this one:

$str = preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\r\n", $str);

If you write the result to a text file, it displays the same way in Notepad, WordPad, and other text editors such as Notepad++.

Answer:

function trimblanklines($str) {
    return preg_replace('`\A[ \t]*\r?\n|\r?\n[ \t]*\Z`','',$str);
}

This one only removes them from the beginning and end, not the middle (if anyone else was looking for this).
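For example, applying the same pattern to a made-up string (the variable names here are illustrative only) leaves the blank line in the middle alone:

```php
<?php
// Same pattern as in the function above, with backticks as delimiters.
$edgePattern = '`\A[ \t]*\r?\n|\r?\n[ \t]*\Z`';

// Hypothetical input: a blank first line, a blank line in the middle,
// and a trailing newline.
$input = "\nfoo\n\nbar\n";

// Only the blank first line and the trailing newline are removed.
$trimmed = preg_replace($edgePattern, '', $input);

var_dump($trimmed);
```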

Answer:

The accepted answer leaves an extra line-break at the end of the string. Using rtrim() will remove this final linebreak:

$string = rtrim(preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $string));
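A minimal sketch of the difference, with a made-up input and illustrative variable names. Keep in mind that rtrim() with no second argument also strips trailing spaces and tabs, not only newlines.

```php
<?php
$pattern = "/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/";

// Hypothetical input that ends with two newlines.
$input = "foo\n\n\nbar\n\n";

// The accepted answer collapses each run of newlines to one "\n",
// so the string still ends with a line break.
$withoutRtrim = preg_replace($pattern, "\n", $input);

// rtrim() removes that final line break.
$withRtrim = rtrim($withoutRtrim);

var_dump($withoutRtrim, $withRtrim);
```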

Answer:

From this answer, the following works fine for me! Note that it removes every line break, not only the blank lines:

$str = "<html>
<body>";

echo str_replace(array("\r", "\n"), '', $str);

Answer:

    <?php

    function del_blanklines_in_array_q($ar){
        $strip = array();
        foreach($ar as $k => $v){
            // Scan the line backwards and keep it if any character is above ASCII 32 (0x20, space)
            $ll = strlen($v);
            while($ll--){
                if(ord($v[$ll]) > 32){
                    $strip[] = $v;
                    break;
                }
            }
        }
        return $strip;
    }

    function del_blanklines_in_file_q($in, $out){
        // in filename, out filename
        $strip = del_blanklines_in_array_q(file($in));
        file_put_contents($out, $strip );
    }

Answer:

$file = "file_name.txt";
$file_data = file_get_contents($file);
$file_data_after_remove_blank_line = preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $file_data );
file_put_contents($file,$file_data_after_remove_blank_line);

Answer:

nl2br(preg_replace('/^\v+/m', '', $r_msg))