
php – Remove whitespace from HTML

Posted by: admin April 23, 2020

Questions:

I have HTML code like:

<div class="wrap">
    <div>
        <div id="hmenus">
            <div class="nav mainnavs">
                <ul>
                    <li><a id="nav-questions" href="/questions">Questions</a></li>
                    <li><a id="nav-tags" href="/tags">Tags</a></li>
                    <li><a id="nav-users" href="/users">Users</a></li>
                    <li><a id="nav-badges" href="/badges">Badges</a></li>
                    <li><a id="nav-unanswered" href="/unanswered">Unanswered</a></li>
                </ul>
            </div>
        </div>
    </div>
</div>

How do I remove the whitespace between tags with PHP?

We should get:

<div class="wrap"><div><div id="hmenus"><div class="nav mainnavs"><ul><li><a id="nav-questions" href="/questions">Questions</a></li><li><a id="nav-tags" href="/tags">Tags</a></li><li><a id="nav-users" href="/users">Users</a></li><li><a id="nav-badges" href="/badges">Badges</a></li><li><a id="nav-unanswered" href="/unanswered">Unanswered</a></li></ul></div></div></div></div>
Answer:

I can’t delete this answer, but it’s no longer relevant: the web landscape has changed so much in 8 years that this has become useless.

Answer:

$html = preg_replace('~>\s+<~', '><', $html);

But I don’t see the point of this. If you’re trying to make the response smaller, there are better options, such as enabling gzip compression on the web server.
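
A quick way to see what the pattern above does is to run it on a small fragment. This is only a sketch: the helper name stripBetweenTags and the sample markup are assumptions made for illustration, not part of the answer.

```php
<?php
// Hypothetical helper name; it only wraps the one-line regex from the answer.
function stripBetweenTags(string $html): string
{
    // Remove any run of whitespace that sits between a '>' and the next '<'.
    return preg_replace('~>\s+<~', '><', $html);
}

$html = "<ul>\n    <li><a href=\"/tags\">Tags</a></li>\n    <li><a href=\"/users\">Users</a></li>\n</ul>";

echo stripBetweenTags($html), "\n";
// <ul><li><a href="/tags">Tags</a></li><li><a href="/users">Users</a></li></ul>
```

Text inside the tags is left alone, because the pattern only matches whitespace that has a '>' directly before it and a '<' directly after it.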

Answer:

It’s been a while since this question was first asked but I still see the need to post this answer in order to help people with the same problem.

None of these solutions worked for me, so I came up with this one: using output buffering.

The function ob_start accepts a callback as an argument, which is applied to the whole buffered string before it is output. So if you remove the whitespace from the string in that callback, you’re done.

<?php
/**
 * Collapse every run of whitespace in the buffer into a single space.
 *
 * @param string $buffer
 * @return string
 */
function removeWhitespace($buffer)
{
    return preg_replace('/\s+/', ' ', $buffer);
}

// Everything output from here on is buffered and passed through the callback.
ob_start('removeWhitespace');
?>
<!DOCTYPE html>
<html>
    <head></head>
    <body></body>
</html>
<?php
ob_get_flush();

The above would print something like:

<!DOCTYPE html> <html> <head></head> <body></body> </html>

Hope that helps.

HOW TO USE IT IN OOP

If you’re using object-orientated code in PHP, you may want to use a callback function that lives inside a class.

If you have a class called, for instance, HTML, with a static removeWhitespace method, register it with this line:

ob_start(["HTML","removeWhitespace"]); 
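
A minimal sketch of what such a class could look like. The class body is an assumption (the answer only gives the ob_start line), and the method is declared static so that the ["HTML", "removeWhitespace"] callable can be invoked without an instance.

```php
<?php
// Hypothetical class body; only the class and method names come from the answer.
class HTML
{
    // Static, so the ['HTML', 'removeWhitespace'] callable works without an object.
    public static function removeWhitespace(string $buffer): string
    {
        // Collapse every run of whitespace into a single space.
        return preg_replace('/\s+/', ' ', $buffer);
    }
}

ob_start(['HTML', 'removeWhitespace']);
echo "<ul>\n    <li>One</li>\n    <li>Two</li>\n</ul>";
ob_end_flush();
// prints: <ul> <li>One</li> <li>Two</li> </ul>
```

If the method lives on an object instead, passing [$object, 'removeWhitespace'] as the callable works the same way.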

Answer:

$html = preg_replace('~>\s*\n\s*<~', '><', $html);

I think this is the solution to the <b>Hello</b> <i>world</i> problem, where stripping all whitespace between tags also removes the space that separates two inline elements. The idea is to remove whitespace only when it contains a new line. It will work for common HTML syntax, which is:

<div class="wrap">
    <div>
    </div>
</div>
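
The difference between the two patterns can be checked side by side. This is a small sketch with made-up sample strings; the variable names are assumptions for illustration.

```php
<?php
$inline = '<b>Hello</b> <i>world</i>';
$block  = "<div>\n    <p>Hi</p>\n</div>";

// Pattern from the earlier answer: strips every whitespace run between tags.
$aggressive = '~>\s+<~';
// Pattern from this answer: strips only runs that contain a line break.
$lineBased  = '~>\s*\n\s*<~';

echo preg_replace($aggressive, '><', $inline), "\n"; // <b>Hello</b><i>world</i>
echo preg_replace($lineBased, '><', $inline), "\n";  // <b>Hello</b> <i>world</i>
echo preg_replace($lineBased, '><', $block), "\n";   // <div><p>Hi</p></div>
```

The line-based pattern keeps the space between the two inline elements, but it will still join inline elements that happen to be written on separate lines.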

Answer:

Just in case someone needs this: I put together a function from @Martin Angelova’s and @Savas Vedova’s responses, and came up with

<?php
function rmspace($buffer)
{
    // Strip whitespace between tags only where it contains a line break.
    return preg_replace('~>\s*\n\s*<~', '><', $buffer);
}

ob_start("rmspace");
?>
   <!-- Content goes in here -->
<?php ob_end_flush(); ?>

And it solved my problem.
Note: I didn’t test the server overhead, so make sure you test it before using it in production.

Answer:

A RegEx replace could do the trick, something like:

$result = preg_replace('!\s+!smi', ' ', $content);

Answer:

Thank you for posting this question. The problem is indeed dealing with whitespace bugs in certain environments. While the regex solution works in the general case, a quick hack is to remove the leading whitespace and add an empty <?php ?> tag to the end of each line. PHP removes a single newline that directly follows a closing ?>. E.g.:

<ul><?php ?>
<li><a id="nav-questions" href="/questions">Questions</a></li><?php ?>
<li><a id="nav-tags" href="/tags">Tags</a></li><?php ?>
<li><a id="nav-users" href="/users">Users</a></li><?php ?>
<li><a id="nav-badges" href="/badges">Badges</a></li><?php ?>
<li><a id="nav-unanswered" href="/unanswered">Unanswered</a></li><?php ?>
</ul>

Obviously this is sub-optimal for a variety of reasons, but it’ll work for a localized problem without affecting the entire tool chain.

Answer:

Use the array_reduce function to trim each line and concatenate the results:

$html = explode("\n", $html);
function trimArray($returner, $value) {
    $returner .= trim($value);
    return $returner;
}
echo $html = array_reduce($html, 'trimArray');
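
The same idea can also be sketched with array_map and implode. The helper name trimLines is an assumption, and splitting on \R (any line ending) is a substitution for the explode("\n", ...) call used in the answer.

```php
<?php
// Hypothetical helper name; same idea as the array_reduce answer.
function trimLines(string $html): string
{
    // Split on any common line ending, trim each line, and glue the lines back together.
    $lines = preg_split('/\R/', $html);
    return implode('', array_map('trim', $lines));
}

echo trimLines("<ul>\r\n    <li>One</li>\r\n    <li>Two</li>\r\n</ul>"), "\n";
// <ul><li>One</li><li>Two</li></ul>
```

Either way, lines are joined with no separator, so text that wraps across two lines ends up with its words fused together.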

Answer:

gpupo’s post provided the cleanest solution for many different kinds of spacing. However, a minor but important piece was forgotten at the end: a final string trim :-p

Below is a tested and working solution.

function compress_html($content)
{
    $i       = 0;
    $content = preg_replace('~>\s+<~', '><', $content);
    $content = preg_replace('/\s\s+/',  ' ', $content);

    while ($i < 5)
    {
        $content = str_replace('  ', ' ', $content);
        $i++;
    }

    return trim($content);
}
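
For a quick check of what these steps do, here is a sketch with the same two regexes and the final trim. The function name compressHtmlSketch and the sample input are assumptions; the str_replace loop is left out because the second regex already collapses every run of two or more whitespace characters.

```php
<?php
// Hypothetical name; same regex steps as compress_html, without the str_replace loop.
function compressHtmlSketch(string $content): string
{
    $content = preg_replace('~>\s+<~', '><', $content); // whitespace between tags
    $content = preg_replace('/\s\s+/', ' ', $content);  // remaining runs become one space
    return trim($content);
}

echo compressHtmlSketch("  <p>Hello    world</p>\n\n<pre>a\n    b</pre>  "), "\n";
// <p>Hello world</p><pre>a b</pre>
```

Note that the content of the <pre> element is collapsed as well, so this approach is only safe for pages without preformatted text or whitespace-sensitive inline scripts.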

Answer:

//...
public function compressHtml($content)
{
    $content = preg_replace('~>\s+<~', '><', $content);
    $content = preg_replace('/\s\s+/', ' ', $content);
    $i = 0;
    while ($i < 5) {
        $content = str_replace('  ', ' ', $content);
        $i++;    
    }

    return $content;
}

Answer:

If you have 8-bit extended ASCII, this will replace the control characters with spaces and keep the characters in the range 128-255:

$text = preg_replace('/[\x00-\x1F\x7F]/', ' ', $text);

If you have a UTF-8 encoded string, this will do the job:

$text = preg_replace('/[\x00-\x1F\x7F]/u', '', $text);
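
A small sketch of the UTF-8 variant on a sample string. The sample text and variable names are assumptions. Keep in mind that the range \x00-\x1F also contains tabs and line breaks, so those are removed too, while ordinary spaces are kept.

```php
<?php
// A UTF-8 string with a tab, a NUL byte, and a DEL byte mixed into the text.
$text = "Caf\u{00E9}\t\x00menu\x7F";

// The u flag makes PCRE treat the subject as UTF-8, so multi-byte characters stay intact.
$clean = preg_replace('/[\x00-\x1F\x7F]/u', '', $text);

var_dump($clean === "Caf\u{00E9}menu"); // bool(true)
```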


Answer:

Use a regular expression that matches only the whitespace between tags, and replace each match with ><, like:

>\s+<

Answer:

<?php
    define('COMPRESSOR', 1);

    function remove_html_comments($content = '') {
        // The s flag lets . match newlines, so multi-line comments are removed too.
        return preg_replace('/<!--.*?-->/s', '', $content);
    }

    function sanitize_output($buffer) {
        $search = array(
            '/\>[^\S ]+/s',  // strip whitespaces after tags, except space
            '/[^\S ]+\</s',  // strip whitespaces before tags, except space
            '/(\s)+/s'       // shorten multiple whitespace sequences
        );

        $replace = array(
            '>',
            '<',
            '\1'
        );

        $buffer = preg_replace($search, $replace, $buffer);
        return remove_html_comments($buffer);
    }

    if (COMPRESSOR) { ob_start("sanitize_output"); }
    ?>

    <html>  
        <head>
          <!-- comment -->
          <title>Example   1</title>
        </head>
        <body>
           <p>This is       example</p>
        </body>
    </html>


    RESULT: <html><head><title>Example 1</title></head><body><p>This is example</p></body></html> 

Answer:

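I used this regex and it works like a charm for me:

$html = preg_replace('/[ \t]+(?!="|\')/', '', $html);

This pattern looks for runs of spaces and tabs (at least one character) that are not followed by =" or '. The intent is to avoid removing whitespace around HTML attributes. Note that it still removes the single spaces between words and between attributes, so check the result on your own markup before relying on it.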
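
To see what this pattern does to a typical tag, it helps to run it on one. The sample markup below is an assumption made for illustration.

```php
<?php
$html = '<a id="x" href="/y">Hello world</a>';

// Removes every run of spaces/tabs unless it is directly followed by =" or '.
$result = preg_replace('/[ \t]+(?!="|\')/', '', $html);

echo $result, "\n";
// <aid="x"href="/y">Helloworld</a>
```

None of the spaces in this sample is followed by =" or ', so all of them are removed, including the ones that separate the tag name from its attributes.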

Answer:

This works for me and it’s easy to add/remove special cases. Works with CSS, HTML and JS.

function inline_trim($t)
{
    $t = preg_replace('/>\s*\n\s*</', '><', $t); // line break between tags
    $t = preg_replace('/\n/', ' ', $t); // line break to space
    $t = preg_replace('/(.)\s+(.)/', '$1 $2', $t); // spaces between letters
    $t = preg_replace("/;\s*(.)/", ';$1', $t); // semicolon and letter
    $t = preg_replace("/>\s*(.)/", '>$1', $t); // tag and letter
    $t = preg_replace("/(.)\s*</", '$1<', $t); // letter and tag
    $t = preg_replace("/;\s*</", '<', $t); // semicolon and tag (drops the semicolon)
    $t = preg_replace("/;\s*}/", '}', $t); // semicolon and closing curly brace (drops the semicolon)
    $t = preg_replace("/(.)\s*}/", '$1}', $t); // letter and closing curly brace
    $t = preg_replace("/(.)\s*{/", '$1{', $t); // letter and opening curly brace
    $t = preg_replace("/{\s*{/", '{{', $t); // two opening curly braces
    $t = preg_replace("/}\s*}/", '}}', $t); // two closing curly braces
    $t = preg_replace("/{\s*([\w|.|$])/", '{$1', $t); // opening curly brace and letter
    $t = preg_replace("/}\s*([\w|.|$])/", '}$1', $t); // closing curly brace and letter
    $t = preg_replace("/\+\s+\'/", "+ '", $t); // plus and quote
    $t = preg_replace('/\+\s+\"/', '+ "', $t); // plus and double quote
    $t = preg_replace("/\'\s+\+/", "' +", $t); // quote and plus
    $t = preg_replace('/\"\s+\+/', '" +', $t); // double quote and plus

    return $t;
}