php – How do you custom-format the first word/character from an html-markup MySQL field?

Posted by: admin, February 25, 2020

Questions:

I did the following, which works with simple text fields:

$field = "How are you doing?";
$arr = explode(' ', trim($field));
$first_word = $arr[0];            // "How"
$balance = strstr($field, ' ');   // everything from the first space onward

It didn’t work on my real data because the field contains HTML markup (perhaps an image, video, div, paragraph, etc.), so the markup ended up mixed in with the text.

I could possibly use strip_tags to strip out the HTML, then obtain the first word and reformat it, but then I would have to figure out how to add the HTML back into the data. I’m wondering if there is a built-in PHP function or a ready-made custom function for this purpose.

Answers:

You can use DOMDocument to parse the HTML, modify the contents, and save it back as HTML. Also, finding words is not always as simple as splitting on spaces, since not all languages delimit their words with spaces and not all words are necessarily delimited by spaces. For example, mother-in-law could be viewed as one word or as three, depending on how you define a word. Likewise, do you consider pancake one word or two (pan and cake)? One simple solution is to use the IntlBreakIterator::createWordInstance method, which returns a break iterator that implements the Unicode standard for text segmentation, a.k.a. UAX #29.
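To get a feel for how the word iterator segments text before wiring it into the DOM code, here is a minimal sketch. It assumes the intl extension is loaded; the helper name segmentWords is hypothetical and exists only for this illustration.

```php
<?php
// Hypothetical helper (illustration only): return every segment that the
// word break iterator produces for $text, in order.
function segmentWords(string $text): array
{
    $iterator = IntlBreakIterator::createWordInstance();
    $iterator->setText($text);

    $parts = [];
    // Spaces and punctuation come back as their own segments, so callers
    // must decide which segments count as "words".
    foreach ($iterator->getPartsIterator() as $part) {
        $parts[] = $part;
    }
    return $parts;
}

print_r(segmentWords("How are you doing?"));
print_r(segmentWords("mother-in-law"));
```

Because whitespace and punctuation are returned as separate segments, the first segment is only the first word when the text starts with one. Joining the segments back together always reproduces the original string.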

Here’s an example of how you might go about implementing this:

$html = <<<'HTML'
<div>some sample text here</div>
HTML;

/* Let's extend DOMDocument to include a walk method that can traverse the entire DOM tree */
class MyDOMDocument extends DOMDocument {
    public function walk(DOMNode $node, $skipParent = false) {
        if (!$skipParent) {
            yield $node;
        }
        if ($node->hasChildNodes()) {
            foreach ($node->childNodes as $n) {
                yield from $this->walk($n);
            }
        }
    }
}

$dom = new MyDOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// Let's traverse the DOMTree to find the first text node
foreach ($dom->walk($dom->childNodes->item(0)) as $node) {
    if ($node->nodeName === "#text") {
        break;
    }
}

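// Extract the first word from that text node
$iterator = IntlBreakIterator::createWordInstance();
$iterator->setText($node->nodeValue); // set the text in the word iterator
// KEY_RIGHT makes each key the offset of the part's right-hand boundary,
// so after the first iteration $offset marks where the first word ends
$it = $iterator->getPartsIterator(IntlPartsIterator::KEY_RIGHT);
foreach ($it as $offset => $word) {
    break; // stop after the first part
}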

// You can do whatever you want to $word here
$word .= "s"; // I'm going to append the letter s

// Rebuild the text node's content around the modified first word
$unmodifiedString = substr($node->nodeValue, $offset); // everything after the first word
$modifiedString = $word . $unmodifiedString;

// $node is still attached to the DOM tree, so assigning to nodeValue
// updates it in place; no replaceChild() call is needed
$node->nodeValue = $modifiedString;

// Save the HTML
$newHTML = $dom->saveHTML();

echo $newHTML;

Outputs

<div>somes sample text here</div>
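Since the question asks about custom-formatting the first word, here is a sketch of one way to apply the same steps to wrap that word in an element instead of editing its text. It is not part of the original answer: the function name wrapFirstWord and the first-word class are hypothetical, and it assumes the dom and intl extensions are loaded and that the fragment has a single root element. It also skips blank text nodes and leading punctuation, which the example above does not handle.

```php
<?php
// Hypothetical helper (illustration only): wrap the first word of an HTML
// fragment in <span class="...">. Assumes a single root element; non-ASCII
// input may additionally need an explicit encoding hint for loadHTML().
function wrapFirstWord(string $html, string $class = 'first-word'): string
{
    $dom = new DOMDocument;
    $previous = libxml_use_internal_errors(true); // silence parser warnings
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // First non-blank text node in document order, ignoring script/style
    $xpath = new DOMXPath($dom);
    $query = '(//text()[normalize-space(.) != ""]'
           . '[not(ancestor::script or ancestor::style)])[1]';
    $node = $xpath->query($query)->item(0);
    if ($node === null) {
        return $html; // nothing to format
    }

    $text = $node->nodeValue;
    $iterator = IntlBreakIterator::createWordInstance();
    $iterator->setText($text);

    $start = 0; // byte offset of the current segment within $text
    foreach ($iterator->getPartsIterator() as $part) {
        // Treat the first segment containing a letter or digit as the word
        if (preg_match('/[\p{L}\p{N}]/u', $part)) {
            $span = $dom->createElement('span');
            $span->setAttribute('class', $class);
            $span->appendChild($dom->createTextNode($part));

            $parent = $node->parentNode;
            $before = substr($text, 0, $start);
            $after  = substr($text, $start + strlen($part));
            if ($before !== '') {
                $parent->insertBefore($dom->createTextNode($before), $node);
            }
            $parent->insertBefore($span, $node);
            $parent->replaceChild($dom->createTextNode($after), $node);
            return $dom->saveHTML();
        }
        $start += strlen($part);
    }
    return $html; // text node contained no word-like segment
}

echo wrapFirstWord('<div><img src="a.png"><p>some sample text here</p></div>');
```

The surrounding markup (the image and the paragraph in this example) is left untouched, and the span can then be styled with CSS. Splitting the string by hand with substr keeps every offset in bytes, which avoids mixing byte offsets with the character offsets that DOM methods such as splitText expect.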