php – HTML List to CSV

Posted by: admin April 23, 2020

Questions:

I have a multilevel list, example below:

<ul>       
    <li>Test column 01
        <ul>       
            <li>Test column 02
                <ul>       
                    <li>Test column 03
                        <ul>       
                            <li>Test column 04
                                <ul>       
                                    <li>Test column 05</li>
                                    <li>Test column 05</li>
                                    <li>Test column 05</li>
                                </ul>
                            </li>
                        </ul>
                    </li>
                </ul>
            </li>
        </ul>
    </li>
</ul>

I would like to run some PHP code that outputs the list as a CSV file, formatted like this:

Test column 01
,Test column 02
,,Test column 03
,,,Test column 04
,,,,Test column 05
,,,,Test column 05
,,,,Test column 05

Basically, I want to be able to run an HTML list (with an unlimited number of levels) through some PHP code and output a CSV file that can be opened in Excel, preserving the list levels as columns.

If I could find some way of adding a class to each list item depending on its level (so first-level list items get a class of level1, second-level items a class of level2, and so on), then it should be fairly straightforward to find and replace the rest.

Any ideas/help greatly appreciated.

Answers:

This would work for your example HTML:

// $html is assumed to hold the list markup from the question.
$dom = new DOMDocument;
$dom->preserveWhiteSpace = false;
$dom->loadHTML($html);

foreach ($dom->getElementsByTagName('li') as $li) {   // #1
  printf(
      '%s%s%s', 
      str_repeat(',', get_depth($li)),                // #2
      trim($li->childNodes->item(0)->nodeValue),      // #3
      PHP_EOL
  );
}

function get_depth(DOMElement $element)
{
    $depth = -1;
    while (                                           // #4
        $element->parentNode->tagName === 'li' || 
        $element->parentNode->tagName === 'ul'
    ) {
        if ($element->parentNode->tagName === 'ul') { // #5
            $depth++;
        }
        $element = $element->parentNode;
    }
    return $depth;
}

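Since the question asks for a CSV file that Excel can open, here is a rough sketch of the same idea using fputcsv instead of printf, so that item text containing commas or quotes gets escaped. The helper names li_depth() and list_to_csv_rows() are made up for this illustration. Unlike get_depth() above, li_depth() counts every UL ancestor up to the document root, which gives the same result for the example markup.

```php
<?php
// Sketch only: li_depth() and list_to_csv_rows() are hypothetical helpers.

function li_depth(DOMNode $node): int
{
    $depth = -1;
    // Walk up the tree and count every UL ancestor of the LI.
    for ($p = $node->parentNode; $p !== null; $p = $p->parentNode) {
        if ($p instanceof DOMElement && strtolower($p->tagName) === 'ul') {
            $depth++;
        }
    }
    return $depth;
}

function list_to_csv_rows(string $html): array
{
    $dom = new DOMDocument;
    libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();

    $rows = [];
    foreach ($dom->getElementsByTagName('li') as $li) {
        // Same assumption as the answer: the first child node is the item text.
        $first = $li->firstChild;
        $text  = $first ? trim($first->nodeValue) : '';
        // One empty cell per level, then the item text in its own column.
        $rows[] = array_merge(array_fill(0, li_depth($li), ''), [$text]);
    }
    return $rows;
}

$html = '<ul><li>Test column 01<ul><li>Test column 02</li></ul></li></ul>';

// php://output is used here for demonstration; a real file path works the same way.
$out = fopen('php://output', 'w');
foreach (list_to_csv_rows($html) as $row) {
    fputcsv($out, $row, ',', '"', '\\');
}
fclose($out);
```

Note that fputcsv may wrap fields in double quotes (for example when they contain spaces), so the raw text will not match the question's sample character for character, but Excel should still place each level in its own column.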

Explanation of the marks:

  1. We fetch all the LI elements in the markup regardless of their position. If you only want to fetch a particular UL block, call getElementsByTagName on the DOMElement holding the starting UL element. I leave it up to you to figure out how to do that.
  2. We add one comma per level of calculated depth. The depth is equal to the number of UL elements above the current LI element, minus one, so top-level items get no comma.
  3. We only fetch the first child node of the LI element, assuming it is the text node you want. If your real markup contains more than just the text node and potential UL elements, you need to adjust this to include only the text content you want. We trim the result to remove the newlines it will contain when there are child UL elements in the LI element.
  4. To get the depth, we traverse the DOM tree upwards until the parent is no longer an LI or UL element.
  5. Since we want one comma per UL element above the initial LI (not counting the outermost UL), we start $depth at -1 and only add 1 to it if the parentNode is a UL element.
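Points 1 and 3 could be sketched roughly as follows. This is only one way to do it: the id "tree", the sample markup, and the helper name own_text() are assumptions made for the illustration and do not come from the question. The starting UL is located with DOMXPath, and the item text is built from every child node except nested lists, so inline elements such as links are kept.

```php
<?php
// Sketch only: own_text() is a hypothetical helper, and the markup is made up.

function own_text(DOMElement $li): string
{
    $text = '';
    foreach ($li->childNodes as $child) {
        // Skip nested lists; keep text nodes and inline elements such as <a>.
        if ($child instanceof DOMElement
            && in_array(strtolower($child->tagName), ['ul', 'ol'], true)) {
            continue;
        }
        $text .= $child->textContent;
    }
    // Collapse the whitespace left behind by indentation and newlines.
    return trim(preg_replace('/\s+/', ' ', $text));
}

$html = '<div><ul id="other"><li>Skip me</li></ul>'
      . '<ul id="tree"><li><a href="#">Parent</a> item<ul><li>Child</li></ul></li></ul></div>';

$dom = new DOMDocument;
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

// Point 1: find the starting UL, then only look at LI elements below it.
$xpath = new DOMXPath($dom);
$start = $xpath->query('//ul[@id="tree"]')->item(0);

$texts = [];
if ($start instanceof DOMElement) {
    foreach ($start->getElementsByTagName('li') as $li) {
        // Point 3: take the item's own text, ignoring nested lists.
        $texts[] = own_text($li);
    }
}
print_r($texts);
```

If the depth is calculated by walking up the tree as in get_depth(), any UL elements above the starting UL would also be counted, so the traversal may need to stop at the starting element.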