php – How excel reads XML file?

Posted by: admin March 9, 2020

Questions:

I have done a lot of research on converting an XML file to a 2D array the same way Excel does. I am trying to reproduce the algorithm Excel applies when you open an XML file in it.

<items>
    <item>
        <sku>abc 1</sku>
        <title>a book 1</title>
        <price>42 1</price>
        <attributes>
            <attribute>
                <name>Number of pages 1</name>
                <value>123 1</value>
            </attribute>
            <attribute>
                <name>Author 1</name>
                <value>Rob dude 1</value>
            </attribute>
        </attributes>
        <contributors>
            <contributor>John 1</contributor>
            <contributor>Ryan 1</contributor>
        </contributors>
        <isbn>12345</isbn>
    </item>
    <item>
        <sku>abc 2</sku>
        <title>a book 2</title>
        <price>42 2</price>
        <attributes>
            <attribute>
                <name>Number of pages 2</name>
                <value>123 2</value>
            </attribute>
            <attribute>
                <name>Author 2</name>
                <value>Rob dude 2</value>
            </attribute>
        </attributes>
        <contributors>
            <contributor>John 2</contributor>
            <contributor>Ryan 2</contributor>
        </contributors>
        <isbn>6789</isbn>
    </item>
</items>

I want to convert it to a 2-dimensional array laid out the way Excel shows the same file when you open it:

[image: the same XML file opened in Excel]

So far I can extract the column labels like Excel does:

function getColNames($array) {
    $cols = array();
    foreach ($array as $val) {
        // 'complete' entries are elements without child elements (leaves).
        if (is_array($val) && $val['type'] == 'complete' && !in_array($val['tag'], $cols)) {
            $cols[] = $val['tag'];
        }
    }
    return $cols;
}

// $simple holds the XML document as a string.
$p = xml_parser_create();
xml_parse_into_struct($p, $simple, $vals, $index);
xml_parser_free($p);
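
One thing to watch with the snippet above: xml_parser_create() enables case folding by default, so the tag names come back upper-cased (SKU, NAME, ...). Below is a minimal sketch of the same label extraction with folding switched off. The shortened inline XML and the helper name leafTagNames() are made up for illustration and are not part of the original code.

```php
<?php
// Hypothetical sample, shortened from the XML in the question.
$simple = '<items><item><sku>abc 1</sku><attributes><attribute>'
        . '<name>Author 1</name><value>Rob dude 1</value>'
        . '</attribute></attributes></item></items>';

// Hypothetical helper: unique names of leaf elements, in document order.
function leafTagNames(array $vals)
{
    $cols = array();
    foreach ($vals as $val) {
        // 'complete' marks an element that has no child elements.
        if ($val['type'] === 'complete' && !in_array($val['tag'], $cols, true)) {
            $cols[] = $val['tag'];
        }
    }
    return $cols;
}

$p = xml_parser_create();
// Without this option the parser reports SKU, NAME and VALUE instead.
xml_parser_set_option($p, XML_OPTION_CASE_FOLDING, 0);
xml_parse_into_struct($p, $simple, $vals, $index);

$cols = leafTagNames($vals);
print_r($cols);
```

With folding left on, the labels would no longer match the lower-case keys used in the goal array below.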

Goal

I want it to generate something like this:

array (
    0 => array (
        'sku'=>'abc 1',
        'title'=>'a book 1',
        'price'=>'42 1',
        'name'=>'Number of pages 1',
        'value'=>'123 1',
        'isbn'=>12345
    ),
    1 => array (
        'sku'=>'abc 1',
        'title'=>'a book 1',
        'price'=>'42 1',
        'name'=>'Author 1',
        'value'=>'Rob dude 1',
        'isbn'=>12345
    ),
    2 => array (
        'sku'=>'abc 1',
        'title'=>'a book 1',
        'price'=>'42 1',
        'contributor'=>'John 1',
        'isbn'=>12345
    ),
    3 => array (
        'sku'=>'abc 1',
        'title'=>'a book 1',
        'price'=>'42 1',
        'contributor'=>'Ryan 1',
        'isbn'=>12345
    ),
)

Sample 2 XML:

<items>
    <item>
       <sku>abc 1</sku>
       <title>a book 1</title>
       <price>42 1</price>
       <attributes>
          <attribute>
              <name>Number of pages 1</name>
              <value>123 1</value>
          </attribute>
          <attribute>
              <name>Author 1</name>
              <value>Rob dude 1</value>
          </attribute>
       </attributes>
       <contributors>
          <contributor>John 1</contributor>
          <contributor>Ryan 1</contributor>
       </contributors>
       <isbns>
            <isbn>12345a</isbn>
            <isbn>12345b</isbn>
       </isbns>
    </item>
    <item>
       <sku>abc 2</sku>
       <title>a book 2</title>
       <price>42 2</price>
       <attributes>
          <attribute>
              <name>Number of pages 2</name>
              <value>123 2</value>
          </attribute>
          <attribute>
              <name>Author 2</name>
              <value>Rob dude 2</value>
          </attribute>
       </attributes>
       <contributors>
          <contributor>John 2</contributor>
          <contributor>Ryan 2</contributor>
       </contributors>
       <isbns>
            <isbn>6789a</isbn>
            <isbn>6789b</isbn>
       </isbns>
    </item>
</items>

Sample 3 XML:

<items>
<item>
   <sku>abc 1</sku>
   <title>a book 1</title>
   <price>42 1</price>
   <attributes>
      <attribute>
          <name>Number of pages 1</name>
          <value>123 1</value>
      </attribute>
      <attribute>
          <name>Author 1</name>
          <value>Rob dude 1</value>
      </attribute>
   </attributes>
   <contributors>
      <contributor>John 1</contributor>
      <contributor>Ryan 1</contributor>
   </contributors>
   <isbns>
        <isbn>
            <name>isbn 1</name>
            <value>12345a</value>
        </isbn>
        <isbn>
            <name>isbn 2</name>
            <value>12345b</value>
        </isbn>
   </isbns>
</item>
<item>
   <sku>abc 2</sku>
   <title>a book 2</title>
   <price>42 2</price>
   <attributes>
      <attribute>
          <name>Number of pages 2</name>
          <value>123 2</value>
      </attribute>
      <attribute>
          <name>Author 2</name>
          <value>Rob dude 2</value>
      </attribute>
   </attributes>
   <contributors>
      <contributor>John 2</contributor>
      <contributor>Ryan 2</contributor>
   </contributors>
   <isbns>
        <isbn>
            <name>isbn 3</name>
            <value>6789a</value>
        </isbn>
        <isbn>
            <name>isbn 4</name>
            <value>6789b</value>
        </isbn>
   </isbns>
</item>
</items>

Answers:

Your question is vague, so here is, in my own words, what the thing you call “Excel” does: it takes each /items/item element as a row. Going through that element in document order, the column names are the tag names of its leaf element nodes; if a name occurs more than once, the column takes the position of the first occurrence.

It then creates one output row per item, but only if all of the item’s child elements are leaf elements. Otherwise the item serves as the base for several rows, and the child elements that themselves contain elements are interpolated. E.g. if such an entry contains two child elements that each hold two leaves with the same names, those are interpolated into two rows. Their leaf values are then placed in the columns of the same name, following the logic described in the first paragraph.

How deep this logic goes is not clear from your question, so I keep it at that level only. Otherwise the interpolation would need to recurse deeper into the tree, and the algorithm as outlined might no longer fit.

To build that in PHP, you can benefit particularly from XPath, and the interpolation works wonders as a Generator.

function tree_to_rows(SimpleXMLElement $xml)
{
    // Column names: every leaf element below the first row element, in
    // document order; a repeated name keeps the position of its first use.
    $columns = [];

    foreach ($xml->xpath('/*/*[1]//*[not(*)]') as $leaf) {
        $columns[$leaf->getName()] = null;
    }

    // The first yielded row is the header.
    yield array_keys($columns);

    // Tag name of the row elements, e.g. "item".
    $name = $xml->xpath('/*/*[1]')[0]->getName();

    foreach ($xml->$name as $source) {
        $rowModel       = $columns; // every column present, all values null
        $interpolations = [];

        foreach ($source as $child) {
            if ($child->count()) {
                $interpolations[] = $child; // non-leaf: expands into extra rows
            } else {
                $rowModel[$child->getName()] = (string) $child;
            }
        }

        // Only leaf children: the element is exactly one row.
        if (!$interpolations) {
            yield array_values($rowModel);
            continue;
        }

        // One row per element inside each non-leaf child.
        foreach ($interpolations as $interpolation) {
            foreach ($interpolation as $interpolationStep) {
                $row = $rowModel;
                foreach ($interpolationStep->xpath('(.|.//*)[not(*)]') as $leaf) {
                    $row[$leaf->getName()] = (string) $leaf;
                }
                yield array_values($row);
            }
        }
    }
}

Using it can then be as straightforward as:

$xml  = simplexml_load_file('items.xml');
$rows = tree_to_rows($xml);
echo new TextTable($rows);

This gives the following example output:

+-----+--------+-----+-----------------+----------+-----------+-----+
|sku  |title   |price|name             |value     |contributor|isbn |
+-----+--------+-----+-----------------+----------+-----------+-----+
|abc 1|a book 1|42 1 |Number of pages 1|123 1     |           |12345|
+-----+--------+-----+-----------------+----------+-----------+-----+
|abc 1|a book 1|42 1 |Author 1         |Rob dude 1|           |12345|
+-----+--------+-----+-----------------+----------+-----------+-----+
|abc 1|a book 1|42 1 |                 |          |John 1     |12345|
+-----+--------+-----+-----------------+----------+-----------+-----+
|abc 1|a book 1|42 1 |                 |          |Ryan 1     |12345|
+-----+--------+-----+-----------------+----------+-----------+-----+
|abc 2|a book 2|42 2 |Number of pages 2|123 2     |           |6789 |
+-----+--------+-----+-----------------+----------+-----------+-----+
|abc 2|a book 2|42 2 |Author 2         |Rob dude 2|           |6789 |
+-----+--------+-----+-----------------+----------+-----------+-----+
|abc 2|a book 2|42 2 |                 |          |John 2     |6789 |
+-----+--------+-----+-----------------+----------+-----------+-----+
|abc 2|a book 2|42 2 |                 |          |Ryan 2     |6789 |
+-----+--------+-----+-----------------+----------+-----------+-----+
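
If an array keyed by column name is wanted instead of a text table (as in the goal array of the question), the header-first rows can be folded into associative arrays. The following is only a sketch: rows_to_assoc() is a hypothetical helper, the literal rows stand in for the generator output, and dropping empty cells is an assumption based on the goal array.

```php
<?php
/**
 * Hypothetical helper: turn header-first rows (array or Generator) into
 * arrays keyed by column name. Empty cells are dropped.
 */
function rows_to_assoc($rows)
{
    $header = null;
    $result = array();

    foreach ($rows as $row) {
        if ($header === null) {
            $header = $row; // the first row carries the column names
            continue;
        }

        $assoc = array();
        foreach ($row as $i => $cell) {
            $cell = (string) $cell; // also accepts SimpleXMLElement cells
            if ($cell !== '') {
                $assoc[$header[$i]] = $cell;
            }
        }
        $result[] = $assoc;
    }

    return $result;
}

// Stand-in for the generator output: header row first, then data rows.
$rows = array(
    array('sku', 'title', 'name', 'value', 'contributor'),
    array('abc 1', 'a book 1', 'Author 1', 'Rob dude 1', null),
    array('abc 1', 'a book 1', null, null, 'John 1'),
);

$table = rows_to_assoc($rows);
print_r($table);
```

Because the helper only iterates with foreach, a Generator can be passed in place of the literal array.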

In case you’re looking for that code: the TextTable is a slightly modified version of https://gist.github.com/hakre/5734770 that can operate on Generators.
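
For the deeper nesting mentioned earlier in this answer, the interpolation would have to recurse. The following is a rough sketch of one way to do that, not a description of what Excel actually does. It assumes that a leaf whose name is unique among its siblings is a value shared by all rows below it, and that everything else starts new rows. expand_rows() and the inline XML are made up for illustration, and the rows are keyed by tag name rather than aligned to a fixed column order.

```php
<?php
// Hypothetical recursive variant; the row-splitting rule is an assumption.
function expand_rows(SimpleXMLElement $node, array $base = array())
{
    // Count how often each child name occurs at this level.
    $counts = array();
    foreach ($node->children() as $child) {
        $name = $child->getName();
        $counts[$name] = isset($counts[$name]) ? $counts[$name] + 1 : 1;
    }

    $branches = array();
    foreach ($node->children() as $child) {
        $name = $child->getName();
        if ($child->count() === 0 && $counts[$name] === 1) {
            $base[$name] = (string) $child; // value shared by the rows below
        } else {
            $branches[] = $child;           // repeated leaf or nested element
        }
    }

    if (!$branches) {
        yield $base; // nothing left to expand: this is one row
        return;
    }

    foreach ($branches as $branch) {
        if ($branch->count() === 0) {
            // Repeated leaf (e.g. several <contributor>): one row each.
            $row = $base;
            $row[$branch->getName()] = (string) $branch;
            yield $row;
        } else {
            foreach (expand_rows($branch, $base) as $row) {
                yield $row;
            }
        }
    }
}

$xml = simplexml_load_string(
    '<items><item><sku>abc 1</sku>'
    . '<attributes>'
    . '<attribute><name>Pages 1</name><value>123 1</value></attribute>'
    . '<attribute><name>Author 1</name><value>Rob dude 1</value></attribute>'
    . '</attributes>'
    . '<contributors><contributor>John 1</contributor>'
    . '<contributor>Ryan 1</contributor></contributors>'
    . '<isbn>12345</isbn></item></items>'
);

$rows = iterator_to_array(expand_rows($xml), false);
print_r($rows);
```

For this sample the sketch yields four rows, two for the attributes and two for the contributors, each carrying the sku and isbn of the item.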

Answer:

To get the array you want from the XML file you have given, you would have to do it this way. This was not overly fun, so I hope it is indeed what you wanted.

Given the exact XML above, it will produce the output you show as your final result.

This was written in PHP 5.6. If you run into issues in your environment, I believe you will have to move the function calls onto their own lines and replace [] with array().

$items = simplexml_load_file("items.xml");

$items_array = [];

foreach($items as $item) {

    // One row per <attribute>, passed on as an array with name and value.
    foreach($item->attributes->attribute as $attribute) {
        array_push($items_array, itemsFactory($item, (array) $attribute));
    }

    // One row per <contributor>, passed on as a plain string.
    foreach((array) $item->contributors->contributor as $contributor) {
        array_push($items_array, itemsFactory($item, $contributor));
    }

}

function itemsFactory($item, $vars) {

    $item = (array) $item;

    return [
        "sku" => $item['sku'],
        "title" => $item['title'],
        "price" => $item['price'],
        "name" => (is_array($vars) ? $vars['name'] : ""),
        "value" => (is_array($vars) ? $vars['value'] : ""),
        "contributor" => (is_string($vars) ? $vars : ""),
        "isbn" => $item['isbn']
    ];

}

var_dump($items_array);

Here is the result when run on your XML file:

array(8) {
  [0]=>
  array(7) {
    ["sku"]=>
    string(5) "abc 1"
    ["title"]=>
    string(8) "a book 1"
    ["price"]=>
    string(4) "42 1"
    ["name"]=>
    string(17) "Number of pages 1"
    ["value"]=>
    string(5) "123 1"
    ["contributor"]=>
    string(0) ""
    ["isbn"]=>
    string(5) "12345"
  }
  [1]=>
  array(7) {
    ["sku"]=>
    string(5) "abc 1"
    ["title"]=>
    string(8) "a book 1"
    ["price"]=>
    string(4) "42 1"
    ["name"]=>
    string(8) "Author 1"
    ["value"]=>
    string(10) "Rob dude 1"
    ["contributor"]=>
    string(0) ""
    ["isbn"]=>
    string(5) "12345"
  }
  [2]=>
  array(7) {
    ["sku"]=>
    string(5) "abc 1"
    ["title"]=>
    string(8) "a book 1"
    ["price"]=>
    string(4) "42 1"
    ["name"]=>
    string(0) ""
    ["value"]=>
    string(0) ""
    ["contributor"]=>
    string(6) "John 1"
    ["isbn"]=>
    string(5) "12345"
  }
  [3]=>
  array(7) {
    ["sku"]=>
    string(5) "abc 1"
    ["title"]=>
    string(8) "a book 1"
    ["price"]=>
    string(4) "42 1"
    ["name"]=>
    string(0) ""
    ["value"]=>
    string(0) ""
    ["contributor"]=>
    string(6) "Ryan 1"
    ["isbn"]=>
    string(5) "12345"
  }
  [4]=>
  array(7) {
    ["sku"]=>
    string(5) "abc 2"
    ["title"]=>
    string(8) "a book 2"
    ["price"]=>
    string(4) "42 2"
    ["name"]=>
    string(17) "Number of pages 2"
    ["value"]=>
    string(5) "123 2"
    ["contributor"]=>
    string(0) ""
    ["isbn"]=>
    string(4) "6789"
  }
  [5]=>
  array(7) {
    ["sku"]=>
    string(5) "abc 2"
    ["title"]=>
    string(8) "a book 2"
    ["price"]=>
    string(4) "42 2"
    ["name"]=>
    string(8) "Author 2"
    ["value"]=>
    string(10) "Rob dude 2"
    ["contributor"]=>
    string(0) ""
    ["isbn"]=>
    string(4) "6789"
  }
  [6]=>
  array(7) {
    ["sku"]=>
    string(5) "abc 2"
    ["title"]=>
    string(8) "a book 2"
    ["price"]=>
    string(4) "42 2"
    ["name"]=>
    string(0) ""
    ["value"]=>
    string(0) ""
    ["contributor"]=>
    string(6) "John 2"
    ["isbn"]=>
    string(4) "6789"
  }
  [7]=>
  array(7) {
    ["sku"]=>
    string(5) "abc 2"
    ["title"]=>
    string(8) "a book 2"
    ["price"]=>
    string(4) "42 2"
    ["name"]=>
    string(0) ""
    ["value"]=>
    string(0) ""
    ["contributor"]=>
    string(6) "Ryan 2"
    ["isbn"]=>
    string(4) "6789"
  }
}

If you actually have access to the Excel file and not just the XML, this could be much easier. In that case we can use PHPExcel to render the exact same thing, and it would work for any dataset, not just the one specified. If that is not the case, I can’t think of any other way to transform that XML file into what you want.

EDIT:

This may also shed some more light on the subject; it comes from the developer of PHPExcel himself: PHPExcel factory error when reading XML from URL. As you can see there, I don’t think you can write something that parses any XML file you throw at it without getting hold of some of Excel’s source code or spending a very long time on it, far beyond the scope of this question. However, if you were to write something that parses any XML file, I have a feeling it would look like the above but with a TON of conditionals.

Answer:

The PHP library PHPExcel solves your issue:

https://phpexcel.codeplex.com/

You can find some samples here too:

https://phpexcel.codeplex.com/wikipage?title=Examples&referringTitle=Home

https://github.com/PHPOffice/PHPExcel/wiki/User%20Documentation

It was long the most widely used Excel library for PHP; the project has since been archived, with PhpSpreadsheet as its successor.

Keep in mind that it can both read (from an Excel file etc.) and write (to an Excel file, PDF etc.).