I'm trying to process data in an Excel file saved as an XML spreadsheet. I haven't done much XML processing before, and after a fair amount of research I still can't make it work. Here is the content of my minimal file:
<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40"
xmlns:fn="http://www.w3.org/2005/xpath-functions"
xmlns:sbmextension="http://www.serena.com/SBM/XSLT_Extension">
<Worksheet ss:Name="index">
</Worksheet>
</Workbook>
And my script:
use XML::LibXML;
use Data::Dumper;
my $filename = $ARGV[0];
my $parser = XML::LibXML->new();
my $doc = $parser->parse_file($filename);
my $xc = XML::LibXML::XPathContext->new( $doc->documentElement );
my $xpath = '/Workbook/Worksheet/@ss:Name';
print Dumper $xc->findvalue($xpath);
However, if I remove what I assume is the default namespace declaration, xmlns="urn:schemas-microsoft-com:office:spreadsheet", it starts working. Can you tell me what I'm missing? I guess I could just strip it before parsing the document, but I would like to understand what I've done wrong :). Thanks in advance.
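If you want to use XPath expressions on a document with namespaces, you have to register each namespace with a prefix first, and then use that prefix in every XPath expression that mentions an element or attribute from that namespace. XPath 1.0 does not apply the document's default namespace to unprefixed names: /Workbook only matches a Workbook element that is in no namespace, which is why your query starts working once the xmlns declaration is removed. With the namespace registered, the query looks like this: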
#!/usr/bin/perl
use warnings;
use strict;
use XML::LibXML;
use Data::Dumper;
my $xml = << '__XML__';
<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook
xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40"
xmlns:fn="http://www.w3.org/2005/xpath-functions"
xmlns:sbmextension="http://www.serena.com/SBM/XSLT_Extension">
<Worksheet ss:Name="index">
</Worksheet>
</Workbook>
__XML__
my $doc = XML::LibXML->load_xml( string => $xml );
my $xc = XML::LibXML::XPathContext->new( $doc->documentElement );
# Bind the prefix 'ss' to the spreadsheet namespace for use in XPath.
$xc->registerNs('ss', 'urn:schemas-microsoft-com:office:spreadsheet');
# Every element step carries the prefix, not just the attribute.
my $xpath = '/ss:Workbook/ss:Worksheet/@ss:Name';
print Dumper $xc->findvalue($xpath);
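As a further sketch of the same idea: the prefix used in the XPath expression does not have to be the one the document uses, because only the namespace URI is compared. The prefix `wb` below is a made-up name for illustration, and the XML is a trimmed copy of the file from the question. The last query shows a prefix-free alternative based on the XPath function `local-name()`, which ignores namespaces entirely and so may match more than intended in documents that mix vocabularies.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Trimmed copy of the workbook from the question: only the declarations
# that matter for this query are kept.
my $xml = <<'__XML__';
<?xml version="1.0"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
<Worksheet ss:Name="index">
</Worksheet>
</Workbook>
__XML__

my $doc = XML::LibXML->load_xml( string => $xml );
my $xc  = XML::LibXML::XPathContext->new( $doc->documentElement );

# Unprefixed steps only match elements in no namespace, so this list is empty.
my @none = $xc->findnodes('/Workbook/Worksheet');

# 'wb' is an arbitrary prefix chosen here (an assumption for illustration);
# it is bound to the same URI as the document's default namespace.
$xc->registerNs( wb => 'urn:schemas-microsoft-com:office:spreadsheet' );
my $name = $xc->findvalue('/wb:Workbook/wb:Worksheet/@wb:Name');

# Alternative without registering anything: compare local names only.
my $by_local = $xc->findvalue(
    '/*[local-name()="Workbook"]/*[local-name()="Worksheet"]/@*[local-name()="Name"]'
);

print "unprefixed matches: ", scalar(@none), "\n";
print "registered prefix:  $name\n";
print "local-name():       $by_local\n";
```

Registering the namespace is usually the better choice, since it keeps the expressions short and still distinguishes elements that share a local name across namespaces.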