XML -> XSLT using encoding UTF-8 doesn't work with Microsoft Excel – why?

Posted by: admin May 14, 2020

Questions:

I have a problem with Microsoft Excel and the “Textfile – csv” / “Textfile – tab” files I generate.

All other applications detect the UTF-8 encoding and display German umlauts (äöüßÄÖÜ) correctly:
Notepad++ (Windows 7) opens the file and shows everything correctly.
Editor (Windows 7, i.e. Notepad) opens the file and shows everything correctly.
Only the ….. Excel, when the file is opened directly (without the import dialog), reads it with the wrong encoding and mangles all the German umlauts.

I didn’t find an option in the Excel preferences to avoid this problem – maybe I’m blind, or maybe Microsoft didn’t do a good job on Excel.

Is there anything I can change in the XSLT so that Excel handles the file correctly without the import dialog? (I know it works if you specify the encoding in that dialog.)

In the example the correct value is “München”, but Excel displays it wrongly. I can’t post Excel’s output here – the input field rejects it.

I can only use XSLT 1.0.

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <table name="test">
        <row>
            <field attr3="name">München</field>
        </row>
    </table>
</root>

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs" version="1.0">
    <xsl:output method="text" version="1.0" encoding="UTF-8" indent="no"/>
    <xsl:template match="/">
         <xsl:value-of select="root/table[@name = 'test']/row/field[@attr3 = 'name']"/>
    </xsl:template>
</xsl:stylesheet>

The result is saved as a .txt file.
I also tried the extensions .csv and .tab – none of them work in Excel, but they always work in Notepad++/Editor/….
Only Excel’s import dialog shows the characters correctly, but the users want to just double-click the file.

Answer:

Excel needs a BOM (byte order mark) to read a UTF-8 encoded CSV file correctly when the file is opened directly. Unfortunately I don’t know how to add a BOM from XSLT 1.0, but you can use an external application to do it, since it’s a trivial task. I wrote one myself a while back if you need a reference.
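XSLT 1.0 has no `byte-order-mark` setting, but a commonly used workaround (worth testing with your particular processor) is to write the character U+FEFF yourself as the very first character of the text output. With `encoding="UTF-8"`, the serializer encodes U+FEFF as the bytes EF BB BF, which is exactly the UTF-8 BOM. A sketch based on the question’s stylesheet:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="text" encoding="UTF-8"/>
    <xsl:template match="/">
        <!-- U+FEFF as the first output character; serialized as UTF-8 it becomes EF BB BF, the BOM -->
        <xsl:text>&#xFEFF;</xsl:text>
        <xsl:value-of select="root/table[@name = 'test']/row/field[@attr3 = 'name']"/>
    </xsl:template>
</xsl:stylesheet>
```

This only helps if the processor itself writes the file as UTF-8 bytes. If the result is handed back as a string to host code that writes the file, that code must also write UTF-8, and it must not add a second BOM of its own.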

Answer:

In XSLT 2.0, xsl:output has a byte-order-mark attribute:

The byte-order-mark attribute defines whether a byte order mark is written at the start of the file. If the value yes is specified, a byte order mark is written; if no is specified, no byte order mark is written. The default value depends on the encoding used. If the encoding is UTF-16, the default is yes; for UTF-8 it is implementation-defined, and for all other encodings it is no. The value of the byte order mark indicates whether high order bytes are written before or after low order bytes; the actual byte order used is implementation-dependent, unless it is defined by the selected encoding.

Change the xsl:stylesheet to version="2.0" and add byte-order-mark="yes" to the xsl:output (and, obviously, run it with an XSLT 2.0 processor such as Saxon):

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs" version="2.0">
    <xsl:output method="text" version="1.0" encoding="UTF-8" indent="no" byte-order-mark="yes" />
    <xsl:template match="/">
        <xsl:value-of select="root/table[@name = 'test']/row/field[@attr3 = 'name']"/>
    </xsl:template>
</xsl:stylesheet>
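The same setting works when the stylesheet writes a whole table rather than a single value. A sketch, assuming the `root/table/row/field` layout from the question with one `field` element per column, that writes one tab-separated line per `row`. The tab separator is an assumption; swap `&#9;` for `;` or `,` to match how your users open the file:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <!-- byte-order-mark="yes" asks the XSLT 2.0 serializer to start the file with EF BB BF -->
    <xsl:output method="text" encoding="UTF-8" byte-order-mark="yes"/>
    <xsl:template match="/">
        <xsl:for-each select="root/table[@name = 'test']/row">
            <!-- XSLT 2.0 value-of joins all field values of this row with the separator (a tab) -->
            <xsl:value-of select="field" separator="&#9;"/>
            <!-- end each row with a line feed -->
            <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```

The `separator` attribute on `xsl:value-of` is XSLT 2.0 only, so this variant also needs a 2.0 processor.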