unicode – PHP detecting filesystem encoding/saving files with non-latin filenames

Posted by: admin July 12, 2020

Questions:

I need to save files with non-latin filenames on a filesystem, using PHP.

I want to make this work cross-platform. How do I know what encoding I can use to write the file? I understand many modern filesystems are UTF-8 based (is this correct?), but I doubt Windows XP is (for instance).

So, is there a robust detection mechanism?

Answers:

Not an answer to your question, but if you don’t need to do extensive operations at the filesystem level (like searching, sorting…), there is a nice cross-platform workaround for the issue outlined in this SO question: running the file names through urlencode().

Hörensägen.txt 

gets turned into

H%C3%B6rens%C3%A4gen.txt

which should be safe to use in any filesystem and is able to map any UTF-8 character.

I find this much preferable to trying to “natively” deal with the host OS’s capabilities, which is guaranteed to be complicated and error-prone. In addition to operating system differences, I’m sure the various filesystem formats (FAT16, FAT32, NTFS, extFS versions 1/2/3…) bring their own sets of rules to be aware of.
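A minimal sketch of the percent-encoding workaround described above. The helper names encodeFilename() and decodeFilename() are made up for illustration, and writing into the system temp directory is just an assumption to keep the example self-contained. The name is spelled with byte escapes so the result does not depend on how the script file itself is saved.

```php
<?php
// Hypothetical helpers (not part of PHP or of the original answer).
function encodeFilename(string $name): string
{
    // urlencode() leaves letters, digits, "-", "_" and "." alone,
    // turns spaces into "+", and percent-encodes every other byte.
    return urlencode($name);
}

function decodeFilename(string $stored): string
{
    // Reverse mapping, for showing the original name to users.
    return urldecode($stored);
}

// "Hörensägen.txt" as UTF-8 bytes.
$original = "H\xC3\xB6rens\xC3\xA4gen.txt";
$stored   = encodeFilename($original);

// The name that reaches the filesystem is plain ASCII.
$path = sys_get_temp_dir() . DIRECTORY_SEPARATOR . $stored;
file_put_contents($path, 'hello');
$content = file_get_contents($path);
unlink($path);

echo $stored, PHP_EOL;                  // H%C3%B6rens%C3%A4gen.txt
echo decodeFilename($stored), PHP_EOL;  // the original UTF-8 name
```

One thing to keep in mind: every non-ASCII byte becomes three characters, so long names grow quickly and can run into the filesystem’s filename length limit.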

Answer:

PHP 7.1 supports UTF-8 filenames on Windows (I had a problem serving a file with Cyrillic characters in its name until I updated PHP and Apache), so if you can just update PHP, that’s the most robust and cross-platform solution these days.

I don’t even need to call ini_set('mbstring.internal_encoding','UTF-8'); for file_get_contents() to work properly with non-latin paths.
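A quick way to check this on your own setup, as a rough sketch. It assumes PHP 7.1 or newer and a filesystem that accepts Unicode names; the demo directory name and the Cyrillic file name are arbitrary choices for the example. It writes a file under a UTF-8 name, reads it back, and looks for the same name in the directory listing.

```php
<?php
// Assumes PHP >= 7.1 (the "\u{...}" escapes below need PHP 7.0+ anyway).
// "привет.txt" written with escapes, so the script's own encoding doesn't matter.
$name = "\u{43F}\u{440}\u{438}\u{432}\u{435}\u{442}.txt";

// Work in a fresh subdirectory so the listing only contains our file.
$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'utf8_name_demo_' . uniqid();
mkdir($dir);
$path = $dir . DIRECTORY_SEPARATOR . $name;

file_put_contents($path, 'test');

// Reading back through the same path.
$roundTrip = file_get_contents($path);

// The stricter check: does the directory listing return the same UTF-8 name?
$listed = in_array($name, scandir($dir), true);

// Clean up.
unlink($path);
rmdir($dir);

var_dump($roundTrip === 'test', $listed);
```

The listing check is the one that matters: reading back through the same path can succeed even if the name was stored in mangled form, whereas scandir() only returns the identical string when the name was written unchanged.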