PHP: How to create unicode filenames

Posted by: admin July 12, 2020

Questions:

I’m trying to create files with Unicode characters in their filenames. I don’t quite know what encoding I should use, or if it’s possible at all.

I have this file, saved in latin1 encoding:

$h = fopen("unicode_♫.txt", 'w');
fclose($h);

In UTF-8 this would decode as ‘unicode_♫.txt’. PHP writes the latin1 version of the name to disk (which is obvious?). I need the file to be saved under the name as it would appear with UTF-8 decoding. I’ve also tried encoding the name as UTF-16, but that doesn’t work either.

I’m using PHP 5.2, and would like this to work with NTFS, ext3 and ext4.

How can this be done?

Answer:

It can’t currently be done on Windows (possibly PHP 5.4 will support this scenario). In PHP, you can only write filenames using the code page Windows is set to. If that code page does not include the character ♫, you cannot use it. Worse, if a file on Windows already has such a character in its name, you’ll have trouble accessing it.

In Linux, at least with ext*, it’s a different story. You can use whatever filenames you want, the OS doesn’t care about the encoding. So if you consistently use filenames in UTF-8, you should be OK. UTF-16 is however excluded because filenames cannot include bytes with value 0.
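
As a rough sketch of the Linux case described above, the following creates a file whose name is given as raw UTF-8 bytes and checks that the UTF-16 form of the same name contains zero bytes. It assumes a Linux system with a byte-oriented filesystem such as ext3/ext4 and the iconv extension; the temp directory is only an illustrative location.

```php
<?php
// Assumption: Linux with a byte-oriented filesystem such as ext3/ext4.
// "\xE2\x99\xAB" is the UTF-8 byte sequence for U+266B (the ♫ character),
// spelled out so the result does not depend on this file's own encoding.
$utf8Name = "unicode_\xE2\x99\xAB.txt";

// The same name in UTF-16LE contains zero bytes, which filenames cannot hold.
$utf16Name = iconv('UTF-8', 'UTF-16LE', $utf8Name);
$utf16HasNul = strpos($utf16Name, "\0") !== false;

// Create the file in the temp directory (illustrative location),
// passing the UTF-8 bytes straight through to the OS.
$path = sys_get_temp_dir() . '/' . $utf8Name;
$h = fopen($path, 'w');
$created = ($h !== false);
if ($created) {
    fclose($h);
}
```

Writing the character as escaped bytes sidesteps the question of which encoding the source file itself was saved in.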

Answer:

For me, the code below works well on Win7/NTFS, Apache 2.2.21.0 and PHP 5.3.8.0:

<?php
// this source file is utf-8 encoded

$fileContent = "Content of my file which contains Turkish characters such as şığŞİĞ";

$dirName = 'Dirname with utf-8 chars such as şığŞİĞ';
$fileName = 'Filename with utf-8 chars such as şığŞİĞ';

// converting encodings of names from utf-8 to iso-8859-9 (Turkish)
$encodedDirName = iconv("UTF-8", "ISO-8859-9//TRANSLIT", $dirName);
$encodedFileName = iconv("UTF-8", "ISO-8859-9//TRANSLIT", $fileName);

mkdir($encodedDirName);
file_put_contents("$encodedDirName/$encodedFileName.txt", $fileContent);

You can do the same thing when opening files:

<?php
$fileName = "Filename with utf-8 chars such as şığ";
$fileContent = file_get_contents(iconv("UTF-8", "ISO-8859-9//TRANSLIT", "$fileName.txt"));
print $fileContent;
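
The conversion step used in both snippets could be wrapped in a small helper, sketched below. The function name toFsEncoding is hypothetical, and the default target encoding is just the one used in this answer. Unlike the code above, it leaves out //TRANSLIT so that an unrepresentable character is not silently replaced. With glibc's iconv the call then fails and returns false; other iconv implementations may behave differently.

```php
<?php
// Hypothetical helper (not part of the answer above): convert a UTF-8 name
// to the single-byte encoding the filesystem calls expect.
function toFsEncoding($utf8Name, $targetEncoding = 'ISO-8859-9')
{
    // No //TRANSLIT here, so characters missing from the target encoding
    // cause a failure instead of being approximated; @ hides the notice.
    return @iconv('UTF-8', $targetEncoding, $utf8Name);
}

// "şığ" written as UTF-8 bytes; all three letters exist in ISO-8859-9.
$ok = toFsEncoding("\xC5\x9F\xC4\xB1\xC4\x9F");

// "♫" written as UTF-8 bytes; it has no equivalent in ISO-8859-9.
$bad = toFsEncoding("\xE2\x99\xAB");
```

Checking the return value before calling mkdir or file_put_contents avoids creating a file under a name that differs from the one requested.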

Answer:

Using the com_dotnet PHP extension, you can access Windows’ Scripting.FileSystemObject, and then do everything you want with UTF-8 file and folder names.

I packaged this as a PHP stream wrapper, so it’s very easy to use:

https://github.com/nicolas-grekas/Patchwork-UTF8/blob/lab-windows-fs/class/Patchwork/Utf8/WinFsStreamWrapper.php

First, verify that the com_dotnet extension is enabled in your php.ini,
then enable the wrapper with:

stream_wrapper_register('win', 'Patchwork\Utf8\WinFsStreamWrapper');

Finally, use the functions you’re used to (mkdir, fopen, rename, etc.), but prefix your path with win://

For example:

<?php
$dir_name = "Depósito";
mkdir('win://' . $dir_name );
?>
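
For readers who want to see roughly what happens underneath, here is a minimal sketch that calls Scripting.FileSystemObject directly instead of going through the wrapper. It is not the wrapper's actual code. The helper name createFolderViaFso is hypothetical, and the sketch assumes Windows with com_dotnet loaded; elsewhere it simply returns false.

```php
<?php
// Hypothetical helper: create a folder whose name is given in UTF-8,
// going through Scripting.FileSystemObject instead of PHP's own mkdir().
function createFolderViaFso($utf8Path)
{
    // Only meaningful on Windows with the com_dotnet extension loaded.
    if (!extension_loaded('com_dotnet')) {
        return false;
    }
    // CP_UTF8 tells the COM layer that PHP strings are UTF-8 encoded.
    $fso = new COM('Scripting.FileSystemObject', null, CP_UTF8);
    if (!$fso->FolderExists($utf8Path)) {
        $fso->CreateFolder($utf8Path);
    }
    return (bool) $fso->FolderExists($utf8Path);
}

// "Depósito" with the ó written as UTF-8 bytes.
$result = createFolderViaFso("Dep\xC3\xB3sito");
```

The stream wrapper is still the more convenient route, since it lets existing mkdir, fopen and rename calls keep working with only a path prefix change.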

Answer:

Filenames do not have a notion of encoding. You have to figure out the filename by other means. The only important point for your situation is that in most filesystems a filename is a null-terminated byte string, but in NTFS it is a null-terminated string of 16-bit units. Consequently, you cannot use the standard fopen-type functions to access all possible NTFS filenames.

However, if you have obtained the NTFS filename of an existing file by other means, you can use the Windows API function GetShortPathName to obtain the short name of the file, which you can then pass to fopen. I don’t know if PHP lets you access Windows API functions, but perhaps someone has written a module or plugin for that.