php – Special characters encoding in image filenames after server migration

Posted by: admin, July 12, 2020

Questions:

I’ve migrated a WordPress website from a HostGator shared host to an Ubuntu LAMP stack on DigitalOcean.

The trouble started when I exported the image files that had special characters in their names, for example the file operários_tarsila-1024x640.jpg.

When WordPress tries to reach the file, it displays an error. I’ve found the cause:

I can see via Inspect Element that WordPress tries to call: http://mywebsite.com/wp-content/uploads/2013/02/oper%C3%A1rios_tarsila-1024x640.jpg and the server returns a 404 error.

However, if I type this URL in the browser: http://mywebsite.com/wp-content/uploads/2013/02/opera%CC%81rios_tarsila-1024x640.jpg it works and the image is displayed.

So it seems that this difference in how á is encoded, %C3%A1 (the single precomposed á character) versus a + %CC%81 (a plain a followed by a combining acute accent), is what prevents WordPress from displaying my images.
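The two encodings in the URLs above correspond to Unicode's composed (NFC) and decomposed (NFD) normalization forms of the same visible character. A minimal Python sketch, using only the standard library, of how each form percent-encodes (the variable names are illustrative and not from the original post):

```python
import unicodedata
from urllib.parse import quote

name = "operários"

# NFC: "á" is the single code point U+00E1.
composed = unicodedata.normalize("NFC", name)
# NFD: "á" is "a" followed by the combining acute accent U+0301.
decomposed = unicodedata.normalize("NFD", name)

print(quote(composed))         # oper%C3%A1rios
print(quote(decomposed))       # opera%CC%81rios
print(composed == decomposed)  # False: identical on screen, different code points
```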

So now my server holds thousands of accented image filenames stored as character + combining accent, while WordPress requests them using the single precomposed accented character.

Is there a way to rename all of them in bash using a comparison table? Or a way to make Apache aware of these differences so that it points to the right file when this kind of mismatch happens?

How to & Answers:

Apparently the problem is how the backup is decompressed on the new server.

There are 2 ways to fix this:

  1. Manually rename the files to names without accents, then update the file names in the database to match. (This is crazy and can be dangerous, so back up the database first.)

  2. Re-upload the files using FileZilla, with the site set to force UTF-8 as the charset encoding:

File > Site Manager > {YOUR SITE} > Charset tab > Force UTF-8

Answer:

Have you tried setting the same encoding in your PHP script, MySQL and HTML?

PHP: http://php.net/manual/en/function.mb-internal-encoding.php

MySQL: http://php.net/manual/en/function.mysql-set-charset.php (the old mysql extension was removed in PHP 7; mysqli_set_charset is the equivalent call in mysqli)

HTML: <meta http-equiv="content-type" content="text/html; charset=utf-8" />

This looks like a charset mismatch between these layers.

If that does not work, you will have to write a small script to rename all your pictures, using a function like:

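function wd_remove_accents($str, $charset = 'utf-8')
{
    // Turn accented characters into named HTML entities, e.g. 'é' becomes '&eacute;'.
    // This only covers precomposed characters; a bare combining accent has no named entity.
    $str = htmlentities($str, ENT_NOQUOTES, $charset);

    // Keep only the base letter of each entity, e.g. '&eacute;' becomes 'e'.
    $str = preg_replace('#&([A-Za-z])(?:acute|cedil|caron|circ|grave|orn|ring|slash|th|tilde|uml);#', '$1', $str);
    $str = preg_replace('#&([A-Za-z]{2})(?:lig);#', '$1', $str); // ligatures, e.g. '&oelig;' becomes 'oe'
    $str = preg_replace('#&[^;]+;#', '', $str); // remove any other entities

    return $str;
}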

Source : http://www.weirdog.com/blog/php/supprimer-les-accents-des-caracteres-accentues.html
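If the accents should be kept rather than stripped, an alternative to the function above is to rename each file to the composed (NFC) form that WordPress requests, so the database does not need to change. This is a minimal Python sketch and not the answerer's method: the function name `rename_to_nfc` and the `dry_run` flag are hypothetical, and it assumes a Linux filesystem that stores names as UTF-8 bytes. It is safest to try it on a copy of the uploads folder first.

```python
import os
import unicodedata


def rename_to_nfc(root, dry_run=True):
    """Rename entries under root to NFC; return the (old, new) path pairs."""
    renamed = []
    # Walk bottom-up so a directory's contents are handled before the directory itself.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            target = unicodedata.normalize("NFC", name)
            if target == name:
                continue  # already in composed form
            old_path = os.path.join(dirpath, name)
            new_path = os.path.join(dirpath, target)
            if os.path.exists(new_path):
                continue  # never overwrite an existing composed-form entry
            if not dry_run:
                os.rename(old_path, new_path)
            renamed.append((old_path, new_path))
    return renamed
```

Calling `rename_to_nfc("wp-content/uploads")` only reports what would change; passing `dry_run=False` performs the renames.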

Answer:

We have just had a similar problem with French characters in our WordPress deployment, and our solution was to upload the files with FileZilla from a PC instead of FileZilla from a Mac.

When I uploaded from Mac OS X to the CentOS server, the files would only show if requested in the a + %CC%81 format.

When I uploaded the files from the PC, Apache found the files in the %C3%A1 format, which was how WordPress had them encoded.
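The behaviour described above is what one would expect if the server matches filenames byte for byte, which is typical on Linux filesystems: a request for the composed name never matches a file stored under the decomposed name. A small Python sketch of that lookup, where the `stored` set and the `is_found` helper are hypothetical stand-ins for the uploads directory and the web server:

```python
import unicodedata
from urllib.parse import unquote

# Hypothetical model of a directory after an upload that left the name decomposed (NFD).
stored = {
    unicodedata.normalize("NFD", "operários_tarsila-1024x640.jpg").encode("utf-8")
}


def is_found(url_segment):
    """Decode a percent-encoded filename and look it up byte for byte."""
    return unquote(url_segment).encode("utf-8") in stored


print(is_found("oper%C3%A1rios_tarsila-1024x640.jpg"))   # False: composed request, so a 404
print(is_found("opera%CC%81rios_tarsila-1024x640.jpg"))  # True: decomposed request matches
```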

Answer:

We had the same problem: Mac + FileZilla + special characters in Slovak (SK).

The problem was fixed by using another FTP client (Cyberduck in our case).

It seems to be a problem with how FileZilla encodes filenames. Forcing UTF-8 encoding (in FileZilla’s host settings) doesn’t help.

Answer:

So, just to touch upon this issue and a solution that worked for me… I also migrated a WordPress site and found that all images with special characters in their filename produced a 404 after migration.

I ended up having to do the manual file renaming and edits to the database via phpMyAdmin. It was arduous and I definitely recommend backing up your database first.

In my case, I had a ton of media attachments that used the special character © in their filename.

First, I locally renamed the files by removing the character. I used 1-4a Rename: I just searched for the character and replaced it with nothing (not even a space). Then, I removed all the old files from the /wp-content/uploads/ folder and replaced them with the new files.

Next, I went into my database to update the table values. Media attachments have info stored in both the wp_posts and wp_postmeta tables. Below is the SQL I ran to update both:

UPDATE wp_posts SET guid = REPLACE(guid, '©', '');

UPDATE wp_postmeta SET meta_value = REPLACE(meta_value, '©', '')
WHERE LOWER(RIGHT(meta_value, 5)) = '.jpeg' OR
LOWER(RIGHT(meta_value, 4)) IN ('.jpg', '.gif', '.png');

Again, this replaces the character with nothing, not even a space.

I had to use the WP plugin Regenerate Thumbnails to get all of the thumbnails and the various attachment sizes to update, but that did the trick.

I really appreciate everyone’s efforts on this post and this post to help me figure it out! Hope this helps someone!