Browser could not read filename which contains special characters

Posted by: admin, November 30, 2017

Questions:

I have an image whose filename is Chu Thái.jpg. When I upload it to the media library, the file on the host is renamed to Chu-Thái.jpg, but the path of the image is not the same as the filename: http://bem.vn/httq/wp-content/uploads/sites/2/2013/10/Chu-Thái.jpg

As a result, when I copy the URL into the browser, it says the file was not found on this server.

The requested URL /wp-head/wp-content/uploads/sites/2/2013/10/Chu-Thái-150x150.jpg was not found on this server.

Is the problem caused by WordPress or by my hosting?

Answers:

The problem is that you should not upload files with special characters in their names. In one of my plugins I handle this with the sanitize_file_name filter.

I ended up adapting three functions from toscho's Germanix plugin to do a full cleanup of uploaded filenames and avoid this kind of error:

add_filter( 'sanitize_file_name', 't5_sanitize_filename', 10 );

/**
 * Clean up uploaded file names
 * 
 * Sanitization test done with the filename:
 * ÄäÆæÀàÁáÂâÃãÅåªₐāĆćÇçÐđÈèÉéÊêËëₑƒğĞÌìÍíÎîÏïīıÑñⁿÒòÓóÔôÕõØøₒÖöŒœßŠšşŞ™ÙùÚúÛûÜüÝýÿŽž¢€‰№$℃°C℉°F⁰¹²³⁴⁵⁶⁷⁸⁹₀₁₂₃₄₅₆₇₈₉±×₊₌⁼⁻₋–—‑․‥…‧.png
 * @author toscho
 * @url    https://github.com/toscho/Germanix-WordPress-Plugin
 */
function t5_sanitize_filename( $filename )
{
    $filename    = html_entity_decode( $filename, ENT_QUOTES, 'utf-8' );
    $filename    = t5_translit( $filename );
    $filename    = t5_lower_ascii( $filename );
    $filename    = t5_remove_doubles( $filename );
    return $filename;
}

/**
 * Converts uppercase characters to lowercase and removes the rest.
 * https://github.com/toscho/Germanix-WordPress-Plugin
 *
 * @uses   apply_filters( 'germanix_lower_ascii_regex' )
 * @param  string $str Input string
 * @return string
 */
function t5_lower_ascii( $str )
{
    $str     = strtolower( $str );
    $regex   = apply_filters(
        'germanix_lower_ascii_regex'
        , array(
            'pattern'        => '~([^a-z\d_.-])~'
            , 'replacement'  => ''
        )
    );
    // Leave underscores, otherwise the taxonomy tag cloud in the
    // backend won’t work anymore.
    return preg_replace( $regex['pattern'], $regex['replacement'], $str );
}

/**
 * Reduces repeated meta characters (-=+.) to one.
 * https://github.com/toscho/Germanix-WordPress-Plugin
 *
 * @uses   apply_filters( 'germanix_remove_doubles_regex' )
 * @param  string $str Input string
 * @return string
 */
function t5_remove_doubles( $str )
{
    $regex = apply_filters(
            'germanix_remove_doubles_regex'
            , array(
                'pattern'        => '~([=+.-])\1+~'
                // '$1' is the first capture group; a double-quoted "\1" would be an octal escape (byte 0x01).
                , 'replacement'  => '$1'
            )
    );
    return preg_replace( $regex['pattern'], $regex['replacement'], $str );
}    

/**
 * Replaces non ASCII chars.
 * https://github.com/toscho/Germanix-WordPress-Plugin
 *
 * wp-includes/formatting.php#L531 is unfortunately completely inappropriate.
 * Modified version of Heiko Rabe’s code.
 *
 * @author Heiko Rabe http://code-styling.de
 * @link   http://www.code-styling.de/?p=574
 * @param  string $str
 * @return string
 */
function t5_translit( $str )
{
    $utf8 = array(
        'Ä'  => 'Ae'
        , 'ä'    => 'ae'
        , 'Æ'    => 'Ae'
        , 'æ'    => 'ae'
        , 'À'    => 'A'
        , 'à'    => 'a'
        , 'Á'    => 'A'
        , 'á'    => 'a'
        , 'Â'    => 'A'
        , 'â'    => 'a'
        , 'Ã'    => 'A'
        , 'ã'    => 'a'
        , 'Å'    => 'A'
        , 'å'    => 'a'
        , 'ª'    => 'a'
        , 'ₐ'    => 'a'
        , 'ā'    => 'a'
        , 'Ć'    => 'C'
        , 'ć'    => 'c'
        , 'Ç'    => 'C'
        , 'ç'    => 'c'
        , 'Ð'    => 'D'
        , 'đ'    => 'd'
        , 'È'    => 'E'
        , 'è'    => 'e'
        , 'É'    => 'E'
        , 'é'    => 'e'
        , 'Ê'    => 'E'
        , 'ê'    => 'e'
        , 'Ë'    => 'E'
        , 'ë'    => 'e'
        , 'ₑ'    => 'e'
        , 'ƒ'    => 'f'
        , 'ğ'    => 'g'
        , 'Ğ'    => 'G'
        , 'Ì'    => 'I'
        , 'ì'    => 'i'
        , 'Í'    => 'I'
        , 'í'    => 'i'
        , 'Î'    => 'I'
        , 'î'    => 'i'
        , 'Ï'    => 'Ii'
        , 'ï'    => 'ii'
        , 'ī'    => 'i'
        , 'ı'    => 'i'
        , 'İ'    => 'I' // turkish, correct?
        , 'Ñ'    => 'N'
        , 'ñ'    => 'n'
        , 'ⁿ'    => 'n'
        , 'Ò'    => 'O'
        , 'ò'    => 'o'
        , 'Ó'    => 'O'
        , 'ó'    => 'o'
        , 'Ô'    => 'O'
        , 'ô'    => 'o'
        , 'Õ'    => 'O'
        , 'õ'    => 'o'
        , 'Ø'    => 'O'
        , 'ø'    => 'o'
        , 'ₒ'    => 'o'
        , 'Ö'    => 'Oe'
        , 'ö'    => 'oe'
        , 'Œ'    => 'Oe'
        , 'œ'    => 'oe'
        , 'ß'    => 'ss'
        , 'Š'    => 'S'
        , 'š'    => 's'
        , 'ş'    => 's'
        , 'Ş'    => 'S'
        , '™'    => 'TM'
        , 'Ù'    => 'U'
        , 'ù'    => 'u'
        , 'Ú'    => 'U'
        , 'ú'    => 'u'
        , 'Û'    => 'U'
        , 'û'    => 'u'
        , 'Ü'    => 'Ue'
        , 'ü'    => 'ue'
        , 'Ý'    => 'Y'
        , 'ý'    => 'y'
        , 'ÿ'    => 'y'
        , 'Ž'    => 'Z'
        , 'ž'    => 'z'
        // misc
        , '¢'    => 'Cent'
        , '€'    => 'Euro'
        , '‰'    => 'promille'
        , '№'    => 'Nr'
        , '$'    => 'Dollar'
        , '℃'    => 'Grad Celsius'
        , '°C' => 'Grad Celsius'
        , '℉'    => 'Grad Fahrenheit'
        , '°F' => 'Grad Fahrenheit'
        // Superscripts
        , '⁰'    => '0'
        , '¹'    => '1'
        , '²'    => '2'
        , '³'    => '3'
        , '⁴'    => '4'
        , '⁵'    => '5'
        , '⁶'    => '6'
        , '⁷'    => '7'
        , '⁸'    => '8'
        , '⁹'    => '9'
        // Subscripts
        , '₀'    => '0'
        , '₁'    => '1'
        , '₂'    => '2'
        , '₃'    => '3'
        , '₄'    => '4'
        , '₅'    => '5'
        , '₆'    => '6'
        , '₇'    => '7'
        , '₈'    => '8'
        , '₉'    => '9'
        // Operators, punctuation
        , '±'    => 'plusminus'
        , '×'    => 'x'
        , '₊'    => 'plus'
        , '₌'    => '='
        , '⁼'    => '='
        , '⁻'    => '-' // sup minus
        , '₋'    => '-' // sub minus
        , '–'    => '-' // ndash
        , '—'    => '-' // mdash
        , '‑'    => '-' // non breaking hyphen
        , '․'    => '.' // one dot leader
        , '‥'    => '..'  // two dot leader
        , '…'    => '...'  // ellipsis
        , '‧'    => '.' // hyphenation point
        , "\xC2\xA0" => '-'   // nobreak space (U+00A0 as UTF-8 bytes)
        , ' '    => '-'   // normal space
    );

    $str = strtr( $str, $utf8 );
    return trim( $str, '-' );
}
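
To see what the three steps above do to the filename from the question, here is a minimal standalone sketch that runs without WordPress. The function name demo_sanitize_filename and its shortened character map are assumptions made for illustration only; they are not part of WordPress or of the Germanix plugin, and the real t5_translit() table is much longer.

```php
<?php
// Hypothetical helper for illustration; mirrors translit -> lower ASCII -> remove doubles.
function demo_sanitize_filename( string $filename ): string
{
    // Abbreviated map (assumption): only a few entries of the full table.
    $map = array(
        'á' => 'a',
        'Á' => 'A',
        'ä' => 'ae',
        'ß' => 'ss',
        ' ' => '-',
    );

    // Step 1: transliterate known characters, then trim stray hyphens.
    $filename = trim( strtr( $filename, $map ), '-' );

    // Step 2: lowercase, then drop everything except a-z, digits, '_', '.' and '-'.
    $filename = strtolower( $filename );
    $filename = preg_replace( '~[^a-z\d_.-]~', '', $filename );

    // Step 3: collapse runs of the same meta character into one.
    // '$1' refers to the first capture group.
    return preg_replace( '~([=+.-])\1+~', '$1', $filename );
}

echo demo_sanitize_filename( 'Chu Thái.jpg' ), "\n";       // chu-thai.jpg
echo demo_sanitize_filename( 'Straße  Foto..JPG' ), "\n";  // strasse-foto.jpg
```

Note that any character missing from the map is removed outright in step 2 rather than transliterated, so a site with many Vietnamese filenames would probably want to extend the table with the remaining Vietnamese letters.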