php – Is "SET CHARACTER SET utf8" necessary?

Posted by: admin April 23, 2020

Questions:

I'm rewriting our database class (PDO-based) and got stuck on this. I've been taught to use both SET NAMES utf8 and SET CHARACTER SET utf8 when working with UTF-8 in PHP and MySQL.

In PDO I now want to use the PDO::MYSQL_ATTR_INIT_COMMAND parameter, but it only supports one query.

Is SET CHARACTER SET utf8 necessary?

Answers:

Using SET CHARACTER SET utf8 after SET NAMES utf8 will actually reset character_set_connection and collation_connection to @@character_set_database and @@collation_database, respectively.

The manual states that

  • SET NAMES x is equivalent to

    SET character_set_client = x;
    SET character_set_results = x;
    SET character_set_connection = x;
    
  • and SET CHARACTER SET x is equivalent to

    SET character_set_client = x;
    SET character_set_results = x;
    SET collation_connection = @@collation_database;
    

whereas SET collation_connection = x also internally executes SET character_set_connection = <<character_set_of_collation_x>>, and SET character_set_connection = x internally also executes SET collation_connection = <<default_collation_of_character_set_x>>.
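
To make the difference concrete, here is a toy model in PHP of the session variables involved. It does not talk to MySQL at all. The function names (applySetNames, applySetCharacterSet) and the sample values (a latin1 database, utf8_general_ci as the default collation of utf8) are assumptions chosen for illustration only.

```php
<?php
// Toy model only: mimics the variable assignments listed above.
// Nothing here is part of PDO or MySQL; all names are hypothetical.

function applySetNames(array $vars, string $charset, string $defaultCollation): array
{
    $vars['character_set_client']     = $charset;
    $vars['character_set_results']    = $charset;
    $vars['character_set_connection'] = $charset;
    // Setting character_set_connection also selects that
    // character set's default collation.
    $vars['collation_connection']     = $defaultCollation;
    return $vars;
}

function applySetCharacterSet(array $vars, string $charset): array
{
    $vars['character_set_client']  = $charset;
    $vars['character_set_results'] = $charset;
    // collation_connection is taken from the database default, and
    // character_set_connection follows the character set of that collation.
    $vars['collation_connection']     = $vars['collation_database'];
    $vars['character_set_connection'] = $vars['character_set_database'];
    return $vars;
}

// Assumed starting point: a database whose default character set is latin1.
$session = [
    'character_set_database' => 'latin1',
    'collation_database'     => 'latin1_swedish_ci',
];

$afterNames = applySetNames($session, 'utf8', 'utf8_general_ci');
$afterBoth  = applySetCharacterSet($afterNames, 'utf8');

echo $afterNames['character_set_connection'], "\n";
echo $afterBoth['character_set_connection'], "\n";
```

In this model the connection character set is utf8 after the first call and falls back to latin1 after the second, while character_set_client and character_set_results stay utf8.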

So essentially you’re resetting character_set_connection to @@character_set_database and collation_connection to @@collation_database. The manual explains the usage of these variables:

What character set should the server translate a statement to after receiving it?

For this, the server uses the character_set_connection and collation_connection system variables. It converts statements sent by the client from character_set_client to character_set_connection (except for string literals that have an introducer such as _latin1 or _utf8). collation_connection is important for comparisons of literal strings. For comparisons of strings with column values, collation_connection does not matter because columns have their own collation, which has a higher collation precedence.

To sum up, the encoding/transcoding procedure MySQL uses to process a query and its results is a multi-step process:

  1. MySQL treats the incoming query as being encoded in character_set_client.
  2. MySQL transcodes the statement from character_set_client into character_set_connection.
  3. When comparing string values to column values, MySQL transcodes the string value from character_set_connection into the character set of the given database column and uses the column collation for sorting and comparison.
  4. MySQL builds the result set encoded in character_set_results (this includes result data as well as result metadata such as column names).

So SET CHARACTER SET utf8 alone may not be sufficient to provide full UTF-8 support. Think of a database whose default character set is latin1 while its columns are defined with the utf8 character set, and go through the steps described above. Because latin1 cannot represent all the characters that UTF-8 can, you may lose character information in step 2, where the statement is transcoded into character_set_connection.

  • Step 2: Given that your query is encoded in UTF-8 and contains characters that cannot be represented in latin1, these characters are lost when the statement is transcoded from utf8 to latin1 (the default database character set), so the query no longer carries the data you intended.
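
The lossy conversion described above can be imitated in PHP with the mbstring extension. This is only a sketch of the effect, not of what the server does internally: transcodeStatement is a hypothetical helper, and ISO-8859-1 stands in for MySQL's latin1 as an approximation.

```php
<?php
// Sketch: imitate the conversion from character_set_client to
// character_set_connection with mbstring (requires ext-mbstring).
// ISO-8859-1 is used as a stand-in for MySQL's latin1, which is
// actually closer to Windows-1252; the effect shown is the same.

function transcodeStatement(string $sql, string $clientCharset, string $connectionCharset): string
{
    // mb_convert_encoding($string, $to, $from)
    return mb_convert_encoding($sql, $connectionCharset, $clientCharset);
}

// Three Japanese characters, UTF-8 encoded; none exist in latin1.
$literal = "\u{65E5}\u{672C}\u{8A9E}";

// Connection character set utf8 (as after SET NAMES utf8): nothing is lost.
$kept = transcodeStatement($literal, 'UTF-8', 'UTF-8');

// Connection character set latin1 (as after SET CHARACTER SET utf8
// on a latin1 database): unrepresentable characters are substituted.
$lossy     = transcodeStatement($literal, 'UTF-8', 'ISO-8859-1');
$roundTrip = mb_convert_encoding($lossy, 'UTF-8', 'ISO-8859-1');

var_dump($kept === $literal);      // identical when no transcoding is needed
var_dump($roundTrip === $literal); // differs: the information is gone
```

Converting back to UTF-8 afterwards cannot restore the original characters, which is why a later conversion to a utf8 column does not help.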

So I think it’s safe to say that SET NAMES ... is the correct way to handle character set issues. I might add that setting up your MySQL server variables correctly (all the required variables can be set statically in your my.cnf) frees you from the performance overhead of the extra query required on every connect.
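
For the PDO class from the question, a minimal sketch could look like the following. The helper names (mysqlDsn, connectUtf8), host, and database name are placeholders invented for this example. On PHP 5.3.6 or later the charset option in the DSN is normally enough on its own; the init command is kept here only to mirror the single-query setup the question describes.

```php
<?php
// Sketch of a connection helper; all names and values are placeholders.

function mysqlDsn(string $host, string $dbname, string $charset = 'utf8'): string
{
    // Reject anything that is not a plain character-set name.
    if (!preg_match('/^[A-Za-z0-9_]+$/', $charset)) {
        throw new InvalidArgumentException("Invalid character set name: $charset");
    }
    return "mysql:host=$host;dbname=$dbname;charset=$charset";
}

function connectUtf8(string $host, string $dbname, string $user, string $password): PDO
{
    return new PDO(mysqlDsn($host, $dbname, 'utf8'), $user, $password, [
        // One init command only: SET NAMES, without SET CHARACTER SET.
        PDO::MYSQL_ATTR_INIT_COMMAND => 'SET NAMES utf8',
        PDO::ATTR_ERRMODE            => PDO::ERRMODE_EXCEPTION,
    ]);
}

// Building the DSN needs no database connection.
echo mysqlDsn('localhost', 'example_db'), "\n";
```

Calling connectUtf8 requires the pdo_mysql extension and a reachable server, so only the DSN builder is exercised here.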

Answer:

From the MySQL manual:

SET CHARACTER SET is similar to SET NAMES but sets character_set_connection and collation_connection to character_set_database and collation_database. A SET CHARACTER SET x statement is equivalent to these three statements:

SET character_set_client = x;
SET character_set_results = x;
SET collation_connection = @@collation_database;

Answer:

Since I have needed to support international character sets, I’ve always just set the character set of the text-type fields when creating the database.

I’ve also always used UTF-8.

Within PHP, set the same:

mb_internal_encoding( 'UTF-8' );
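
As a small illustration of what that setting affects (a sketch, assuming the mbstring extension is loaded; the sample word is arbitrary), the mb_* functions use the internal encoding when no encoding argument is passed:

```php
<?php
mb_internal_encoding('UTF-8');

// "naïve": the ï occupies two bytes in UTF-8.
$word = "na\u{00EF}ve";

$bytes = strlen($word);    // counts bytes
$chars = mb_strlen($word); // counts characters using the internal encoding

echo "$bytes bytes, $chars characters\n"; // prints "6 bytes, 5 characters"
```

This only affects string handling inside PHP; it does not change how the MySQL connection is encoded.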