php – Sorting array with collation

Posted by: admin July 12, 2020

Questions:

I have an array of French words: [‘États-Unis’, ‘Espagne’, etc.] which I’d like to sort alphabetically according to the French locale (fr_FR).

I’m using the following code:

$collator = new Collator('fr-FR');
echo $collator->getErrorMessage();
$collator->asort($array);

but I’m getting the error U_USING_DEFAULT_WARNING, so I assume English or some other locale is being used. More importantly, the array isn’t sorted the way I expect: US shows up after Spain, where I’d expect the opposite.

I have the intl package installed, and my system has the corresponding locales (Ubuntu)

$ locale -a
C
C.UTF-8
en_US.utf8
es_ES.utf8
fr_FR
fr_FR.iso88591
fr_FR.utf8
POSIX

I tried different locale strings when constructing the Collator object, without any good result: “fr-FR”, “fr-FR.UTF8”, etc.

Is there anything else I’m missing?

Answers:

According to this blog post, for the words cote, coté, côte and côté (already sorted in English), the sorting order in French is: cote, côte, coté and côté. The code below sorts the words in the French collation:

$words = array('cote', 'coté', 'côte', 'côté');
print_r($words);

$collator = new Collator('fr_FR');

// print info about locale
echo 'French Collation ' . (($collator->getAttribute(Collator::FRENCH_COLLATION) == Collator::ON) ? 'On' : 'Off') . "\n";
echo $collator->getLocale(Locale::VALID_LOCALE) . "\n";
echo $collator->getLocale(Locale::ACTUAL_LOCALE) . "\n";

$collator->asort($words);

print_r($words);

And the printed result is as follows:

Array
(
    [0] => cote
    [1] => coté
    [2] => côte
    [3] => côté
)
French Collation On
fr_FR
fr
Array
(
    [0] => cote
    [2] => côte
    [1] => coté
    [3] => côté
)

In the same blog post the author says:

[…] diacritics are evaluated from right to left rather than from left to right. Thus côte comes before coté, rather than after it as it does in languages like English that evaluate them from left to right. Because the word côte has no ACUTE on the “e” at the end of the word while coté does. In English and most other languages, the evaluation starts on the left and therefore the CIRCUMFLEX or lack thereof on the “o” is the controlling factor in ordering.
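
The right-to-left rule quoted above can be tried directly by switching Collator::FRENCH_COLLATION on and off and comparing the two words from the quote. This is only a minimal sketch: it assumes the intl extension is loaded and the file is saved as UTF-8, and it sets the attribute explicitly because the default for a given locale may differ between ICU versions.

```php
<?php
// Sketch: toggle backwards accent ordering and compare two words.
// Assumes the intl extension is loaded and this file is saved as UTF-8.
$collator = new Collator('fr_FR');

// Accents evaluated left to right (the behaviour described for English).
$collator->setAttribute(Collator::FRENCH_COLLATION, Collator::OFF);
$leftToRight = $collator->compare('côte', 'coté');

// Accents evaluated right to left (the behaviour described for French).
$collator->setAttribute(Collator::FRENCH_COLLATION, Collator::ON);
$rightToLeft = $collator->compare('côte', 'coté');

// A negative value means the first word sorts before the second.
echo "FRENCH_COLLATION off: $leftToRight\n";
echo "FRENCH_COLLATION on:  $rightToLeft\n";
```

With the attribute on, côte should compare before coté (negative result); with it off, after it (positive result).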

So an array containing the question’s two words, Espagne and États-Unis, will have the same order under English and French collation.

You should also keep in mind that the asort method maintains the index association of the array. See the difference:

asort:
Array
(
    [0] => cote
    [2] => côte
    [1] => coté
    [3] => côté
)

sort:
Array
(
    [0] => cote
    [1] => côte
    [2] => coté
    [3] => côté
)
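
The two listings above can be reproduced with a short script. It is a sketch under the same assumptions as the earlier code (intl loaded, UTF-8 source file) and only looks at the keys, since the exact word order depends on the collation settings in use.

```php
<?php
// Sketch: Collator::asort() preserves keys, Collator::sort() reindexes.
// Assumes the intl extension is loaded and this file is saved as UTF-8.
$collator = new Collator('fr_FR');

$kept = array('cote', 'coté', 'côte', 'côté');
$collator->asort($kept);        // original keys stay attached to their values

$reindexed = array('cote', 'coté', 'côte', 'côté');
$collator->sort($reindexed);    // keys are renumbered from 0

print_r(array_keys($kept));
print_r(array_keys($reindexed));

// Both arrays hold the same values in the same order; only the keys differ.
var_dump(array_values($kept) === $reindexed);
```

If later code reads the result by position (for example $words[0]), sort is probably the safer choice; asort is useful when the keys carry meaning.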

About U_USING_DEFAULT_WARNING

According to this API documentation:

U_USING_DEFAULT_WARNING indicates that the default locale data was used; neither the requested locale nor any of its fall back locales could be found.

When I use the fr_FR locale, for example, I get a U_USING_FALLBACK_WARNING, which indicates that a fallback locale was used, in this case the locale fr.
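
To see which of the two warnings a given locale string produces, the error code can be checked right after construction. The following is a rough sketch, assuming the intl extension is loaded; describeCollator() is a hypothetical helper name made up for this example, not part of intl.

```php
<?php
// Sketch: report which ICU warning, if any, the Collator constructor raised.
// describeCollator() is a hypothetical helper; assumes intl is loaded.
function describeCollator($locale)
{
    $collator = new Collator($locale);
    $code = $collator->getErrorCode();

    if ($code === U_USING_DEFAULT_WARNING) {
        $status = 'default (root) data used';
    } elseif ($code === U_USING_FALLBACK_WARNING) {
        $status = 'fallback locale used';
    } else {
        // Any other code is reported by its ICU name, e.g. U_ZERO_ERROR.
        $status = intl_error_name($code);
    }

    return sprintf(
        '%s: %s, valid=%s, actual=%s',
        $locale,
        $status,
        $collator->getLocale(Locale::VALID_LOCALE),
        $collator->getLocale(Locale::ACTUAL_LOCALE)
    );
}

echo describeCollator('fr_FR') . "\n";
```

Running it with a few candidate strings (fr_FR, fr, fr-FR) should show quickly whether any of them resolves to French data rather than root.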

Locale

It seems your computer does not support the French language (or it does, but for some reason PHP can’t use it and falls back to the default locale), even though the command locale -a lists the French locales. Here are some suggestions you can try.

First, list all the supported locales:

cat /usr/share/i18n/SUPPORTED 

Now, generate the locales you need:

sudo locale-gen fr_FR.UTF-8
sudo locale-gen fr_FR.ISO-8859-1
sudo dpkg-reconfigure locales

If that doesn’t work, try installing the packages language-pack-fr and language-support-fr, then generate the locales again.

This problem is odd. I have a VM with Ubuntu 11.04 and PHP 5.3.8 where it works just fine, and it also works on my Debian 6 machine, without installing any package or configuring anything.
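
One more thing worth checking: Collator comes from the intl extension, which is built on ICU and uses ICU’s own locale data, so the output of locale -a may not tell the whole story. The sketch below asks ICU directly which locales it knows. It assumes the intl extension is loaded; icuHasLocale() is a hypothetical helper name used only for this example.

```php
<?php
// Sketch: check whether the ICU data used by PHP's intl extension knows a locale.
// icuHasLocale() is a hypothetical helper; assumes intl is loaded.
function icuHasLocale($locale)
{
    // An empty bundle name asks for ICU's default list of locales.
    $locales = ResourceBundle::getLocales('');
    return is_array($locales) && in_array($locale, $locales, true);
}

echo 'ICU version: ' . INTL_ICU_VERSION . "\n";
var_dump(icuHasLocale('fr'));
var_dump(icuHasLocale('fr_FR'));
```

If both calls print false, the problem is more likely in the ICU data that PHP was built against than in the system locales.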

Answer:

I’m using cygwin:

$ locale -a | grep fr_FR
fr_FR
fr_FR.utf8
fr_FR@euro

(note I have no fr_FR.iso88591 in the output)

Code (file encoding is UTF-8):

$collator = new Collator('fr_FR');
var_dump($collator->getErrorMessage());

// FRENCH_COLLATION is OFF

$arr = array('États-Unis', 'Espagne');

var_dump($collator->getAttribute(Collator::FRENCH_COLLATION) == Collator::ON);
var_dump($collator->getLocale(Locale::VALID_LOCALE));
var_dump($collator->getLocale(Locale::ACTUAL_LOCALE));
$collator->asort($arr);
var_dump($arr);

// FRENCH_COLLATION is ON

$collator->setAttribute(Collator::FRENCH_COLLATION, Collator::ON);

$arr = array('États-Unis', 'Espagne');

var_dump($collator->getAttribute(Collator::FRENCH_COLLATION) == Collator::ON);
var_dump($collator->getLocale(Locale::VALID_LOCALE));
var_dump($collator->getLocale(Locale::ACTUAL_LOCALE));
$collator->asort($arr);
var_dump($arr);

Output:

string(23) "U_USING_DEFAULT_WARNING"
bool(false)
string(5) "fr_FR"
string(4) "root"
array(2) {
  [1]=>
  string(7) "Espagne"
  [0]=>
  string(11) "États-Unis"
}
bool(true)
string(5) "fr_FR"
string(4) "root"
array(2) {
  [1]=>
  string(7) "Espagne"
  [0]=>
  string(11) "États-Unis"
}

And here’s the trick: I convert file encoding to ISO 8859-1 (in vim, I do :set fileencoding=iso-8859-1) and try again:

string(23) "U_USING_DEFAULT_WARNING"
bool(false)
string(5) "fr_FR"
string(4) "root"
array(2) {
  [0]=>
  string(10) "▒tats-Unis"
  [1]=>
  string(7) "Espagne"
}
bool(true)
string(5) "fr_FR"
string(4) "root"
array(2) {
  [0]=>
  string(10) "▒tats-Unis"
  [1]=>
  string(7) "Espagne"
}

Some symbols are broken, but I think that’s because my terminal does not support the given code page. The main thing is that the order of the strings is now just what you’ve described: “Espagne” comes after “États-Unis”.

So I think it’s a file encoding issue.
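
A caveat on the encoding experiment: as far as I know, the intl functions expect UTF-8 strings, and a lone ISO-8859-1 “É” byte is not valid UTF-8, so the second run may be comparing malformed input rather than showing French ordering. If the data really arrives as ISO-8859-1, one option is to convert it before collating. The sketch below is only an illustration: toUtf8() is a hypothetical helper, and it assumes that anything which is not valid UTF-8 is ISO-8859-1.

```php
<?php
// Sketch: detect non-UTF-8 input and convert it before collating.
// toUtf8() is a hypothetical helper; it assumes non-UTF-8 input is ISO-8859-1.
function toUtf8($text)
{
    // The /u modifier makes PCRE reject byte sequences that are not valid UTF-8.
    if (preg_match('//u', $text) === 1) {
        return $text;
    }
    return iconv('ISO-8859-1', 'UTF-8', $text);
}

$latin1 = "\xC9tats-Unis";      // "États-Unis" encoded as ISO-8859-1
$utf8   = toUtf8($latin1);

echo strlen($latin1) . "\n";    // 10 bytes, as in the string(10) dump above
echo strlen($utf8) . "\n";      // 11 bytes, as in the string(11) dump above
```

The byte counts match the two var_dump outputs above, which is a quick way to tell which encoding a given string is in.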

Answer:

Try just ‘FR’; I guess it should work on your system:

$collator = new Collator('FR');