Schema for a multilanguage database
Here’s an example:
[ products ] id (INT) name-en_us (VARCHAR) name-es_es (VARCHAR) name-pt_br (VARCHAR) description-en_us (VARCHAR) description-es_es (VARCHAR) description-pt_br (VARCHAR) price (DECIMAL)
The problem: every new language will need modify the table structure.
Here’s another example:
[ products-en_us ] id (INT) name (VARCHAR) description (VARCHAR) price (DECIMAL) [ products-es_es ] id (INT) name (VARCHAR) description (VARCHAR) price (DECIMAL)
The problem: every new language will need the creation of new tables and the “price” field is duplicated in every table.
Here’s another example:
[ languages ] id (INT) name (VARCHAR) [ products ] id (INT) price (DECIMAL) [ translation ] id (INT, PK) model (VARCHAR) // product field (VARCHAR) // name language_id (INT, FK) text (VARCHAR)
The problem: fuc*ing hard?
Your third example is actually the way the problem is usually solved. Hard, but doable.
Remove the reference to product from the translation table and put a reference to translation where you need it (the other way around).
[ products ] id (INT) price (DECIMAL) title_translation_id (INT, FK) [ translation ] id (INT, PK) neutral_text (VARCHAR) -- other properties that may be useful (date, creator etc.) [ translation_text ] translation_id (INT, FK) language_id (INT, FK) text (VARCHAR)
As an alternative (not especially a good one) you can have one single field and keep all translations there merged together (as XML, for example).
<translation> <en>Supplier</en> <de>Lieferant</de> <fr>Fournisseur</fr> </translation>
Similar to method 3:
[languages] id (int PK) code (varchar) [products] id (int PK) neutral_fields (mixed) [products_t] id (int FK) language (int FK) translated_fields (mixed) PRIMARY KEY: id,language
So for each table, make another table (in my case with “_t” suffix) which holds the translated fields.
SELECT * FROM products, simply
... LEFT JOIN products_t ON products_t.id = products.id AND products_t.language = CURRENT_LANGUAGE.
Not that hard, and keeps you free from headaches.
In order to reduce the number of JOIN’s, you could keep separate the translated and non translated in 2 separate tables :
[ products ] id (INT) price (DECIMAL) [ products_i18n ] id (INT) name (VARCHAR) description (VARCHAR) lang_code (CHAR(5))
At my $DAYJOB we use gettext for I18N. I wrote a plugin to xgettext.pl that extracts all English text from the database tables and add them to the master messages.pot.
It works very well – translators deal with only one file when doing translation – the po file. There’s no fiddling with database entries when doing translations.
[products] id (int PK) name price all other fields of product id_language ( int FK )
I actually use this method, but in my case, it’s not in a product point of view, for the various pages in my CMS, this work’s quite well.
If you have a lot of products it might be a headache to update a single one in 5 or 6 languages… but it’s a question of working the layout.
What about fourth solution?
[ products ] id (INT) language (VARCHAR 2) name (VARCHAR) description (VARCHAR) price (DECIMAL) *translation_of (INT FK)*
*Translation_of* is FK of it self. When You add default language *translation_of* is set to Null. But when you add second language *translation_of* takes primary produkt language id.
SELECT * FROM products WHERE id = 1 AND translation_of = 1
In that case we get all translations for product with id is 1.
SELECT * FROM products WHERE id = 1 AND translation_of = 1 AND language = 'pl'
We get only product in Polish translation. Without second table and JOINS.
Have many to many relationship.
You have your data table, languages table and a data_language table.
In the data_language table you have
id, data_id, language_id
I think that might work best for your.
We use this concept for our webiste (600k views per day) and (maybe surprisingly) it works. Sure along with caching and query optimalization.
[attribute_names] id (INT) name (VARCHAR) [languages_names] id (INT) name (VARCHAR) [products] id (INT) attr_id (INT) value (MEDIUMTEXT) lang_id (INT)