Spiga

Multilanguage database design approach

by Gabi Solomon

Preinfo

Before you go right to the comment section and recommend gettext or other similar ways, know that i am talking about content that is manageable from an admin panel or is added by the user

Also this article is based on personal experience and is not necessary the best way to do this.

Building a metalanguage website poses a lot of problems, and one of them is how you store the content in the database for each language.

If you do a google search you will find little resources about it, and most of the are on forums. This seem a bit strange to me, so after i had decided on a database schema for a metalanguage website i decided to post it here in the hope that other people might find it useful and save them some googling time.

As far as i searched there are more or less 4 databases schemas for metalanguage website.

1. Column approach

This approach is very common and basically it duplicates the column of content for each language.

table pages
– id (int)
– title_en (varchar)
– title_es (varchar)
– content_en (varchar)
– content_es (varchar)

The way you would query it is by automatically selecting the right columns according to the language chosen:
[SQL]
SELECT `id`, `title_en` as `title`, `content_en` as `content` FROM `pages`
[/SQL]
Or you could select all and do the column selection from php :
[php]
echo $rowPage['title' . $_SESSION['currentLanguage']];
[/php]

Advantages

  • It doesn’t have duplicate content, since there is only one row for each record, and only the language columns are duplicated
  • Easy to implement

Disadvantages

  • You need to build the watch what column you are working with depending on the language
  • Hard to maintain. Although this is a easy way for 2-3 languages its becomes a real drag when you have a lot of columns or a lot of languages
  • Hard to add a new language

2. Multirow approach

Another approach that i saw but i have never worked with it. It is simillar to the one above but instead of duplicating the content in columns it does it in rows.

table pages
– id (int)
– language_id (int)
– title (varchar)
– content (varchar)

So you will basically have 3 rows for the same page if you have 3 languages. The main problem i see with this approach is that it would be a bit tricky to know witch id you will use for the table relations.

Sorry but since i dont really have experience with this i cant show you sql & php examples.

Advantages

  • Ease in adding a new language

Disadvantages

  • Need to watch the table relations
  • A lot of duplicate content. You will have duplicate content for all the columns that are not translated

3. Single Translation table approach

This is an approach that becomes a little more complex then the other 2, but it is more suited for dinamic websites and which have a large number of languages or which intend to add a new language in the future and want to do it with ease.

table languages
– id (int)
– name (varchar)

table pages
– id (int)
– language_id (int)
– title (int fk)
– content (int fk)

table translation
– id (int)

table translation_entry
– translation_id (int)
– language_id (int)
– content (text)

In this approach you would store the id from the translation table in the title and content columns from the pages table, and then do a join with the translation_entry table based on the language id.

Advantages

  • Proper normalization
  • Ease in adding a new language

Disadvantages

  • Longer joins and query to get the content
  • All the translated content goes into one table
  • For me it just looks hard to work with and maintain

4. Coupled Translation table approach [my aproach :D ]

This is a variation of the above approach that to me seems easier to maintain and work with.
Instead of having just one translation table, you have one for each table. and you move the columns from the pages that need to be translated to the translation table.

table languages
– id (int)
– name (varchar)

table pages
– id (int)

table pages_translation
– id (int)
– page_id (int)
– language_id (int)
– title (text)
– content (text)

To get your data you just do a simple join:
[sql]
SELECT * FROM `pages` JOIN `pages_translation` ON `pages`.`id` = `pages_translation`.`page_id` WHERE `map_landmarks_translation`.`language_id`=’1′
[/sql]

Advantages

  • Proper normalization
  • Ease in adding a new language
  • Easy to query
  • Columns keep there names

Disadvantages

  • You have to create translation tables for all your tables that have columns that need to be translated

Conclusion

I am sure that there are other methods of doing a multilingual website, this are just the ones that i thought are most commonly used. My solution is the best, its just the best for me, because it works for my project and its easier for me to work with compared to other approaches.
In the end the best approach is the one that is the best for you. The one that you find the most easier to work with and maintain.

Cheers,
and good coding.

  • http://codeutopia.net/ Jani Hartikainen

    Good work comparing different approaches.

    I agree that #4 is probably the best. What comes to gettext, I don’t think it’s suitable for this kind of things. Content that’s longer and which can even differ between languages isn’t really something you’d want to be doing with it – in my opinion, it’s more suited to things like translating static texts on a site, like texts on buttons or links, error messages, etc.

  • http://www.gsdesign.ro/ Gabi Solomon

    @Jani Hartikainen
    Thanks for the feedback

  • Pingback: Multilanguage models in zend framework using Zend_Db_Table | GS Design

  • Gip

    I really appreciate your work!

  • http://logicpoint.pl The Willer

    To your ’2. Multirow approach – Disadvantages’, I’m not sure if I understood, but you don’t have to ‘duplicate content for all the columns that are not translated’. If you have a word which has some id, the same word in other language must have the same word id right? So in case u don’t need that word for other languages, u dont have to add blank rows. But maybe I’m wrong. :)

  • http://www.gsdesign.ro/ Gabi Solomon

    but you need to add new rows for the columns that need to be translated … so you need to have the same word on every row

  • http://ebikerzone.com.pl The Willer

    That’s true, if U got many languages this can be a problem. Anyway it’s not bad I think, at least what I know from my experience – I’m making some bikes site (http://ebikerzone.com.pl) in two languages and I chose this approach for adding articles which have to share the same id. Thanks.

  • Jones

    Hi! Great!
    You should add “Columns keep there types” in the advantages. #3 will always store the data into a text field type wich is not good.

    • solomongaby

      thats corect

  • agvozden

    I think that can be differently solved.
    And that is that you have a text with the language identifier and is retained within a single system.
    I'd only auxiliary table for a connection to the texts in other languages if we need this thing …

    • http://www.gabrielsolomon.ro/ gabi solomon

      that is an interesting solution, i chose to put all the languages in the auxiliary table so that i would have to code the decision of where to look for the strings in the main or auxiliary table.

  • http://www.Veb4Design.com/ Nima

    Hi,

    I found this article really helpful!
    I liked the fact that you compared different approaches clearly,
    I will be adapting the #4 as well,
    It suits best my needs.

    Thank you again
    Nima

  • http://twitter.com/esskar Kief

    what about

    table pages_translation
    – page_id (int)
    – language_id (int)
    – title (text)
    – content (text)

    with ( page_id, language_id ) as primary key

    • Mpmskwa

      I NEED TO IT TOO FRIEND

  • 01Kuzma

    Hello!
    Thank you for your tutorial.
    I've question:
    in your sql statement you wrote: …WHERE `map_landmarks_translation`.`language_id`='1'
    So `map_landmarks_translation` should be separate table, or it's wrong statement and it should look like:……WHERE `pages_translation`.`language_id`='1'
    Thank you!

  • CC

    If you're working with a huge amount amount of tables (think in the hundreds) then #3 is easier to maintain than #4.

  • http://redesignsoft.com/ soft

    Nice solution. but I think something with txt files and constants is more better when you want to more languages.

    • http://www.gabrielsolomon.ro/ gabi solomon

      That maybe true for some projects. But if you want the ability to edit the translations, for example for pages content then a DB aproach is a must.

  • Seth

    after using #4 model i switch to #2 witch i think is better, on #4 model i have had 166 tables and almost 1gb of data and on #2 we have 9 tables and 70mb of data and the speed is the same for the traffic we have (the website has 17 languages)

    but in the end i guess the model differs from project to project

    • Kranox

      If you have 9 tables on model #2, you would have 18 (9 * 2) tables + 1 language table on model #4. You must misunderstood model #4 :)

  • Behnoush

    hello,

    your artical was really helpful for me
    the model #4 suits best my needs

    Thanks alot
    Behnoush

  • Bernard

    I came to the same setup as yours, if only i found this artice a bit sooner it would have save me some headtroubles…the disadvantage is there !! this setup leads too a huge amount of 'translation' tables…

    so now i'm thinking if it wouldn't be much easier to create a 'translate node' table wich would store every entry in such a way that it would only require an id (or2) to select the right translation…
    All translations would be stored in 1 table, this would need an entire editing of the standard tables but it would be much less complex ones the db is edited

    I'm not sure yet how this would look like, i'm still brainstorming on this but it would solve the disadvantage of this approach and you could easily and instantly add new variables…

    • Bernard

      I wrote 'variabes' , i ment 'tables'

      fe:
      FAQS
      faq_id
      lang_id
      node_id
      you could create 100 tables this way and store everything in 1 table as benieve
      (just to illustrate, have no id if this is the right setup) but building it this way would lead to much easier sql queries i believe

      table nodes
      node_id (ai)
      lang_id
      node_id
      content

  • Qtronik

    I simply use onetable for each Lang
    Ex: lang_en table, lang_fr table (id, page_id, chunk_id)

    with a table for language (id and isoname row)
    and a table for pages (id, pages_name)

    When I call a array for all chunk of a page I call the query with the user default or set lang isoname for theright lang-table… No “join”… to slow for quering.

    and after one time the user show a page I put an cache for not quering again for recurent chunk translate like the menu !

  • Qtronik

    Really nice and clean thé finnaly information I searched for a loooong time ! Tanks ! Had you made a class for it ?

  • http://www.facebook.com/haqvoice Haq Voice

    all option are very helpful but 2 very important things are missing in advantage or disadvantage column,
    1. while entering data, how to manage each language, especially deleting and re-entering

    2. while displaying data, switching language and maintaining same page

  • Pingback: Multilanguage DB, check approach - dBforums

  • http://www.facebook.com/tiendat.vnit Dat Nguyen

    Hi, I found your article is very helpful.

    But the one problem I see that is all translations for all languages will be stored in one table. That is easy to maintain but cannot handle the large data, especially when you are using MySQL.
    For example, my table has more than 250.000 records, I have 22 languages, so the translation table will have 250.000 * 22 = 5.500.000 records. Will that cause the query slows down?

    Dat

    • http://www.gabrielsolomon.ro/ gabi solomon

      that is true of the single table aproach that is whyi dont use it

    • Kranox

      I never do something that large.
      But in MySQL5 maybe you can use forced partitioning (key, range) for large tables to get more speed on SELECT :)
      For short static sentences you can use gettext and generate .po, .mo files.
      Also you can use some form of cache (generating static HTML for frequently used pages (articles etc.), save result of most repeated SELECTs to file and so on).

  • Pingback: Internationalization with Entity Framework Code First – Part I « Luís Gonçalves

  • Shahdoost Hossein

    you can simply use Yii::t for multi language websites :D , just kidding, your article was really useful thank you

  • Bunny

    I keen #4 model….could you explain more how to reduce query statement when need to publish those content to front-page ? Thanks

  • too

    “My solution is the best, its just the best for me” – pretty biased to me :)

    • ensayofr

      how this is biased if he stated that this is the best for him? lol

  • ojciec

    map_landmarks_translation ? you not specified this table in approach, why? Can you provide full example or tutorial for your approach???

  • nope

    How can i insert page?Do you explain?

    Add Page -> OK -> Insert new ID on table -> Get ID -> GO(Redirect) -> Add Title, Content and Language Code -> OK

    Is correct above sample for insert scenario?

  • Christoforos

    The last approach is the best one on my opinion because keeps database normalized and the key for each table is doing what it is supposed to do. For example the a county called Greece is the same country whether you call it “Greece” or “Ελλάδα” so it must have the same key. The translations of the name is just the translations of the name and not the country itself, so they must have their own key, which is the key of the translation. That means that the translations go to their own table and the “idea” of the tuple to each own table. I have ended up at this approach my self, and for a long time I was wondering that maybe I am wrong because I was not seeing anyone using this approach. Lately I worked with open cart which is using this approach and now I am convinced that this is the best solution. Another advantage for this solution is that you can have full-text indexes only to the tables that need it without mixing them up with other indexes.
    No that I have state that I agree, allow me to add a question that wonders me. How to deal with a “default” language. That means that in list of rows, e.g a list with pages titles, I want to show the English title when there is an English translation and when the English translation does not exists to fall back to Greek translation.