Spiga

Multilanguage database design approach

by Gabi Solomon

Preinfo

Before you go right to the comment section and recommend gettext or similar tools, know that I am talking about content that is managed from an admin panel or added by users.

Also, this article is based on personal experience and is not necessarily the best way to do this.

Building a multilanguage website poses a lot of problems, and one of them is how you store the content in the database for each language.

If you do a Google search you will find few resources about it, and most of them are on forums. This seemed a bit strange to me, so after I had decided on a database schema for a multilanguage website I decided to post it here, in the hope that other people might find it useful and save some googling time.

As far as I could find, there are more or less 4 database schemas for a multilanguage website.

1. Column approach

This approach is very common and basically it duplicates the column of content for each language.

table pages
-- id (int)
-- title_en (varchar)
-- title_es (varchar)
-- content_en (varchar)
-- content_es (varchar)

The way you would query it is by automatically selecting the right columns according to the language chosen:

SQL:
  SELECT `id`, `title_en` AS `title`, `content_en` AS `content` FROM `pages`

Or you could select all the columns and pick the right one in PHP:

PHP:
  // assumes $_SESSION['currentLanguage'] holds a column suffix such as 'en' or 'es'
  echo $rowPage['title_' . $_SESSION['currentLanguage']];
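
The column names can also be built in one place and checked against a whitelist, so that a language code coming from the session never reaches the SQL unchecked. Here is a minimal sketch of that idea, using Python's sqlite3 module as a stand-in for PHP and MySQL. The SUPPORTED_LANGUAGES tuple, the build_page_query helper and the sample row are hypothetical and not part of the original schema.

```python
import sqlite3

# Hypothetical whitelist: every code listed here needs matching
# title_<code> and content_<code> columns in the pages table.
SUPPORTED_LANGUAGES = ("en", "es")


def build_page_query(language):
    """Return a SELECT that aliases the language-suffixed columns."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError("unsupported language: " + repr(language))
    # Safe to interpolate: the value was checked against the whitelist.
    return (
        "SELECT id, title_{0} AS title, content_{0} AS content "
        "FROM pages WHERE id = ?".format(language)
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pages ("
    " id INTEGER PRIMARY KEY,"
    " title_en TEXT, title_es TEXT,"
    " content_en TEXT, content_es TEXT)"
)
conn.execute(
    "INSERT INTO pages VALUES (1, 'Home', 'Inicio', 'Welcome', 'Bienvenido')"
)

# Fetch page 1 in Spanish; the caller only ever sees 'title' and 'content'.
row_es = conn.execute(build_page_query("es"), (1,)).fetchone()
print(row_es)
```

The rest of the application can then work with plain title and content keys, whatever the language.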

Advantages

  • It doesn't have duplicate content, since there is only one row for each record, and only the language columns are duplicated
  • Easy to implement

Disadvantages

  • You need to watch which column you are working with depending on the language
  • Hard to maintain. Although this is an easy way for 2-3 languages, it becomes a real drag when you have a lot of columns or a lot of languages
  • Hard to add a new language
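
To make the last point concrete: adding a language here means a schema change for every translated column of every table. The sketch below uses Python's sqlite3 module as a stand-in for MySQL. The add_language helper and the TRANSLATED_COLUMNS list are hypothetical, and the table name is assumed to come from trusted code, never from user input.

```python
import sqlite3

# Hypothetical list of the columns that exist once per language.
TRANSLATED_COLUMNS = ("title", "content")


def add_language(conn, table, code):
    """Add one <column>_<code> column per translated column."""
    if not code.isalpha():
        raise ValueError("invalid language code: " + repr(code))
    for column in TRANSLATED_COLUMNS:
        # One ALTER TABLE per translated column, per table.
        conn.execute(
            "ALTER TABLE {} ADD COLUMN {}_{} TEXT".format(table, column, code)
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pages ("
    " id INTEGER PRIMARY KEY,"
    " title_en TEXT, title_es TEXT,"
    " content_en TEXT, content_es TEXT)"
)

add_language(conn, "pages", "fr")

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(pages)")]
print(columns)
```

Every query and form that lists the language columns has to be updated as well, which is where the maintenance cost comes from.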

2. Multirow approach

Another approach that I have seen but never worked with. It is similar to the one above, but instead of duplicating the content in columns it does it in rows.

table pages
-- id (int)
-- language_id (int)
-- title (varchar)
-- content (varchar)

So you will basically have 3 rows for the same page if you have 3 languages. The main problem I see with this approach is that it would be a bit tricky to know which id to use for the table relations.

Sorry, but since I don't really have experience with this approach, I can't show you SQL & PHP examples.
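
For readers who want something concrete anyway, here is a hypothetical sketch of one way this layout could work. It is not taken from a real project. It assumes that all translations of a page share the same id and that (id, language_id) together form the primary key, so related tables can keep pointing at id alone. Python's sqlite3 module stands in for PHP and MySQL, and the language ids (1 = English, 2 = Spanish) and sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pages (
    id          INTEGER NOT NULL,  -- shared by every translation of a page
    language_id INTEGER NOT NULL,
    title       TEXT,
    content     TEXT,
    PRIMARY KEY (id, language_id)  -- assumption: composite key
);
INSERT INTO pages VALUES (1, 1, 'Home', 'Welcome');
INSERT INTO pages VALUES (1, 2, 'Inicio', 'Bienvenido');
INSERT INTO pages VALUES (2, 1, 'About', 'About us');
""")


def fetch_page(page_id, language_id):
    """Return (id, title, content) for one language, or None if untranslated."""
    return conn.execute(
        "SELECT id, title, content FROM pages "
        "WHERE id = ? AND language_id = ?",
        (page_id, language_id),
    ).fetchone()


page_es = fetch_page(1, 2)   # page 1 in Spanish
missing = fetch_page(2, 2)   # page 2 has no Spanish row yet
print(page_es, missing)
```

Under that assumption, a page that has not been translated simply has no row for that language, and the application has to decide what to show instead.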

Advantages

  • Ease in adding a new language

Disadvantages

  • Need to watch the table relations
  • A lot of duplicate content. You will have duplicate content for all the columns that are not translated

3. Single Translation table approach

This approach is a little more complex than the other 2, but it is better suited to dynamic websites that have a large number of languages, or that intend to add new languages in the future and want to do it with ease.

table languages
-- id (int)
-- name (varchar)

table pages
-- id (int)
-- language_id (int)
-- title (int fk)
-- content (int fk)

table translation
-- id (int)

table translation_entry
-- translation_id (int)
-- language_id (int)
-- content (text)

In this approach you would store the id from the translation table in the title and content columns from the pages table, and then do a join with the translation_entry table based on the language id.
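
The join described above could look roughly like the sketch below. It uses Python's sqlite3 module as a stand-in for PHP and MySQL. The sample data, the table aliases and the (translation_id, language_id) key are assumptions, and pages.language_id is left out because the join does not use it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE languages (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE translation (id INTEGER PRIMARY KEY);
CREATE TABLE translation_entry (
    translation_id INTEGER REFERENCES translation(id),
    language_id    INTEGER REFERENCES languages(id),
    content        TEXT,
    PRIMARY KEY (translation_id, language_id)  -- assumption
);
CREATE TABLE pages (
    id      INTEGER PRIMARY KEY,
    title   INTEGER REFERENCES translation(id),
    content INTEGER REFERENCES translation(id)
);
INSERT INTO languages VALUES (1, 'English'), (2, 'Spanish');
INSERT INTO translation VALUES (10), (11);
INSERT INTO translation_entry VALUES
    (10, 1, 'Home'),    (10, 2, 'Inicio'),
    (11, 1, 'Welcome'), (11, 2, 'Bienvenido');
INSERT INTO pages VALUES (1, 10, 11);
""")

# One join on translation_entry per translated column of the page.
PAGE_QUERY = """
SELECT p.id, t.content AS title, c.content AS content
FROM pages AS p
JOIN translation_entry AS t
  ON t.translation_id = p.title AND t.language_id = :lang
JOIN translation_entry AS c
  ON c.translation_id = p.content AND c.language_id = :lang
WHERE p.id = :page_id
"""

row = conn.execute(PAGE_QUERY, {"lang": 2, "page_id": 1}).fetchone()
print(row)
```

Each translated column adds another join on the same table, which is where the longer queries come from.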

Advantages

  • Proper normalization
  • Ease in adding a new language

Disadvantages

  • Longer joins and query to get the content
  • All the translated content goes into one table
  • For me it just looks hard to work with and maintain

4. Coupled Translation table approach [my approach :D ]

This is a variation of the above approach that to me seems easier to maintain and work with.
Instead of having just one translation table, you have one for each table, and you move the columns that need to be translated from the pages table to its translation table.

table languages
-- id (int)
-- name (varchar)

table pages
-- id (int)

table pages_translation
-- id (int)
-- page_id (int)
-- language_id (int)
-- title (text)
-- content (text)

To get your data you just do a simple join:

SQL:
  SELECT * FROM `pages` JOIN `pages_translation` ON `pages`.`id` = `pages_translation`.`page_id` WHERE `pages_translation`.`language_id` = 1
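
Here is a runnable sketch of this schema, again using Python's sqlite3 module as a stand-in for PHP and MySQL. The fetch_page helper, the sample rows and the UNIQUE (page_id, language_id) constraint are assumptions added for illustration. It also shows that adding a language only takes new rows, with no schema change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE languages (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE pages (id INTEGER PRIMARY KEY);
CREATE TABLE pages_translation (
    id          INTEGER PRIMARY KEY,
    page_id     INTEGER NOT NULL REFERENCES pages(id),
    language_id INTEGER NOT NULL REFERENCES languages(id),
    title       TEXT,
    content     TEXT,
    UNIQUE (page_id, language_id)  -- assumption: one row per page and language
);
INSERT INTO languages VALUES (1, 'English'), (2, 'Spanish');
INSERT INTO pages VALUES (1);
INSERT INTO pages_translation (page_id, language_id, title, content) VALUES
    (1, 1, 'Home', 'Welcome'),
    (1, 2, 'Inicio', 'Bienvenido');
""")


def fetch_page(page_id, language_id):
    """Join a page with its translation for one language."""
    return conn.execute(
        "SELECT pages.id, pt.title, pt.content "
        "FROM pages "
        "JOIN pages_translation AS pt ON pages.id = pt.page_id "
        "WHERE pages.id = ? AND pt.language_id = ?",
        (page_id, language_id),
    ).fetchone()


# Adding a language: one languages row plus the translated rows.
conn.execute("INSERT INTO languages VALUES (3, 'French')")
conn.execute(
    "INSERT INTO pages_translation (page_id, language_id, title, content) "
    "VALUES (1, 3, 'Accueil', 'Bienvenue')"
)

page_fr = fetch_page(1, 3)
print(page_fr)
```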

Advantages

  • Proper normalization
  • Ease in adding a new language
  • Easy to query
  • Columns keep their names

Disadvantages

  • You have to create translation tables for all your tables that have columns that need to be translated

Conclusion

I am sure that there are other methods of building a multilingual website; these are just the ones that I think are most commonly used. My solution isn't the best, it's just the best for me, because it works for my project and it's easier for me to work with than the other approaches.
In the end the best approach is the one that is best for you: the one that you find easiest to work with and maintain.

Cheers,
and good coding.

Comments

  • Seth
    After using model #4 I switched to #2, which I think is better. With #4 we had 166 tables and almost 1 GB of data; with #2 we have 9 tables and 70 MB of data, and the speed is the same for the traffic we have (the website has 17 languages).

    But in the end I guess the model differs from project to project.
  • Nice solution, but I think something with text files and constants is better when you want more languages.
  • That may be true for some projects. But if you want the ability to edit the translations, for example for page content, then a DB approach is a must.
  • CC
    If you're working with a huge number of tables (think in the hundreds) then #3 is easier to maintain than #4.
  • 01Kuzma
    Hello!
    Thank you for your tutorial.
    I have a question:
    In your SQL statement you wrote: ...WHERE `map_landmarks_translation`.`language_id`='1'
    So should `map_landmarks_translation` be a separate table, or is the statement wrong and should it read: ...WHERE `pages_translation`.`language_id`='1'?
    Thank you!
  • what about

    table pages_translation
    -- page_id (int)
    -- language_id (int)
    -- title (text)
    -- content (text)

    with ( page_id, language_id ) as primary key
  • Hi,

    I found this article really helpful!
    I liked the fact that you compared different approaches clearly,
    I will be adapting the #4 as well,
    It suits best my needs.

    Thank you again
    Nima
  • agvozden
    I think this can be solved differently.
    You keep the text, together with its language identifier, in the main table, so it stays within a single system.
    I would only use an auxiliary table to link to the texts in other languages, if we need them ...
  • That is an interesting solution. I chose to put all the languages in the auxiliary table so that I wouldn't have to code the decision of whether to look for the strings in the main or the auxiliary table.
  • Jones
    Hi! Great!
    You should add "Columns keep their types" to the advantages. #3 will always store the data in a text field type, which is not good.
  • That's correct.
  • That's true, if you have many languages this can be a problem. Anyway, I don't think it's bad, at least from my experience. I'm building a bike site (http://ebikerzone.com.pl) in two languages and I chose this approach for adding articles, which have to share the same id. Thanks.
  • But you need to add new rows for the columns that need to be translated ... so you need to have the same word on every row.
  • About your '2. Multirow approach - Disadvantages': I'm not sure I understood, but you don't have to 'duplicate content for all the columns that are not translated'. If you have a word which has some id, the same word in another language must have the same word id, right? So if you don't need that word in other languages, you don't have to add blank rows. But maybe I'm wrong. :)
  • Gip
    I really appreciate your work!
  • @Jani Hartikainen
    Thanks for the feedback
  • Good work comparing different approaches.

    I agree that #4 is probably the best. When it comes to gettext, I don't think it's suitable for this kind of thing. Content that's longer and which can even differ between languages isn't really something you'd want to handle with it. In my opinion, it's more suited to translating static texts on a site, like texts on buttons or links, error messages, etc.