Is this the correct aproach for multilingual data in a database?

Question

I'm currently developing a website where certain dynamic data must be available in various languages. The languages for now are three but the idea is to add more in the future.

Imagine this first scenario where you have a page and want to provide the content in the three different languages. The pages in my database are stored like this:

Pages UML

This works great since for each language that I have, I create a new translated content for the pages.

But, as the project progressed, I found more situations where I need translated data. The next diagram is a example of products that have a type, subtype and a status. All this data must be available in all the different languages.

Products UML

Isn't this a bit overkill since I will end up with tables holding just ID's for example? And for each table that needs to hold translated content, I need a new one.

Is there any better approach?

I'd recommend against being too generic and losing the proper enforcement of foreign keys. Please take a look at [this post](http://stackoverflow.com/a/14838175/533120) for more info. — Branko Dimitrijevic, Apr 15 '13 at 01:38
@BrankoDimitrijevic With this structure, what would be the best way to write the queries? To get all the types, I thinking about: `SELECT type_text FROM TypesTranslations JOIN (Languages, Types) ON (TypesTranslations.language_id = Languages.language_id AND TypesTranslations.type_id = Types.type_id) WHERE Languages.language_code = 'en'` The biggest problem then is to, for example, retrieve all the information of a product or get all subtypes ordered by type (because of Column 'language_id' in where clause is ambiguous). — Renato Rodrigues, Apr 20 '13 at 19:09
I don't know how you want to retrieve types, but that has nothing to do with translations. However you retrieve the type, just JOIN it with `TypeTranslation` on `type_id` and filter by the desired `TypeTranslation.language_id`. — Branko Dimitrijevic, Apr 20 '13 at 19:44
@BrankoDimitrijevic I retrieve types just like you said: `SELECT * FROM Types JOIN TypesTranslations USING (type_id) WHERE language_id = 1`. Now image, that I want to get a product which has a subtype, status and maybe another translated fields. What's the best practice? One query (faster?)? Or retrieve the product (one query) and then retrieve its translated subtype (+1 query), status (+1), etc... When I try to do it with one query, `WHERE language_id = 1' simply doesn't work because of the use of ambiguous column name. — Renato Rodrigues, Apr 20 '13 at 20:09
One query will certainly be faster, if for no other reason than because you are saving an extra database round-trip. I don't understand why you are having problems with ambiguous names. Can't you just qualify them appropriately? — Branko Dimitrijevic, Apr 20 '13 at 21:30

score 2 · Accepted Answer · answered Apr 14 '13 at 18:23

2

If you're willing to sacrifice some foreign keys and "database purity" you could do this:

CREATE TABLE LocalizedTexts(OwnerType byte PK, OwnerID int PK, LanguageID int PK, Text string)

For each "owner" (Products, StatusNames, Types, ...) you assign an integer constant OwnerType. In LocalizedTexts you have a primary key consisting of the owner type, owner id (for example the ProductID or StatusNameID or TypeID, ...) and language id.

OwnerType disambiguates in case OwnerID is not unique (because it comes from multiple tables).

That is a nicely extensible and normalized schema. The only disadvantage is that you can't have a FK here.

answered Apr 14 '13 at 18:23

usr

168,620
35
240
369

would you care to explain the last sentence "The only disadvantage is that you can't have a FK here."? – Renato Rodrigues Apr 14 '13 at 19:23
2

You cannot link `LocalizedTexts` and `Products` for example because `OwnerID` would contain IDs from other tables as well. This is a minor disadvantage in my opinion. – usr Apr 14 '13 at 19:37
1

We use this approach for other things like comments, tags, etc in our database. I think it makes stuff pretty simple. – Remy Apr 15 '13 at 06:31
@Remy what do you mean? Why use this approach with Comments and the other one you posted with Products for instance? – Renato Rodrigues Apr 15 '13 at 08:49
1

Tried to explain it again in my post. – Remy Apr 15 '13 at 12:02

Remy · Answer 2 · 2013-04-15T12:01:48.960

I think it depends if you need a "dynamic" approach to new languages, or if you basically know how many languages your app needs to support.

We just have a column for every language:

Create table Product (int id, text name_de, text name_en, etc... )

But this means, that if you add a new language, you have to change your DB schema a bit. But makes programming simple and lookups fast.

The difference between this solution and @usr's one is that in his solution, you can add languages without changing the db. My solution here needs an schema update, everytime you add a new language. E.g. most companies in Switzerland have their website in 3, maybe 4 languages and they stay that way. So, this solution works. But if you constantly add languages, then this is a bit complicated.

Is this the correct aproach for multilingual data in a database?

2 Answers2