How Apps Are Localized

William Kwok
Published in The Startup · 9 min read · Jan 17, 2021



I also have a YouTube version of this article! Please check it out if you learn better by listening and watching.

YouTube link: https://youtu.be/ra3UbgwnSbo

Open your phone, and look at all the applications you use. Most of them are probably available in other languages. It’s not something we usually think about, but it’s super important!

Internationalizing your application is important if you want to make your product accessible to more people. This doesn’t just mean people in other countries; it also matters for people whose native language isn’t English.

If you’re an engineer tasked with adding translations to your application, or you’re someone interested in learning how localization of a product works, stick around. In the next few minutes I’m going to teach you a few things that I know about localization, how you might get started, and some nuances to look out for.

For a lot of these concepts, I’ll use examples with JavaScript libraries, but the ideas apply to any programming language. I just know JavaScript best, so that’s what I’ll use to teach.

How to get translations

At its most basic, your service or application takes a key string and a locale and produces a fully translated string. You might also have number formatting, variables, and so on, but I’ll touch on that later.

"en-US", { "translation.key": "Translation Key" }

This key is something you define. Imagine I wanted an application that displays colors (primary, secondary, tertiary, and so on). This is probably how I would define those colors in key-value form.

Edit: /u/Sevii on Reddit made a great point: oftentimes you won’t just be translating single words, but entire phrases. You may also plug single translated words into translated phrases. Just keep in mind that this is possible as well!

{
"colors.red": "Red",
"colors.blue": "Blue",
"colors.green": "Green"
}

The locale will come from a user setting somewhere: maybe a preference you’ve saved for the user, or maybe their browser or device settings. As far as I have seen, the most common format for this locale is the IETF BCP 47 standard. A service then takes in all the translations from wherever they’re stored and produces the result.

A visual depiction of a “translation service”
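To make the “translation service” idea concrete, here is a minimal sketch in plain JavaScript, assuming an in-memory store shaped like the colors example. The names `translations`, `candidateLocales`, and `translate` are hypothetical, not from any library; the only real APIs used are the built-in `Intl.getCanonicalLocales` and `Intl.Locale`. It tries the full BCP 47 tag, then the bare language, then a default locale.

```javascript
// Hypothetical in-memory store: locale -> (translation key -> string).
const translations = {
  "en-US": { "colors.red": "Red", "colors.blue": "Blue", "colors.green": "Green" },
  es: { "colors.red": "Rojo", "colors.blue": "Azul", "colors.green": "Verde" },
};

// Build the lookup order: full tag ("es-MX"), then its language ("es"),
// then the default locale. The Set drops duplicates such as "en-US" twice.
function candidateLocales(locale, defaultLocale) {
  const [canonical] = Intl.getCanonicalLocales(locale);
  const { language } = new Intl.Locale(canonical);
  return [...new Set([canonical, language, defaultLocale])];
}

function translate(key, locale, defaultLocale = "en-US") {
  for (const candidate of candidateLocales(locale, defaultLocale)) {
    const value = translations[candidate]?.[key];
    if (value !== undefined) return value;
  }
  // Fall back to the raw key so missing translations are easy to spot.
  return key;
}

translate("colors.red", "es-MX"); // "Rojo" (falls back from es-MX to es)
translate("colors.blue", "fr-FR"); // "Blue" (no French, so en-US)
```

In a browser, the `locale` argument could come from `navigator.language` or from a saved user preference.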

So where do these translations come from?

Well, you can either find a friend who knows the language well enough to translate it, or you can pay people to translate it for you. I don’t know which is the best service, and I also don’t know what different companies use, but I’m sure you can find your own solution that fits within your budget.

Storage

Honestly, there are so many ways to go about this. For example, if you have a large team and need to add a bunch of strings for translation, what’s the best way for your engineers to do that? Maybe you build your own service where engineers type in translation keys and the English version of each string, and then you send those off to be translated.

const enLocales = {
  "colors.red": "Red",
  "colors.blue": "Blue",
  "colors.green": "Green"
};
const esLocales = {
  "colors.red": "Rojo",
  "colors.blue": "Azul",
  "colors.green": "Verde"
};
// The Map constructor takes an array of [key, value] pairs.
const locales = new Map([
  ["en", enLocales],
  ["es", esLocales]
]);

Maybe you want to keep everything in a repo. I personally don’t know what companies actually do, but I’m sure you can think of a solution that you can expand upon in the future.
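Whichever storage you choose, it helps to check that every locale has the same keys as your source language, for example as a CI step. A minimal sketch, assuming each locale is a flat key-value object like `enLocales` and `esLocales` above; `findTranslationGaps` is a hypothetical helper, not part of any tool.

```javascript
// Compare each target locale against the source-language strings.
// "missing" keys still need translating; "stale" keys no longer exist in the source.
function findTranslationGaps(source, targets) {
  const has = (object, key) => Object.prototype.hasOwnProperty.call(object, key);
  const report = {};
  for (const [locale, strings] of Object.entries(targets)) {
    report[locale] = {
      missing: Object.keys(source).filter((key) => !has(strings, key)),
      stale: Object.keys(strings).filter((key) => !has(source, key)),
    };
  }
  return report;
}

const source = { "colors.red": "Red", "colors.blue": "Blue", "colors.green": "Green" };
const gaps = findTranslationGaps(source, {
  es: { "colors.red": "Rojo", "colors.blue": "Azul", "colors.purple": "Morado" },
});
// gaps.es.missing is ["colors.green"]; gaps.es.stale is ["colors.purple"]
```

Failing the build when `missing` is non-empty keeps untranslated strings from shipping silently.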

Retrieval

So now it’s time to use the strings. Now if your application doesn’t have many strings to translate, you can probably just grab them all and bundle them in the front end, or keep them in memory on the backend. But, if there are a ton of strings (you’d be surprised at how much memory these strings can take up!), then you’ll want to think about how to split them up, either with code splitting different locales using webpack, or by using dynamic imports or caching on your backend. I won’t be going into how to do those, so that’s your homework if you don’t know how to do it.
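Here is a rough sketch of the backend caching idea: load a locale’s strings the first time they’re requested and reuse them afterwards. `loadBundleFromSource` is a hypothetical stand-in for whatever actually fetches the data (a dynamic `import()` of a per-locale JSON file, a database, or a translation service). Caching the promise rather than the result means concurrent requests for the same locale share a single load.

```javascript
let loadCount = 0; // Only here to show that the source is hit once per locale.

// Hypothetical stand-in for a real loader (dynamic import, HTTP call, etc.).
async function loadBundleFromSource(locale) {
  loadCount += 1;
  const bundles = {
    en: { "colors.red": "Red" },
    es: { "colors.red": "Rojo" },
  };
  if (!(locale in bundles)) throw new Error(`No bundle for ${locale}`);
  return bundles[locale];
}

const bundleCache = new Map(); // locale -> Promise<bundle>

function getBundle(locale) {
  if (!bundleCache.has(locale)) {
    const pending = loadBundleFromSource(locale).catch((error) => {
      bundleCache.delete(locale); // Forget failures so a later call can retry.
      throw error;
    });
    bundleCache.set(locale, pending);
  }
  return bundleCache.get(locale);
}
```

On the front end, the equivalent move is letting your bundler split each locale into its own chunk and loading only the one the user needs.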

One of the best JavaScript libraries I know of that works on both the back end and the front end is i18next. I’m sure similar solutions exist for other languages, so I’ll just give you a few usage examples. It’s honestly a simple library, so you could probably implement your own version of it too.

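import i18next from "i18next";

// init() returns a promise; wait for it before translating.
await i18next.init({
  fallbackLng: "en-US",
  lng: "en-US"
});

// Register the English strings under the "AppNameHere" namespace.
i18next.addResourceBundle(
  "en-US",
  "AppNameHere",
  {
    "colors.red": "Red",
    "colors.blue": "Blue",
    "colors.green": "Green"
  }
);

// The translation function is t(); "Namespace:key" picks the namespace.
i18next.t("AppNameHere:colors.red"); // "Red"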

It takes in your language code in the IETF BCP 47 format I mentioned earlier, along with all the translations in JSON format. Then you just use it in your code! It’s super simple, but there are some tricks that will make using it more effective that you may not have thought about.

Trick #1: When using this, don’t pass BCP 47 language codes around as raw strings. Create an enum or a map for them, so the values are set in stone and have no room for changes. Sometimes we unintentionally change the capitalization of part of a language code. The BCP 47 standard itself treats tags as case-insensitive, but the string comparisons and object lookups in your code do not, so “en-us” and “en-US” end up as two different keys.

// Each code is written exactly once, so a typo like "en-us" can't slip in.
enum LanguageCodes {
  EN_US = 'en-US',
  ES_419 = 'es-419',
  // etc...
}
i18next.addResourceBundle(
  LanguageCodes.EN_US,
  // ...
);
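The enum keeps your own code consistent, but codes arriving from outside (a browser setting, a URL, a database row) can still show up in any capitalization. One way to guard that boundary, sketched in plain JavaScript with a frozen object standing in for the TypeScript enum: canonicalize the tag with the built-in `Intl.getCanonicalLocales`, then accept it only if it’s one you support. `toSupportedLanguage` is a hypothetical helper name.

```javascript
// Plain-JavaScript stand-in for the LanguageCodes enum.
const LanguageCodes = Object.freeze({ EN_US: "en-US", ES_419: "es-419" });
const SUPPORTED = new Set(Object.values(LanguageCodes));

function toSupportedLanguage(input, fallback = LanguageCodes.EN_US) {
  let canonical;
  try {
    // Canonicalization fixes casing, e.g. "EN-us" becomes "en-US".
    [canonical] = Intl.getCanonicalLocales(input);
  } catch {
    return fallback; // RangeError: not a well-formed language tag.
  }
  return SUPPORTED.has(canonical) ? canonical : fallback;
}

toSupportedLanguage("EN-us"); // "en-US"
toSupportedLanguage("fr-FR"); // "en-US" (unsupported, so the fallback)
```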

Trick #2: Namespacing can be important depending on the scale of your application. If you had an application that showed colors to your user and maybe you wanted to have a whole segment of your application dedicated to primary colors versus secondary colors, you might choose to segment those under separate namespaces.

This is useful when multiple teams work with different translations: they don’t step on each other’s toes, and bundles don’t grow too large. As your application grows, the number of strings grows with it, and depending on whether you have a monolith or a microservice architecture, it can become unwieldy for one service to hold all the localizations.

i18next.addResourceBundle(
  "en-US",
  "PrimaryColors",
  {
    "red": "Red",
    "blue": "Blue"
  }
);
i18next.addResourceBundle(
  "en-US",
  "SecondaryColors",
  {
    "green": "Green"
  }
);
i18next.t("PrimaryColors:red"); // "Red"
i18next.t("SecondaryColors:green"); // "Green"
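Under the hood, a namespaced key is just a two-level lookup. Here’s a hand-rolled sketch (not i18next’s actual implementation) that splits a key on the first `:`, which is i18next’s default namespace separator, and falls back to a default namespace named `translation`, as i18next does, when none is given. The `resources` shape and this `t` function are illustrative assumptions.

```javascript
// locale -> namespace -> key -> string. Each namespace could live in its own
// file owned by a different team and be loaded independently.
const resources = {
  "en-US": {
    PrimaryColors: { red: "Red", blue: "Blue" },
    SecondaryColors: { green: "Green" },
  },
};

function t(locale, qualifiedKey, defaultNamespace = "translation") {
  const separator = qualifiedKey.indexOf(":");
  const namespace = separator === -1 ? defaultNamespace : qualifiedKey.slice(0, separator);
  const key = separator === -1 ? qualifiedKey : qualifiedKey.slice(separator + 1);
  // Return the original key when anything along the path is missing.
  return resources[locale]?.[namespace]?.[key] ?? qualifiedKey;
}

t("en-US", "PrimaryColors:red"); // "Red"
t("en-US", "SecondaryColors:red"); // "SecondaryColors:red" (not in that namespace)
```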

Trick #3: Plural forms. i18next supports plural forms of strings. I don’t have in-depth experience with them, so I can’t say much, but they’re definitely something to think about, depending on your use case. i18next also supports interpolation, which is essentially plugging variables into strings that are already translated.
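Plural rules vary a lot between languages: English has two forms (“1 item”, “2 items”), while Polish, for example, distinguishes one, few, and many. JavaScript’s built-in `Intl.PluralRules` returns the CLDR plural category for a number. Below is a hand-rolled sketch, not i18next’s API, that combines it with simple interpolation using the `{{name}}` placeholder syntax i18next uses by default; the `messages` table and function names are hypothetical.

```javascript
// Hypothetical message table: one template per CLDR plural category.
const messages = {
  "en-US": {
    itemCount: { one: "{{count}} item", other: "{{count}} items" },
  },
  pl: {
    itemCount: {
      one: "{{count}} element",
      few: "{{count}} elementy",
      many: "{{count}} elementów",
      other: "{{count}} elementu", // used for fractions such as 1.5
    },
  },
};

// Replace {{name}} placeholders; unknown names are left as-is.
function interpolate(template, values) {
  return template.replace(/\{\{(\w+)\}\}/g, (placeholder, name) =>
    name in values ? String(values[name]) : placeholder
  );
}

function translatePlural(locale, key, count) {
  const forms = messages[locale][key];
  // select() returns "zero", "one", "two", "few", "many", or "other".
  const category = new Intl.PluralRules(locale).select(count);
  return interpolate(forms[category] ?? forms.other, { count });
}

translatePlural("pl", "itemCount", 2); // "2 elementy"
translatePlural("pl", "itemCount", 5); // "5 elementów"
```

A fuller version would also format `count` with `Intl.NumberFormat`, so that 1.5 prints as “1,5” in Polish.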

Trick #4: Pseudolocalization (pseudoloc). Be aware that strings in other languages might be longer. Netflix engineering shared an example of a German string that ended up much longer than its English original. You want to be able to test this for every case you can think of. Switching languages might catch some issues, but the best way is to make a fake language that is guaranteed to push the string length limits. Pseudolocalization was invented for exactly this purpose.
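A minimal pseudolocalization sketch, assuming a 40% length expansion (an arbitrary choice; tune it to the languages you support): it swaps some letters for accented look-alikes, pads the string, and wraps it in brackets so truncated or hard-coded strings stand out on screen. `{{name}}` placeholders are left untouched so interpolation still works. The function name and character map are hypothetical.

```javascript
// Accented look-alikes keep the text readable while making it obviously "fake".
const ACCENTED = {
  a: "á", e: "é", i: "í", o: "ó", u: "ú",
  A: "Á", E: "É", I: "Í", O: "Ó", U: "Ú",
  c: "ç", n: "ñ",
};
const PLACEHOLDER = /^\{\{\w+\}\}$/;

function pseudolocalize(text, expansion = 0.4) {
  const accented = text
    .split(/(\{\{\w+\}\})/) // The capturing group keeps placeholders in the result.
    .map((part) =>
      PLACEHOLDER.test(part) ? part : Array.from(part, (ch) => ACCENTED[ch] ?? ch).join("")
    )
    .join("");
  const padding = "~".repeat(Math.ceil(text.length * expansion));
  // Brackets make it easy to see when the start or end of a string gets cut off.
  return `[${accented}${padding}]`;
}

pseudolocalize("Save"); // "[Sávé~~]"
```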

Trick #5: Good localization is more than just a string thing; it also requires good layouts! Right-to-left languages are something you should be aware of. /u/lhorie on Reddit mentioned that Hebrew, for example, is written from right to left, so the entire UI needs to be right-aligned (and usually mirrored) to look correct.
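On the web, a common first step for right-to-left support is setting the `dir` attribute on the root element based on the locale. Here’s a sketch that uses the built-in `Intl.Locale` to extract the language subtag; the `RTL_LANGUAGES` set is a partial, hand-picked assumption (Arabic, Hebrew, Persian, Urdu), not an exhaustive list.

```javascript
// Partial, hand-picked list of right-to-left languages; extend as needed.
const RTL_LANGUAGES = new Set(["ar", "he", "fa", "ur"]);

function textDirection(locale) {
  const { language } = new Intl.Locale(locale);
  return RTL_LANGUAGES.has(language) ? "rtl" : "ltr";
}

textDirection("he-IL"); // "rtl"
textDirection("en-US"); // "ltr"

// In a browser you might then apply it to the whole page:
// document.documentElement.dir = textDirection(userLocale);
```

Layouts written with CSS logical properties such as `margin-inline-start` follow the writing direction, so they flip along with `dir` instead of needing separate right-to-left styles.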

Number and Date formatting

Numbers are formatted differently in different languages! Some locales use a . where English uses a , (and vice versa), and some use spaces. How do we manage all of this? Well, that’s where standards come to the rescue again. Honestly, I haven’t done a lot of research into which standard is the absolute best, but I’ve followed the precedent I found to decide what should be used.

The Unicode Common Locale Data Repository, or CLDR for short, is a giant information haven for things like language names, currency names, and a bunch of different formatting rules. CLDR identifies locales using BCP 47 language tags; BCP 47 itself is a separate IETF specification rather than a subset of CLDR. For things like currency formatting, the Angular project uses CLDR data, so at least we know it’s trusted by one of the largest JavaScript frameworks out there.

At some point, you can’t do more research without going down a rabbit hole of which option has the best support for different locales. For most cases, I’ve found that JavaScript’s built-in Intl.NumberFormat is good enough for numbers, and Moment.js or the built-in Date object is good enough for date formatting. Here’s how to use those.

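// Plain number formatting for German.
const formatter = new Intl.NumberFormat('de');
formatter.format(123.456); // "123,456"

// Currency formatting: style and currency code are passed as options.
const currencyFormatter = new Intl.NumberFormat('de', {
  style: 'currency',
  currency: 'EUR'
});
currencyFormatter.format(123.456); // "123,46 €"

// Dates: the output also depends on the runtime's time zone.
const date = new Date();
date.toLocaleString('de'); // e.g. "16.1.2021, 19:21:22"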
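One thing to watch in tests: `toLocaleString` uses the runtime’s local time zone, so the same code can print different dates on your laptop and on a CI server. A sketch that pins both the date and the time zone with `Intl.DateTimeFormat` (the `formatDate` helper name is hypothetical; `dateStyle` and `timeZone` are standard options):

```javascript
// Fixed instant: midnight UTC on January 16, 2021 (months are 0-based).
const release = new Date(Date.UTC(2021, 0, 16));

function formatDate(date, locale) {
  return new Intl.DateTimeFormat(locale, {
    dateStyle: "medium",
    timeZone: "UTC", // Pin the zone so output doesn't depend on the machine.
  }).format(date);
}

formatDate(release, "en-US"); // "Jan 16, 2021"
formatDate(release, "de-DE"); // "16.01.2021"
```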

For other programming languages, you’ll have to do your own research to determine the best option, though someone has probably already done that research and published a great open source library for you to use. Either way, you’ll want to read the API documentation to understand more.

Niche issues

In some cases, you might want to do something special and combine certain cases. For example, one niche area of currency formatting is showing the three-letter currency code (like EUR) instead of the symbol. I saw someone mention this somewhere, but I never looked deeply into it. Other times, you want the symbol in all cases regardless of the language, or you want a user-defined symbol to appear alongside the locale’s number formatting. In those cases, you’ll want to really dig into the CLDR data to figure out things like the symbol’s position and spacing for each locale, and produce your own data.

// ¤ denotes an unspecified currency symbol
¤ 123.45
¤123.45
123.45¤
123.45 ¤
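Before digging into raw CLDR files, it’s worth knowing that the built-in API covers part of this. `currencyDisplay: "code"` shows the ISO currency code instead of the symbol, and `formatToParts` tells you where the locale places the currency, so you can swap in your own symbol while keeping the locale’s number formatting. `formatWithSymbol` is a hypothetical helper, and this is a sketch rather than a full replacement for CLDR data.

```javascript
// Letter code instead of symbol; en-US places the code before the number.
const withCode = new Intl.NumberFormat("en-US", {
  style: "currency",
  currency: "EUR",
  currencyDisplay: "code",
}).format(123.45);

// Keep the locale's layout (position, spacing, separators) but replace the
// currency part with a custom symbol.
function formatWithSymbol(amount, locale, currency, symbol) {
  return new Intl.NumberFormat(locale, { style: "currency", currency })
    .formatToParts(amount)
    .map((part) => (part.type === "currency" ? symbol : part.value))
    .join("");
}

formatWithSymbol(123.45, "en-US", "USD", "¤"); // "¤123.45"
```

With `"de-DE"` and `"EUR"`, the same call puts the symbol after the number with a space in between, matching the last pattern in the list above.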

Sometimes the library can be wrong. If localization is critical to your application, keep track of updates to the standards and bring them up with the rest of your team. One example I can think of: a previous version of Chromium did not have the correct number formatting for Norwegian. Now, this might not seem like a huge deal. Most people have the latest version of Chromium anyway, right? Sure, but I’m thinking more about developers.

Puppeteer is a Node library for controlling headless Chrome or Chromium, and it’s often used to run browser-based tests. Locally, your tests for Norwegian might pass. But when you run them in a Docker container for continuous integration, you typically install Chromium yourself in the Dockerfile and have Puppeteer use it. If that Chromium is older than roughly version 82 (the exact version is irrelevant), some form of Norwegian will be formatted incorrectly, your pipeline will fail, and you’ll wonder what’s going on. Update Chromium, and it works.
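A cheap way to catch this kind of drift is to log which ICU and CLDR versions the runtime ships with, along with a sample of the formatting you depend on, at the start of a CI run. A sketch for Node.js, which exposes these versions on `process.versions`; for a browser driven by Puppeteer you could run the same `Intl` calls inside the page with `page.evaluate` and compare the output instead. `intlEnvironmentReport` and the sample locales are illustrative choices.

```javascript
// Summarize the Intl data this Node.js process is using.
function intlEnvironmentReport(locales = ["en-US", "de-DE", "nb-NO"]) {
  const samples = {};
  for (const locale of locales) {
    samples[locale] = new Intl.NumberFormat(locale).format(1234567.891);
  }
  return {
    node: process.versions.node,
    icu: process.versions.icu, // undefined if Node was built without ICU
    cldr: process.versions.cldr,
    unicode: process.versions.unicode,
    samples,
  };
}

console.log(JSON.stringify(intlEnvironmentReport(), null, 2));
```

Diffing this output between your machine and the CI image makes a version mismatch much easier to spot than a failing assertion alone.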

This is also why you should have unit tests for locale formatting if you want your application to be as correct as possible. You might think you don’t have many Norwegian customers, but I’d argue that if you’re making something, you should just do it right. I write unit tests for different locales based on data I pull straight from the CLDR repository. There’s an npm package for it that makes writing such a script easy.
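A sketch of that kind of test: a table of expected decimal and grouping symbols per locale, checked against what the runtime actually produces via `formatToParts`. Here the expectations are hand-written for illustration; in a real setup a script would generate them from CLDR data. The fixture shape and the `findMismatches` helper are hypothetical, and you’d wire the result into whatever test runner you use.

```javascript
// Hand-written stand-in for values a script would extract from CLDR data.
const expectedSymbols = [
  { locale: "en-US", decimal: ".", group: "," },
  { locale: "de-DE", decimal: ",", group: "." },
  { locale: "en-IN", decimal: ".", group: "," },
];

// Read the symbols the runtime actually uses for a locale.
function actualSymbols(locale) {
  const parts = new Intl.NumberFormat(locale).formatToParts(1234567.5);
  const find = (type) => parts.find((part) => part.type === type)?.value;
  return { decimal: find("decimal"), group: find("group") };
}

// Return one human-readable message per mismatch (empty array = all good).
function findMismatches(fixtures) {
  return fixtures.flatMap(({ locale, decimal, group }) => {
    const actual = actualSymbols(locale);
    const problems = [];
    if (actual.decimal !== decimal) {
      problems.push(`${locale}: decimal "${actual.decimal}", expected "${decimal}"`);
    }
    if (actual.group !== group) {
      problems.push(`${locale}: group "${actual.group}", expected "${group}"`);
    }
    return problems;
  });
}

console.log(findMismatches(expectedSymbols)); // [] when the runtime matches
```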

Final thoughts

So those are some of the localization tricks I’ve learned from my short time in industry as well as personal research. If you have any tricks you’ve learned, or if you’ve learned something from this article, leave a comment below. I’d love to hear stories about it, because this isn’t a simple thing!

If I’ve said anything factually wrong, please let me know so I can fix it. I want to teach people to do it right, and you may actually help me learn something too that I can bring back to my actual work!

Be sure to give the video a watch if you prefer hearing me explain it, and subscribe to the channel for more coding tips, JavaScript tips, and my life and perspective in general! I’ll be uploading more videos there that I may or may not write Medium articles about.
