60

I am developing the UI for a .NET MVC application that will require international localization of all content in the near future. I am very familiar with .NET in general but have never had a project that required such a significant focus on international accessibility.

The project is initially being done in English. What measures should I take at this point to make it easier to implement localization in the future?

52

Some basic things you should take into account:

Externalize all string resources

All your resources should be contained in external files that can be handed off for localization. Don't forget about error messages, if you want these localized too.

Allow sufficient space for string expansion

Strings in some languages (Greek, for example) tend to be up to 30% longer than their English equivalents, so design your UI so that strings can expand if necessary. Here's a rather extreme example for French:

Ok -> Accepter (French - four times the length)

I'd recommend doing some kind of pseudo translation as a starting point (http://en.wikipedia.org/wiki/Pseudolocalization). Or you could translate your resources via Google Translate or Bing. This will give you a good indication of what actual translations will look like.

Watch out for text in images

If you use any images in your application - ensure they don't contain any text - this obviously cannot be translated.

Never hardcode any paths to Windows folders

Obvious, but I've seen it in the past. For example, C:\Program Files is translated on some international versions of Windows, e.g. it's C:\Programme on a German OS.

Avoid using locale specific terms

For example, if you ask someone for their 'High School' on a form, this has little meaning in western Europe.

Avoid creating strings via string concatenation

For example, this looks harmless:

strWelcome = ReadExternalString("Welcome"); 
strMessage = strWelcome + ", " + UserName;

But the word order in Japanese, for example, would be different, so the result may not make any sense.
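A minimal sketch of the alternative, assuming each whole sentence is stored as a single resource with a numbered placeholder. The WelcomeMessages class and its in-memory templates are hypothetical stand-ins for strings that would normally come from resource files, and the Japanese template is only illustrative:

```csharp
using System;
using System.Collections.Generic;

public static class WelcomeMessages
{
    // Hypothetical templates; in a real project these would live in resource files.
    private static readonly Dictionary<string, string> Templates = new Dictionary<string, string>
    {
        { "en", "Welcome, {0}" },
        { "ja", "{0}さん、ようこそ" } // the name comes first in this template
    };

    public static string Welcome(string language, string userName)
    {
        // The translator controls word order by moving the placeholder.
        return string.Format(Templates[language], userName);
    }
}
```

Because the placeholder is part of the translated string, no code change is needed when a language puts the name in a different position.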

Time / Date Settings

Always get the time / date format from the OS.

29

You are developing an ASP.NET MVC application, aren't you? Other answers seem to be specific to desktop applications. Let me capture the common things:

Locale detection

It is important that your application detects the user's locale correctly. In a desktop application, CultureInfo.CurrentCulture holds the preferred formatting locale (the one used to format numbers, dates, currencies, etc.), whereas CultureInfo.CurrentUICulture holds the preferred user interface locale (the one used to choose localized messages). For web applications, you should set both cultures to auto (so that the locale is detected from the Accept-Language header) unless you want to implement a custom locale detection workflow (e.g. to support changing the language on demand).

Externalize strings

All strings should come from resources, that is, Resx files. In a Winforms app this is easily achieved by setting the form's Localizable property to true. You would also need to (unfortunately) manually externalize strings that come from your models, which is relatively simple. In ASP.NET you need to externalize everything manually...

Layouts

You definitely need to allow for string expansion. In the Winforms world this is achievable via TableLayoutPanel, which makes the layout adjust automatically to accommodate longer text. In the web world, you are a bit out of luck. You might need to implement a CSS localization mechanism, that is, a way to override CSS definitions per language. This would allow the localization team to fix style issues on demand. Make sure that each HTML element in the rendered page has a unique id so that it can be targeted precisely.

Culture specific issues

Avoid using graphics, colors and sounds that might be specific to Western culture. If you really need them, provide a means of localizing them. Avoid direction-sensitive graphics (they become a problem when you localize to, say, Arabic or Hebrew). Also, do not assume that the whole world uses the same digits (this is not true for Arabic, for example).

ToString() and Parse()

Be sure to always pass a CultureInfo when calling ToString() unless the overload does not support it. That way you document your intent. For example, if you are using a number internally and for some reason need to convert it to a string, use:

int i = 42;
var s = i.ToString(CultureInfo.InvariantCulture);

For numbers that are going to be displayed to user use:

var s = i.ToString(CultureInfo.CurrentCulture); // formatting culture used

The same applies to Parse(), TryParse() and even ParseExact(): some nasty bugs can be introduced without proper use of CultureInfo. That is because some poor soul at Microsoft, full of good intentions, decided that it was a good idea to treat CultureInfo.CurrentCulture as the default (it is used if you don't pass anything). After all, when somebody calls ToString() they want to display the result to the user, right? It turns out that is not always the case. For example, try to store your application version number in a database and then convert it to an instance of the Version class. Good luck.
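A minimal sketch of keeping the two uses apart, assuming a hypothetical helper class named NumberRoundTrip: values that are stored or exchanged go through the invariant culture in both directions, and only display formatting takes a user culture.

```csharp
using System;
using System.Globalization;

public static class NumberRoundTrip
{
    // For storage and machine-to-machine exchange: independent of the thread culture.
    public static string ToStorage(double value)
    {
        return value.ToString("R", CultureInfo.InvariantCulture);
    }

    // Parse with the same culture that was used for writing.
    public static double FromStorage(string text)
    {
        return double.Parse(text, CultureInfo.InvariantCulture);
    }

    // For display: one decimal place, formatted with the given culture's separators.
    public static string ToDisplay(double value, CultureInfo culture)
    {
        return value.ToString("N1", culture);
    }
}
```

With these helpers, NumberRoundTrip.ToDisplay(1.5, CultureInfo.GetCultureInfo("de-DE")) yields "1,5", while ToStorage(1.5) yields "1.5" whatever the current culture is, so the stored text can always be parsed back.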

Dates and time zones

Be sure to always store and instantiate DateTime in UTC (use DateTime.UtcNow instead of DateTime.Now). Convert it to local time in the local format when displaying it:

DateTime now = DateTime.UtcNow;
var s = now.ToLocalTime().ToString(CultureInfo.CurrentCulture);

If you need to send emails with time reference in body, be sure to include time zone information - include both UTC offset and list of cities:

DateTime someDate = DateTime.UtcNow; // placeholder; e.g. a UTC value loaded from the database
var formattedDate = String.Format("{0} {1}",
             someDate.ToLocalTime().ToString(CultureInfo.CurrentCulture),
             TimeZoneInfo.Local.DisplayName);

Compound messages

You have already been warned not to concatenate strings. Instead you would probably use String.Format() as shown above. However, you should also minimize the use of compound messages. Grammar rules in the target language are quite commonly different, so translators might need not only to re-order the sentence (which placeholders and String.Format() resolve), but also to translate the whole sentence differently depending on what is substituted. Let me give you some examples:

// Multiple plural forms
English: 4 viruses found.
Polish: Znaleziono 4 wirusy. **OR** Znaleziono 5 wirusów.

// Conjugation
English: Program encountered incorrect character | Application encountered incorrect character.
Polish: Program napotkał nieznaną literę | Aplikacja napotkała nieznaną literę.
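To illustrate the plural case, here is a rough sketch that picks a whole-sentence template by plural category instead of assembling the sentence from pieces. The PolishPlurals class and the PluralCategory enum are hypothetical names; the rule is the cardinal plural rule for Polish integers as I understand it from Unicode CLDR, and the actual templates would come from the translator:

```csharp
using System;
using System.Collections.Generic;

public enum PluralCategory { One, Few, Many }

public static class PolishPlurals
{
    // Cardinal plural rule for Polish integers (assumed to match Unicode CLDR).
    public static PluralCategory Categorize(int n)
    {
        int mod10 = Math.Abs(n % 10);
        int mod100 = Math.Abs(n % 100);
        if (n == 1) return PluralCategory.One;
        if (mod10 >= 2 && mod10 <= 4 && !(mod100 >= 12 && mod100 <= 14))
            return PluralCategory.Few;
        return PluralCategory.Many;
    }

    public static string Format(IDictionary<PluralCategory, string> templates, int n)
    {
        string template;
        // Fall back to the "many" template when a form has not been supplied.
        if (!templates.TryGetValue(Categorize(n), out template))
            template = templates[PluralCategory.Many];
        return string.Format(template, n);
    }
}
```

Given the two templates from the example above ("Znaleziono {0} wirusy." for Few and "Znaleziono {0} wirusów." for Many), Format selects the first for 4 and the second for 5. Every language needs its own rule, which is one more reason to keep compound messages to a minimum.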

Other concatenation issues

Concatenation is not restricted to strings. Avoid laying out controls inline within a sentence, say:

Remind me again in [text box with number] days.

This should be re-designed to something like: Remind me again in this number of days: [text box].

Character encoding and fonts

Always save, transfer and otherwise handle text in Unicode (e.g. as UTF-8). Do not hard-code fonts: the localization team might need to change them, and hard-coding turns off the default font fall-back mechanism (in the case of Winforms). Remember to allow "strange" characters in most fields (e.g. user name).

Test

You will probably need to implement so-called pseudo translation, that is, create resources for, say, the German culture and copy your English strings into them with a prefix and suffix added. You may also wrap placeholders to make compound strings easy to detect. The purpose of pseudo translation is to detect localizability issues such as hard-coded strings, layout problems and excessive use of compound messages.
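A small sketch of such a pseudo translation step, assuming the strings use String.Format-style placeholders. PseudoLocalizer and the marker characters are hypothetical choices, and escaped braces such as {{0}} are not handled:

```csharp
using System;
using System.Text.RegularExpressions;

public static class PseudoLocalizer
{
    // Matches String.Format placeholders such as {0} or {1:d}.
    private static readonly Regex Placeholder = new Regex(@"\{\d+[^{}]*\}");

    public static string Pseudo(string source)
    {
        // Wrap placeholders so that compound strings stand out in the UI.
        string marked = Placeholder.Replace(source, m => "<" + m.Value + ">");
        // The prefix and suffix make hard-coded strings and truncation visible.
        return "[!! " + marked + " !!]";
    }
}
```

Any text that shows up in the running application without the markers did not come from resources, and a missing closing marker points to a control that is too small.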

11

Special Considerations for Asian Languages

In addition to all the great answers already here, some things to watch out for with Asian languages:

Beware of different lengths of text

Chinese and Korean text tends to be much shorter than the equivalent English text (since you usually need fewer blocky characters to write the same thing), so a page may actually look empty in Chinese but jammed full in German... You need to do some dynamic sizing here to look good.

However, Japanese text usually tends to be much longer, even longer than the equivalent English text in terms of character count.

Beware of baseline layout and the "shifted up" look

Asian characters are usually laid out on the baseline and have no descenders (i.e. the lower part of y, g, q, j etc.). When you format a screen element, usually a button, with text inside, and that text contains only Asian characters (i.e. no Western letters), the text will look as if it is shifted upwards.

Formatting of numbers and localized numeric units

Number formatting differs between Asian countries, and so does currency formatting. For example, in East Asia, 10,000 (wan) is a common unit. In India, 100,000 (lakh) is common.

Local currencies

Some countries' currencies have a lot of zeros and no decimal point (e.g. Japan, Indonesia, or Italy before the euro), while others have up to two digits after the decimal point.
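As a rough illustration, .NET exposes the number of currency decimal places per culture, so it does not have to be hard-coded. CurrencyDisplay is a hypothetical helper name; note that formatting with a culture only changes the presentation and does not convert between currencies:

```csharp
using System;
using System.Globalization;

public static class CurrencyDisplay
{
    public static string Format(decimal amount, string cultureName)
    {
        CultureInfo culture = CultureInfo.GetCultureInfo(cultureName);
        // "C" applies the culture's own symbol, separators and decimal places.
        return amount.ToString("C", culture);
    }

    public static int DecimalPlaces(string cultureName)
    {
        return CultureInfo.GetCultureInfo(cultureName).NumberFormat.CurrencyDecimalDigits;
    }
}
```

For example, DecimalPlaces("ja-JP") is 0 and DecimalPlaces("en-US") is 2, so the same amount is rendered without decimals for the yen and with two decimals for the dollar.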

Beware of different word orders

Word order may not always be the same. It is best to use {0}, {1} etc. in string formatting instead of hard-coding the word order if your string is a combination of different pieces of data.

Use locale-specific sort

Sorting is different per language and per locale -- you should always rely on the O/S's locale-specific sort.
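In .NET, one way to do this is to sort with a comparer built for the user's culture rather than with the default ordinal or current-culture behaviour. A minimal sketch, where LocalizedSort is a hypothetical helper name:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class LocalizedSort
{
    public static List<string> Sort(IEnumerable<string> items, CultureInfo culture)
    {
        // StringComparer.Create builds a comparer that follows the culture's collation rules.
        StringComparer comparer = StringComparer.Create(culture, false);
        return items.OrderBy(s => s, comparer).ToList();
    }
}
```

Passing the culture explicitly keeps the sort order tied to the user rather than to whatever culture the server thread happens to run under.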

Be very cautious with full-width/half-width characters

Beware of the differences between "full-width" and "half-width" characters. Brackets, punctuation etc. can have "full-width" versions that are different from standard ASCII. If you do searching or string splitting based on these letters, you'll need to first convert all full-width symbols to half-width equivalents.
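One way to do that conversion in .NET is Unicode compatibility normalization (NFKC), sketched below with a hypothetical WidthNormalizer helper. Be aware that NFKC also changes other characters (for instance, it turns half-width katakana into full-width katakana and expands ligatures), and that it leaves the ideographic full stop 。 untouched, so it is not a complete answer on its own:

```csharp
using System;
using System.Text;

public static class WidthNormalizer
{
    public static string ToHalfWidth(string input)
    {
        // NFKC replaces compatibility characters, including full-width Latin
        // letters, digits and punctuation, with their standard forms.
        return input.Normalize(NormalizationForm.FormKC);
    }
}
```

Running the search or split on the normalized copy while keeping the original text for display avoids altering what the user typed.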

A period is not a dot... a comma is not a comma...

Beware of data input gotchas. For example, in Chinese, a period is not a dot "." but "。", and a comma is the full-width "，", not ",". Don't search only for Western punctuation if the user doing the data entry may accidentally have an Asian language IME turned on.

Phone numbers

Don't assume anything in phone number formatting. There is not always an area code etc. and it can be formatted differently. Usually, have a format string per country.

Don't assume people will only have one mobile phone number, or one fax number etc. It is not this way in Asia.

Addresses -- denser than you may think

For addresses, don't assume anything. There may not always be a zip code. Zip codes may not always be numbers. A country may not have provinces/states. A country may be just a big city (e.g. Singapore). For certain Asian countries, the smallest unit of a home may be "Room X, Unit Y, Section Z, Floor A, Block B, Group C, Estate D". In general, be very liberal in number of fields and number of characters allowed in addresses.

Salutations

Salutations are not restricted to Mr., Mrs. etc., although you're probably safe in using "M" and "F" for sex -- we are not that weird yet...

6

Some basic steps are to make sure any string that is displayed on the screen is not a literal in your code. If you're doing Winforms each form will have a UI resource. For dialogs, reports etc, make sure you use the project resource files.

So instead of "Upload failed" in your code, you might have something like Resources.UploadFailed

This way you can create a new resource file for each language you support (.NET will help with this) and keep the localized strings in each file.

EDIT: I forgot to mention that when you're designing your UI, make sure you don't just cram things in there. Depending on the languages you are localizing to, screen real estate could be a problem. I worked on a project where German and Portuguese were the two biggest offenders for string growth. If we weren't careful, strings that were fine in English, French and Italian would blow up in German.

4

In addition to the specifics of how to load resources, I'd make sure that you test with a pseudo-localized version from the beginning. Otherwise you're not likely to notice the places where internationalization considerations were omitted until the end.

3

I suggest you run FxCop or Visual Studio Code Analysis (they're much the same) on your assemblies.

They are good at detecting .NET code that does not use the proper culture oriented overloads, like this one: CA1305: Specify IFormatProvider.

I must add that these tools are also frustrating because they usually detect zillions of issues in your code, but still, even if you don't follow each rule, you should learn a lot.

2

You need to consider:

  1. Routing for multilanguage

  2. Move all hard-coded strings to resource files

An example for a property:

Model:

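// "Resources" is a hypothetical generated resource class; the string values are keys in it.
[Display(Name = "TestProperty", ResourceType = typeof(Resources))]
[Required(ErrorMessageResourceName = "TestPropertyRequired", ErrorMessageResourceType = typeof(Resources))]
public string TestProperty { get; set; }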

View:

@Html.LabelFor(m => m.TestProperty)
@Html.EditorFor(m => m.TestProperty)
@Html.ValidationMessageFor(m => m.TestProperty)

2

In addition to all the other helpful hints, here are some that are missing:

Take into account that some countries use more than one language. For example, in Canada, a user would expect to be able to easily switch between English and French.

If you ask the user a question that expects a single-letter answer, don't expect the user to press the 'Y' key to say Yes; the word for "yes" starts with a different letter in most languages.

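Be careful with dates in stored procedures: date strings in the SQL DB are typically interpreted in US format (month/day/year), so pass dates as typed parameters rather than as locale-formatted strings.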

Placing text strings in the DB allows you to later add additional languages without redeploying.

When sending text files for translation, always include a description of the context so that the translator selects the correct word. For example, without context, "pitch:" could be translated as something to do with sound or as a field on which you play soccer.

Address labels always need converting. Province in Canada, State in America, County in the UK

0

The most important thing is managing the content in various languages. I have developed a couple of websites myself, and managing the content in various languages was the biggest challenge.

I am using a database to store the resources/content. It gives me the flexibility to add support for any language I want. I have implemented logic that falls back to English if a resource is not found in a particular language.
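The fallback logic could look roughly like the sketch below. ResourceStore is a hypothetical class, and an in-memory dictionary stands in for the database table; this is not the actual implementation described above:

```csharp
using System;
using System.Collections.Generic;

public class ResourceStore
{
    // In-memory stand-in for the database table: language -> (key -> text).
    private readonly Dictionary<string, Dictionary<string, string>> _byLanguage =
        new Dictionary<string, Dictionary<string, string>>();

    public void Add(string language, string key, string text)
    {
        Dictionary<string, string> table;
        if (!_byLanguage.TryGetValue(language, out table))
        {
            table = new Dictionary<string, string>();
            _byLanguage[language] = table;
        }
        table[key] = text;
    }

    public string Get(string language, string key)
    {
        Dictionary<string, string> table;
        string text;
        // Try the requested language first, then fall back to English.
        if (_byLanguage.TryGetValue(language, out table) && table.TryGetValue(key, out text))
            return text;
        if (_byLanguage.TryGetValue("en", out table) && table.TryGetValue(key, out text))
            return text;
        return key; // last resort: show the key so that the gap is visible
    }
}
```

Returning the key when nothing is found is a design choice that makes missing resources easy to spot during testing; an application might prefer to log the miss instead.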

You can later use a translator to convert the English values into any language.