Welcome to The Internationalization Wiki
Internationalization is the process of making software products functional and conformant to international business practices with a single code line for all languages and regions. This approach is much more efficient than the traditional approach of producing one code line per language, also known as localization.
While specifications and terminology are well organized on the websites of standards organizations and on Wikipedia, the internationalization methodologies and tools that are useful for software development are not.
The Internationalization Wiki focuses on gathering the internationalization methodologies, tools and samples used by internationalization architects and consultants.
Internationalization does not mean only translating software products; it also means making them function correctly with data in all languages and conform to international business practices. To make that happen, various things need to be considered throughout the development cycle. Guidelines lists the guidelines for development process, architecture review, coding, QA, and localization (translation).
Many people believe they should release their products in their headquarters' country first and internationalize afterwards. If you have a business plan to sell your products internationally, that is a very expensive approach in the long run. You can take that approach only if you don't mind rearchitecting your products after the release. The most important thing in internationalizing software is to build it into every single phase of your development process, just as you do for other product features or for security. It is very important to plan internationalization checkpoints in your development process.
From a technical perspective, architecture, including technology choices, is the most important factor for internationalization. As with any other requirement, if you architect your products properly, your work will be a lot easier later on. Especially if you are planning multilingual support in your server products, you should review the architecture carefully so that you don't have to change data structures, locale management and so on late in the development phase. Please keep in mind that multilingual support always requires architectural thought.
Although modern development platforms such as Java and .NET support Unicode and provide useful internationalized libraries, you still need to take care of some internationalization issues yourself, such as locale management, locale-aware formatting, and transcoding between character encodings. We encourage you to establish internationalization coding standards for your major technology stacks.
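As an illustration of the transcoding issue mentioned above, here is a minimal sketch in Python. The function name transcode, its parameters, and the Shift_JIS sample input are assumptions made for this example and do not come from any particular product; the point is only that bytes are decoded to text with an explicitly named source encoding and then re-encoded, rather than passed through unchanged.

```python
def transcode(data: bytes, source_encoding: str, target_encoding: str = "utf-8") -> bytes:
    """Re-encode bytes from one character encoding to another (hypothetical helper)."""
    # Decode to an abstract Unicode string first. errors="strict" raises
    # UnicodeDecodeError on bad input instead of silently corrupting the text.
    text = data.decode(source_encoding, errors="strict")
    # Encode the same characters with the target encoding.
    return text.encode(target_encoding, errors="strict")


# Assumed legacy input: Japanese text stored as Shift_JIS bytes.
legacy_bytes = "日本語".encode("shift_jis")
utf8_bytes = transcode(legacy_bytes, "shift_jis")

# Same three characters, different byte lengths in each encoding.
print(len(legacy_bytes), len(utf8_bytes))  # 6 9
```

A coding standard might require every such conversion to name both encodings explicitly, so that nothing depends on a platform default.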
Many people believe that simply having native speakers do the internationalization testing can assure the quality of internationalization. If you believe you can assure the quality of your products by having interns write the test plans, then this would be true for you. (That's your product quality anyway!) Otherwise, this is quite wrong. Just as you do for generic functions, test plans for internationalization should be written or reviewed by someone who understands both your products and internationalization. Native speakers who don't know much about internationalization can only assure the quality of the translation.
Please note that you can talk about localization or translation only if you plan the development process and get the architecture right. Many people start by thinking about translation when a business case comes from abroad, but that simply does not work. Unless the products function correctly with users' data, providing a translation makes no sense for those users. (In the worst case, the translated UI would be garbled as well.)
Localization has two faces: one faces development and the other faces the translators. The development side of the work, such as packaging files for translation, is better done by the development team or fully automated. Otherwise, you will need someone who understands the build process to take care of the technical issues, which is not cost effective. You should not take that route just because your release engineering team is a little nervous about localization. It should be a small task for release engineering. If it is not, you have probably done something wrong in either the development process or the architecture.
As for communication with the translators, you are better off having someone dedicated to it if you have enough translation volume. That person will need to negotiate prices with the translation vendors and follow up on their questions and progress to ensure that you can release the products on time. Especially if you plan to translate your products into multiple languages, this will be a lot of work.
Understanding character sets and encodings is one of the most important things for internationalization. Understanding Unicode in particular is crucial in modern internationalization architecture. Category:I18N Character Set lists the pages related to character sets and encodings.
Character set and encoding
Before you get into any of those pages, it is worth taking the time to understand the difference between a character set and a character encoding. This is especially important for understanding Unicode and the Unicode encodings. In short, a character set defines a common set of characters (Unicode also gives each one an abstract code point) and says nothing about how a character is represented as bytes. A character encoding is the mapping from a character to the concrete numeric byte values that represent it.
Still confusing? Not sure why it matters? Take Unicode as an example. The Unicode character set simply defines a common set of characters covering all languages. Separately, there are several encodings of that character set: the Unicode standard defines UTF-8, UTF-16 and UTF-32, and others such as UTF-7 are defined elsewhere. If you don't understand this difference, you will most likely get confused by the various encodings that support the Unicode character set, and you will make mistakes in implementing Unicode support.
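The distinction can be seen directly in a few lines of Python. This is only a sketch: the helper name describe is made up for illustration, and the big-endian variants of UTF-16 and UTF-32 are chosen so that no byte order mark is added to the output.

```python
def describe(ch: str) -> dict:
    """Show one character's code point and its bytes in three Unicode encodings."""
    return {
        # The code point identifies the character in the Unicode character set.
        "code_point": f"U+{ord(ch):04X}",
        # The encodings turn that same character into different byte sequences.
        "utf-8": ch.encode("utf-8").hex(),
        "utf-16-be": ch.encode("utf-16-be").hex(),
        "utf-32-be": ch.encode("utf-32-be").hex(),
    }


print(describe("A"))
# {'code_point': 'U+0041', 'utf-8': '41', 'utf-16-be': '0041', 'utf-32-be': '00000041'}
print(describe("€"))
# {'code_point': 'U+20AC', 'utf-8': 'e282ac', 'utf-16-be': '20ac', 'utf-32-be': '000020ac'}
```

The character is the same in every row; only the byte representation changes with the encoding.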
For example, Java supports the Unicode character set and uses the UTF-16 encoding internally. But UTF-16 is not favored over HTTP, and in many cases you need to send data in UTF-8 over an HTTP connection. If you don't understand the difference and assume that the Unicode character set and UTF-16 are identical, you may send data in UTF-16 over an HTTP connection without converting it from UTF-16 to UTF-8. Your implementation would then not work, because the byte representation of a character differs between the two encodings (e.g. Latin capital letter A is '0x0041', two bytes, in UTF-16 while it is '0x41', one byte, in UTF-8).
So, please keep this difference in mind as you go through those pages.
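The failure described above can be reproduced in a small sketch. It uses Python rather than Java, and receiver_decode is a hypothetical stand-in for the far end of an HTTP connection that expects UTF-8; no real network call is made.

```python
message = "Aé"

# What a sender would transmit if it skipped the conversion (UTF-16, big-endian).
utf16_bytes = message.encode("utf-16-be")   # b'\x00A\x00\xe9'
# What the receiver actually expects.
utf8_bytes = message.encode("utf-8")        # b'A\xc3\xa9'


def receiver_decode(payload: bytes) -> str:
    """Hypothetical receiver that assumes every payload is UTF-8."""
    # errors="replace" substitutes U+FFFD for bytes that are not valid UTF-8.
    return payload.decode("utf-8", errors="replace")


print(receiver_decode(utf8_bytes) == message)   # True
print(receiver_decode(utf16_bytes) == message)  # False
print(repr(receiver_decode(utf16_bytes)))       # '\x00A\x00\ufffd'
```

The UTF-16 payload is not rejected outright; it decodes into stray NUL characters and a replacement character, which is why this kind of bug often surfaces as garbled text rather than as an error.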