FOSS A General Introduction (Localization and Internationalization)

Localization and Internationalization

What is localization? What is internationalization?

According to the Localization Institute,

Localization is the process of creating or adapting a product to a specific locale, i.e., to the language, cultural context, conventions and market requirements of a specific target market. With a properly localized product, a user can interact with this product using his/her own language and cultural conventions. It also means that all user-visible messages and all user documentation (printed and electronic) use the language and cultural conventions of the user. Finally, the properly localized product meets all regulatory and other requirements of the user’s country/region.

Internationalization is a way of designing and producing products that can be easily adapted to different locales. This requires extracting all language, country/region and culturally dependent elements from a product. In other words, the process of developing an application whose feature design and code design do not make assumptions based on a single locale, and whose source code simplifies the creation of different local editions of a program, is called internationalization.[1]

What is an example of localization and internationalization?

The terms localization and internationalization are often used interchangeably, but the definitions above, which refer to software development, show that they are distinct. In terms of FOSS development, an excellent example of both ‘internationalization’ and ‘localization’ is the Mozilla Project. Mozilla is the best known and most widely used of the FOSS web browsers available. Mozilla is internationalized because the community of developers behind the Mozilla Project has designed and developed the software to function in multiple locales. Mozilla is localized when local developers, using guidelines and localization toolkits provided by the Mozilla Project, modify or adapt the product to suit a particular locale. This modification often involves translating user interfaces, documentation and packaging, as well as changing and customizing features to match the usage patterns of that locale.

Anyone can internationalize and localize Mozilla because it is a FOSS project. The Mozilla source code is distributed under the Mozilla Public License (MPL), a license approved by the Open Source Initiative. The Mozilla Project aims to serve the greater Internet community, which it recognizes as a global community made up of users belonging to a great array of language groups. One of the goals of the Mozilla Project is to “advocate the localization of mozilla.org products into any world language”.

Fully localized versions of Mozilla are available in 34 languages, and localization efforts are under way for many others.[2]

What are the methods of localizing GNU/Linux?

Prof. Venkatesh Hariharan

"The localisation of Linux to Indian languages can spark off a revolution that reaches down to the grassroots levels of the country." [3]

For each different locale or country, the challenges involved in localizing GNU/Linux vary. Some locales may find that localization requires minimal effort. Other locales may find that localization requires extensive modification and customized programming. This depends largely on the similarity between the locale’s requirements and the requirements already localized in GNU/Linux.

There are many different methods used to localize GNU/Linux, using different encoding, input and display systems. At present, the most technically effective method is localization via the Linux-Unicode-OpenType model. A brief explanation of the different technologies follows.

Unicode ( http://www.unicode.org )

The Unicode encoding system, whose latest version at the time of writing is Unicode 4.0, is an industry standard for encoding characters and symbols. It is closely related to the ISO Universal Character Set standard 10646. Additions to either standard are coordinated between the ISO and the Unicode Consortium. The Unicode Consortium, co-founded by Apple and Xerox in 1991, now has more than 100 members, including Adobe, IBM, Microsoft, Sybase, Compaq, Hewlett Packard, Oracle, Sun Microsystems, Netscape and Ericsson.

The aim of Unicode and ISO 10646 is to encompass all of the languages of the world, with each character code corresponding to a ‘glyph’. Combinations of character codes produce combined glyphs for complex characters (particularly in the Asian languages). The initial Unicode standard specified an encoding for 16-bit characters, which allows for a total of 65,536 possible characters/symbols. Later versions of the standard extended the code space beyond 16 bits: code points now run up to U+10FFFF, which fits in a 32-bit unit and allows over one million different characters and symbols to be encoded.

The Unicode standard is increasingly relevant in light of accelerated globalization. It is the most relevant encoding system for the Internet. As Internet penetration continues to increase in both developing and developed countries, the benefits of integrating Unicode in software and content development cannot be ignored.
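
The figures and the notion of combined glyphs described for Unicode can be checked with a short Python sketch. It uses only the standard library. The helper name code_points and the sample characters are illustrative choices, not part of the original text.

```python
import unicodedata

# Size of the original 16-bit code space and of the current code space.
BMP_SIZE = 2 ** 16                # code points U+0000..U+FFFF
CODE_SPACE_SIZE = 0x10FFFF + 1    # code points U+0000..U+10FFFF


def code_points(text: str) -> list[str]:
    """Return the code points of text in U+XXXX notation (illustrative helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]


# A base letter followed by a combining mark is displayed as one combined glyph.
decomposed = "e\u0301"            # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)

# A character beyond the 16-bit range needs two 16-bit units in UTF-16.
beyond_16_bit = "\U00010330"      # GOTHIC LETTER AHSA
utf16_bytes = len(beyond_16_bit.encode("utf-16-be"))

if __name__ == "__main__":
    print(BMP_SIZE, CODE_SPACE_SIZE)
    print(code_points(decomposed), code_points(composed))
    print(utf16_bytes)
```

The normalization step shows that the two-code-point sequence and the single precomposed character are treated as the same text, which is why applications should compare strings only after normalizing them.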

OpenType ( http://www.adobe.com/type/opentype/main.html )

Fonts are at the ‘front end’ of localization and often receive the most attention from non-technical observers. Thus, font development is very often seen as the be-all and end-all of localization. However, font development is only one crucial component of the entire localization process, although it is the most visible.

Just as we are advocating the Unicode encoding system, we are advocating OpenType font file formats as the appropriate standard for font development in localization efforts.

OpenType is a cross-platform font file format jointly developed by Microsoft and Adobe. It is based on the Unicode encoding standard and offers multiple language character sets in one font file. Whereas traditional Western PostScript fonts are limited to 256 glyphs, an OpenType font may contain more than 65,000 glyphs, allowing multiple languages to be displayed using a single font.

Using the Linux-Unicode-OpenType model, most localization efforts involve the following steps:

  • Unicode standard corrections/enhancements
  • Font development
  • Input methods
  • Modifying applications to handle local language characteristics
  • Translating application messages
  • Ensuring that changes are accepted by the global FOSS community

Unicode standard corrections/enhancements

Creating an encoding that adequately handles the needs of the countless languages throughout the world is highly complex. The immensity of this task has resulted in errors and inadequacies in the specification of certain languages, particularly languages from countries that have low levels of ICT development. Additionally, while Unicode may include encodings for all of the major languages in the world, the encoding for other languages and dialects (India alone has over 1,000 languages and dialects) is either incomplete or non-existent. In countries where the existing Unicode standard is lacking, it will be necessary to review the standard and recommend changes to the Unicode Consortium.

Font development

Once a satisfactory Unicode standard has been developed, the next challenge is ensuring that there is a freely available, cross-platform font. Without fonts, it is impossible to display, use and manipulate any language electronically. Modern fonts, particularly OpenType fonts, are more than just the visual representation of a language. OpenType fonts contain the logic behind the display of the words, that is, how glyphs interact with and change surrounding glyphs. Languages whose scripts differ greatly from the Latin alphabet (Arabic, Lao, Dzongkha, etc.) often do not have a commonly available, non-proprietary font.

Font development is no small task. A high-quality, professional font can take several years to develop.

Input methods

The next step involves standardizing and implementing a system for input in that language. The most common input method in computing is the keyboard, and many countries have created mappings from the standard keys to characters in their local language. These are often ad hoc adaptations, and several may be in use within one country. For example, several keyboard layouts are in regular use in Bangladesh. The lack of a single standard both results from and contributes to incompatible implementations of character sets/encoding, keyboard mappings, fonts, and the like. Addressing and standardizing input methods from the outset provides developers with a common starting point.

Once an input method has been standardized, software has to be written to implement the standard under GNU/Linux. If the number of characters is smaller than the number of possible key combinations, this becomes a simple task of remapping the keys on a keyboard. It is when the number of characters far outnumbers the keys on a keyboard (e.g., Chinese with its 30,000 characters) that more advanced techniques become necessary.
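
The two cases described above can be illustrated with a minimal Python sketch. The key layout, the phonetic table and all function names are hypothetical and chosen only for illustration. They do not describe any real keyboard standard or input method framework.

```python
# Case 1: few characters. A keyboard layout is a one-to-one remapping of keys.
# Hypothetical layout mapping four Latin keys to Greek letters.
KEYMAP = str.maketrans({"a": "α", "b": "β", "g": "γ", "d": "δ"})


def remap_keys(keystrokes: str) -> str:
    """Translate raw keystrokes through the layout table."""
    return keystrokes.translate(KEYMAP)


# Case 2: far more characters than keys. The user types a phonetic spelling
# and then picks one of several candidate characters.
# Hypothetical, heavily truncated candidate table.
CANDIDATES = {
    "ma": ["妈", "马", "吗"],
    "shu": ["书", "树"],
}


def candidates_for(typed: str) -> list[str]:
    """Return the characters that match the typed spelling, if any."""
    return CANDIDATES.get(typed, [])


def commit(typed: str, choice: int) -> str:
    """Return the chosen candidate, or the raw keystrokes if none applies."""
    options = candidates_for(typed)
    return options[choice] if 0 <= choice < len(options) else typed


if __name__ == "__main__":
    print(remap_keys("bad"))
    print(candidates_for("ma"), commit("ma", 1))
```

The first case needs only a lookup table, while the second needs a candidate list and a selection step, which is why input methods for large character sets take considerably more engineering effort.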

Modifying applications to handle local language characteristics

While most major FOSS applications have been internationalized, some modification may still be necessary to adapt to local language characteristics. For example, most word processors break words on a space, but in languages that do not use spaces, special rules must be created to specify where words may be broken. Similar problems exist with word sorting, text flow and other issues. Most languages will require minimal modification but certain languages may require extensive modification to applications.
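
As a rough illustration of the word-breaking problem, the Python sketch below segments text that contains no spaces using greedy longest-match against a word list. This is one simple technique among several, offered as an assumption about how such rules might be written. The tiny lexicon and the function name segment are hypothetical.

```python
def segment(text: str, lexicon: set[str]) -> list[str]:
    """Split text into words by greedy longest-match against lexicon.

    Characters that start no known word are emitted one at a time,
    so the loop always makes progress.
    """
    max_len = max(map(len, lexicon), default=1)
    words: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest possible piece first, then shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


# Hypothetical toy lexicon; a real one would hold many thousands of entries.
LEXICON = {"我", "喜欢", "自由", "软件", "自由软件"}

if __name__ == "__main__":
    print(segment("我喜欢自由软件", LEXICON))
```

A word processor could use the resulting boundaries as its permitted line-break positions. Greedy matching can choose the wrong split for ambiguous text, which is one reason some languages need more extensive application changes.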

Additionally, locale-specific information such as date formats, currency symbols and similar conventions has to be specified. This is normally a simple task involving editing text files.
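
The idea of keeping locale conventions in editable data rather than in program logic can be sketched in Python as follows. The table layout, the field names and both functions are hypothetical. Real systems store this information in their own locale definition files.

```python
from datetime import date

# Hypothetical locale table: conventions live in data, not in code.
LOCALES = {
    "en_US": {"date": "%m/%d/%Y", "thousands": ",", "decimal": ".",
              "currency": "${amount}"},
    "de_DE": {"date": "%d.%m.%Y", "thousands": ".", "decimal": ",",
              "currency": "{amount} €"},
}


def format_date(day: date, locale_name: str) -> str:
    """Format a date using the pattern stored for the locale."""
    return day.strftime(LOCALES[locale_name]["date"])


def format_currency(amount: float, locale_name: str) -> str:
    """Format an amount with the locale's separators and currency symbol."""
    conv = LOCALES[locale_name]
    grouped = f"{amount:,.2f}"  # e.g. 1,234.50 with US-style separators
    # Swap both separators in a single pass so they do not overwrite each other.
    swapped = grouped.translate(
        str.maketrans({",": conv["thousands"], ".": conv["decimal"]})
    )
    return conv["currency"].format(amount=swapped)


if __name__ == "__main__":
    day = date(2003, 11, 9)
    for name in LOCALES:
        print(name, format_date(day, name), format_currency(1234.5, name))
```

Supporting another locale in this sketch means adding one entry to the table, which mirrors the point that this step usually amounts to editing text files.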

Translating application messages

The next step in localizing GNU/Linux involves the translation of messages that the application passes to the user. Messages such as “File Not Found” or “Operation Complete” have to be translated into the local language. This task involves very little technical skill as the messages are normally stored in text files for easy viewing and editing. However, translating the thousands of messages and help files is an undertaking that can take several years to complete and is often the slowest part of the localization process. Even if the task is limited to the most commonly used applications (web browser, office productivity suite), significant effort has to be expended.
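
A minimal Python sketch of a message catalog shows why translation needs little programming skill: the program looks up each message by its original text and falls back to the original when no translation exists. The catalog contents and the function name translate are hypothetical. FOSS projects commonly rely on tools such as GNU gettext for this rather than a hand-written table.

```python
# Hypothetical catalogs: one table per language, keyed by the original message.
CATALOGS = {
    "fr": {
        "File Not Found": "Fichier introuvable",
        "Operation Complete": "Opération terminée",
    },
    "es": {
        "File Not Found": "Archivo no encontrado",
        # "Operation Complete" has not been translated yet.
    },
}


def translate(message: str, language: str) -> str:
    """Return the translated message, or the original if none is available."""
    return CATALOGS.get(language, {}).get(message, message)


if __name__ == "__main__":
    for lang in ("fr", "es", "xx"):
        print(lang, translate("File Not Found", lang),
              translate("Operation Complete", lang))
```

The fallback to the original text lets a partly translated application remain usable while translators work through the remaining messages.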

Ensuring that changes are accepted by the global FOSS community

One of the major advantages of the FOSS development method is that maintenance costs are often shared among the various users of the software. However, this is possible only if the changes made are accepted by the global community. Localization may involve changes in many different software components, each maintained by different project teams. Therefore, there should be a focused effort to ensure that all changes made are accepted by each of the teams, often by ensuring that the changes are made in a manner compatible with the future direction of the project team. In essence, one must be a player in the global team effort from the very start or risk being the only one left maintaining an isolated version of GNU/Linux.

Footnotes

  1. “The Localization Institute” [home page online]; available from http://www.localizationinstitute.com/switchboard.cfm?page=terminology; Internet; accessed on November 9, 2003.
  2. “MLP – Ongoing Localization Projects” [home page online]; available from http://www.mozilla.org/projects/l10n/mlp_status.html#contrib; Internet; accessed on November 9, 2003.
  3. Available from http://www.medialabasia.org/news/news_top2.html; Internet; accessed on May 20, 2003.