Inti Tutorial: Internationalization and Localization

This section takes you step-by-step through the process of adding international support to the GNU project you built in the previous section: Building an Autotools Project. You should work through that section first, if you haven't done so, and come back to this section later. Most of the information presented here is from the GNU gettext documentation. The first two sections provide you with some important background information. The remaining sections show you how to internationalize an Inti package, and tell you how to get help with your translations.

To be useful, a program must present its messages in a language that the user can understand. Internationalization is the process of making your software support a range of languages. Localization is the process of modifying a program so that it can display its messages in an appropriately translated form. These terms are often abbreviated i18n and l10n respectively, after the number of letters between the first and last letters of the word.
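
If the abbreviations look cryptic, the counting rule can be sketched in a few lines of C++. The numeronym() helper below is made up for this illustration; it isn't part of Inti or GNU gettext.

```cpp
#include <cassert>
#include <string>

// Build an abbreviation of the form: first letter, number of letters in
// between, last letter. Hypothetical helper, for illustration only.
std::string numeronym(const std::string& word)
{
    // The rule needs a distinct first and last letter.
    assert(word.size() >= 2);
    return word.front() + std::to_string(word.size() - 2) + word.back();
}
```

Calling numeronym("internationalization") counts the 18 letters between the "i" and the final "n", and numeronym("localization") counts the 10 letters between "l" and "n".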

For localization, GTK+/GNOME uses the GNU gettext interface. gettext works by using the strings in the original language (usually English) as the keys by which the translations are looked up. All the strings marked as needing translation are extracted from the source code with a helper program. Human translators then translate the strings into each target language.
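
To picture the lookup, here is a minimal sketch that uses an ordinary map as a stand-in for a message catalog. It is not how gettext stores its catalogs internally; the Catalog type, the lookup() function and the German sample entry are assumptions made only for this example.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// A toy message catalog: original (English) string -> translated string.
using Catalog = std::unordered_map<std::string, std::string>;

// Look up msgid in the catalog and fall back to the original string when
// no translation exists, which is also what gettext does.
std::string lookup(const Catalog& catalog, const std::string& msgid)
{
    // gettext reserves the empty msgid for the catalog header entry.
    assert(!msgid.empty());
    const auto it = catalog.find(msgid);
    return it != catalog.end() ? it->second : msgid;
}

// A made-up German catalog with a single entry.
Catalog german_catalog()
{
    return {{"Click Me", "Klick mich"}};
}
```

A string without an entry simply comes back unchanged, so an untranslated program still displays its original messages.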

The locale is the set of settings for the user's country and/or language. It is usually specified by a string like "en_GB". The first two letters identify the language (English), the second two the country (the United Kingdom). Included in the locale is information about things like the currency for the country and how numbers are formatted, but, more importantly, it describes the characters used for the language. The character set is the set of characters used to display the language. When storing characters in memory or on disk, a given character set may be stored in different ways - the way it is stored is termed the encoding.
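
Locale names often carry the encoding as well, as in "en_GB.UTF-8". The sketch below splits such a name into its parts, assuming the common language[_TERRITORY][.codeset][@modifier] layout. LocaleParts and parse_locale() are hypothetical names used only here.

```cpp
#include <cassert>
#include <string>

// Parts of a locale name of the form language[_TERRITORY][.codeset][@modifier].
struct LocaleParts {
    std::string language;   // e.g. "en"
    std::string territory;  // e.g. "GB" (may be empty)
    std::string codeset;    // e.g. "UTF-8" (may be empty)
};

LocaleParts parse_locale(std::string name)
{
    assert(!name.empty());
    LocaleParts parts;
    // Drop an optional "@modifier" suffix such as "@euro".
    name = name.substr(0, name.find('@'));
    // Split off the optional ".codeset" part.
    const auto dot = name.find('.');
    if (dot != std::string::npos) {
        parts.codeset = name.substr(dot + 1);
        name.erase(dot);
    }
    // Split "language_TERRITORY" at the underscore, if there is one.
    const auto underscore = name.find('_');
    parts.language = name.substr(0, underscore);
    if (underscore != std::string::npos)
        parts.territory = name.substr(underscore + 1);
    return parts;
}
```

For "en_GB.UTF-8" this yields the language "en", the territory "GB" and the codeset "UTF-8"; for a bare "de" only the language is filled in.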

Preparing Program Sources

For the programmer, changes to the source code fall into three categories. First, you have to make the localization functions known to all modules that need strings translated. Second, you should properly trigger the operation of GNU gettext when your program initializes, usually from within the main function. Last, you should identify and especially mark all constant strings in your program that need translation.

Presuming that your program, or package, has been adjusted so all needed GNU gettext files are available, and your Makefile files have been updated (see later), each Inti module having translatable strings should contain the line:

#include <inti/i18n.h>

This header contains the Inti C++ interface to GNU gettext, and is the only internationalization header you need to #include in your sources.

The initialization of locale data should be done with more or less the same code in every program, as demonstrated below:

int main (int argc, char *argv[])
{
    using namespace Main;

    i18n::set_text_domain_dir(PACKAGE, LOCALEDIR);
    i18n::set_text_domain(PACKAGE);

    init(&argc, &argv);
    ....
    return 0;
}

i18n is the Inti internationalization namespace. The set_text_domain_dir() method sets the locale directory for the specified domain. The set_text_domain() method sets the translation domain for your program. PACKAGE and LOCALEDIR are preprocessor macros, and should be provided either by config.h or by the Makefile (see later).
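
For comparison, here is roughly what the same initialization looks like with the plain GNU gettext C interface. Treat it as a sketch: that the Inti functions map onto bindtextdomain() and textdomain() in exactly this way is an assumption, and init_gettext() and current_domain() are names invented for the example. Without GTK+ to do it for you, the locale has to be set explicitly. On systems where gettext is not part of the C library you would also need to link against libintl.

```cpp
#include <cassert>
#include <clocale>
#include <libintl.h>
#include <string>

// Plain GNU gettext initialization, as a C program would typically do it.
void init_gettext(const char* package, const char* localedir)
{
    assert(package != nullptr && localedir != nullptr);
    std::setlocale(LC_ALL, "");          // adopt the locale from the user's environment
    bindtextdomain(package, localedir);  // directory holding the domain's catalogs
    textdomain(package);                 // make it the default domain for gettext()
}

// textdomain() called with a null pointer reports the current default domain.
std::string current_domain()
{
    return textdomain(nullptr);
}
```

After init_gettext("helloworld", "/usr/local/share/locale"), current_domain() reports "helloworld". The directory shown is only an example value for LOCALEDIR.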

You don't need to explicitly set the locale because GTK+ sets the locale for you when you call Main::init(). If you want to set the locale yourself, see the GTK+ documentation for gtk_disable_setlocale().

You should modify your source files, marking all the translatable strings. Translatable strings should follow these guidelines:

They should use good English style.
Each string should be an entire sentence.
Each string should be on a single line.
A string should not use slang or abbreviations; translators might not understand the message and may produce inappropriate translations.
A string should be limited to one paragraph; don't let a single message be longer than ten lines. It's easier to maintain the translation that way.
Use format strings instead of string concatenation; the translator might need to swap the format arguments around in the translation.
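
The last guideline is easier to appreciate with an example. The sketch below formats a message from a format string that uses numbered arguments, so a translation can reorder them. Numbered specifiers such as %1$s are a POSIX extension to printf and are assumed to be available; format_message() and the sample messages are made up for the illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Format a message with two arguments from a (possibly translated) format
// string. The result is truncated if it does not fit into the buffer.
std::string format_message(const char* format, const char* name, int count)
{
    assert(format != nullptr && name != nullptr);
    char buffer[256];
    std::snprintf(buffer, sizeof buffer, format, name, count);
    return buffer;
}
```

With the format "%1$s contains %2$d items" the call format_message(fmt, "src", 3) gives "src contains 3 items". A translator who needs the count first can supply "%2$d items are in %1$s" without any change to the code, which is impossible when the message is glued together from separate pieces.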

Marking strings as translatable has two goals. First, it's the trigger for retrieving the translation at run time, and second, it helps xgettext to properly extract all translatable strings when it scans a set of program sources and produces PO template files.

The canonical keyword for marking translatable strings is gettext. This keyword resolves at run time to dynamically return the proper translation, as far as possible. Rather than litter sources with gettext, many programmers use a simple underscore as a keyword, and write: _("Translatable string") instead of gettext("Translatable string"). This reduces the textual overhead per translatable string to only three characters: the underscore and the two parentheses.

Most strings are found in executable positions, that is, attached to variables or given as parameters to functions. However, a special case occurs where a function call to gettext() is not allowed, such as in an array initializer. In this case N_() (N stands for no-op) is used to mark a string for translation but no translation actually occurs; it's just a marker that resolves at run time to the string. Eventually you have to call gettext() on the string to actually fetch the translation. In C both _() and N_() are macros. In Inti, _() is a function call declared in <inti/i18n.h> and N_() is a macro.
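
Here is a small sketch of the N_() pattern using the plain gettext C interface rather than the Inti wrapper. The N_ macro is defined locally the way it conventionally is; the label table, the "Quit" entry and translated_label() are assumptions for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <libintl.h>
#include <string>

// N_() only marks a string for extraction; it expands to the string itself.
#define N_(String) (String)

// The table is built before the program has set up its text domain, so the
// entries are marked for translation but not translated here.
static const char* const button_labels[] = {
    N_("Click Me"),
    N_("Quit"),
};

// Fetch the translation later, at run time, when the label is needed.
std::string translated_label(std::size_t index)
{
    assert(index < sizeof button_labels / sizeof button_labels[0]);
    return gettext(button_labels[index]);
}
```

When no catalog is installed for the current locale, gettext() hands back the original string, so translated_label(0) is simply "Click Me".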

Invoking the gettextize Program

Before using gettextize you should ensure that you have recent versions of GNU m4, GNU Autoconf and GNU gettext installed on your system. Most recent Linux distributions come with these programs already installed, if you installed the development packages. Also, your project should use Autoconf and have a configure.in file.

The gettextize program is an interactive tool that helps the maintainer of a package internationalized through GNU gettext. It is used for two purposes:

As a wizard, when a package is modified to use GNU gettext for the first time.
As a migration tool, for upgrading the GNU gettext support in a package from a previous to a newer version of GNU gettext.

gettextize performs the following tasks:

It copies into the package some files that every package internationalized with GNU gettext needs.
It performs as many tasks as it can automatically.
It removes obsolete files and idioms from previous GNU gettext versions, to conform to the recommendations for the current GNU gettext version.
It prints a summary of the tasks that ought to be done manually and could not be done automatically by gettextize.

It can be invoked as follows:

gettextize [ option... ] [ directory ]

and accepts the following options:

--copy:	Copy the needed files instead of making symbolic links.
--force:	Force the replacement of files which already exist.
--no-changelog:	Don't update or create a ChangeLog file.
--dry-run:	Print modifications to standard output but don't perform them.
--help:	Display the help text and exit.
--version:	Output version information and exit.
--intl:	Install the libintl sources in a subdirectory named <intl>. This libintl will be used to provide internationalization on systems that don't have GNU libintl installed. If this option is omitted, the call to AM_GNU_GETTEXT in configure.in should read: AM_GNU_GETTEXT([external]), and internationalization will not be enabled on systems lacking GNU gettext.

If directory is given, it should be the top level directory of the package to prepare for using GNU gettext. If not given, it's assumed that the current directory is the top level directory.

A usual invocation for gettextize would be:

$ gettextize --copy --force --intl

gettextize provides the following files and carries out the following tasks:

The ABOUT-NLS file is copied into the top-level directory of the package. This file provides information on how to install and use the Native Language Support features of your program.
A <po> directory is created that will eventually hold all translation files, but initially only contains the file <po/Makefile.in.in> from the GNU gettext distribution and a few auxiliary files.
Only if --intl has been specified will an <intl> directory be created and filled with most of the files originally in the <intl> directory of the GNU gettext distribution. If the --force option was given, the <intl> directory is emptied first.
The files config.rpath and mkinstalldirs are copied into the (top-level) directory containing the configuration support files. These files are needed by the AM_GNU_GETTEXT autoconf macro.
If the project is using GNU automake, a set of autoconf macro files is copied into the package's autoconf macro repository, usually a directory called <m4>.
If your package uses symbolic links, using the -h option while creating the tar archive for your distribution will resolve each link and copy the file to the distribution archive.
gettextize will update all Makefile.am files in each affected directory, as well as the top level configure.in.
No existing file is replaced unless the --force option is specified.

One distinction between <intl> and the two other directories (m4, po) is that <intl> is meant to be identical in all packages using GNU gettext, whereas the other two directories contain mostly package dependent files.

The gettextize program makes backup files for all files it replaces or changes, and also writes ChangeLog entries about these changes. This way, the careful maintainer can check after running gettextize whether its changes are acceptable, and possibly adjust them. An exception to this rule is the <intl> directory, which is added or replaced or removed as a whole.

Internationalizing an Inti Package

With all that information on board we can now start to internationalize the HelloWorld project you built in the previous section. For this example you will need to use the files you created in the <tests/project> directory. The first thing you need to do is add a few lines to configure.in. configure.in is the input file from which Autoconf generates the configure script.

Add the following line to configure.in, just below the AC_INIT macro:

AC_CONFIG_HEADER(config.h)

The AC_CONFIG_HEADER macro indicates that you want to use a config header to define all the C preprocessor macros, and that the name of the header should be config.h.

Next, you need to enable gettext support by adding the following lines to configure.in, between the AC_PROG_CXX and AC_OUTPUT macros:

ALL_LINGUAS=""
AM_GNU_GETTEXT
AC_DEFINE_UNQUOTED(LOCALEDIR, "${prefix}/${DATADIRNAME}/locale", [Name of gettext locale directory])

The ALL_LINGUAS variable lists all the available translations in your package. It's a whitespace separated quoted string, such as "de es fr hu". Initially there are no translations so it's just an empty string.

The AM_GNU_GETTEXT macro checks for internationalization support. If you didn't pass the --intl option to gettextize this macro should instead read:

AM_GNU_GETTEXT([external])

The AC_DEFINE_UNQUOTED macro defines the preprocessor macro LOCALEDIR in config.h, and computes its value.
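
To make the value of LOCALEDIR more concrete, the sketch below expands the expression for sample values and shows where gettext would then look for a compiled catalog. The prefix "/usr/local" and the DATADIRNAME "share" are assumed example values, not something configure is guaranteed to pick, and both functions are hypothetical helpers.

```cpp
#include <cassert>
#include <string>

// Expand "${prefix}/${DATADIRNAME}/locale" for the given values.
std::string locale_dir(const std::string& prefix, const std::string& datadirname)
{
    assert(!prefix.empty() && !datadirname.empty());
    return prefix + "/" + datadirname + "/locale";
}

// gettext looks for a domain's compiled catalog at
// LOCALEDIR/LANG/LC_MESSAGES/DOMAIN.mo.
std::string catalog_path(const std::string& localedir,
                         const std::string& lang,
                         const std::string& domain)
{
    return localedir + "/" + lang + "/LC_MESSAGES/" + domain + ".mo";
}
```

With those sample values a German catalog for the helloworld domain would be searched for at /usr/local/share/locale/de/LC_MESSAGES/helloworld.mo.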

gettextize adds <intl/Makefile> and <po/Makefile.in> to the AC_OUTPUT macro at the end of configure.in. If the macro and arguments are all on the same line you won't need to modify the additions. If the macro runs over several lines you will need to check that the syntax is still correct, after the additions. For the purposes of this example leave the AC_OUTPUT macro and its arguments on one line. If you didn't pass the --intl option to gettextize, then you don't need to add <intl/Makefile> to the AC_OUTPUT line.

After making the above additions, your configure.in script should look like this:

AC_INIT(src/main.cc)
AC_CONFIG_HEADER(config.h)

PACKAGE=helloworld
VERSION=0.1.0

AM_INIT_AUTOMAKE($PACKAGE, $VERSION)

INTI_REQUIRED_VERSION=1.0.6
PKG_CHECK_MODULES(INTI, inti-1.0 >= $INTI_REQUIRED_VERSION)
AC_SUBST(INTI_CFLAGS)
AC_SUBST(INTI_LIBS)

AC_PROG_CXX

ALL_LINGUAS=""
AM_GNU_GETTEXT
AC_DEFINE_UNQUOTED(LOCALEDIR, "${prefix}/${DATADIRNAME}/locale", [Name of gettext locale directory])

AC_OUTPUT(Makefile src/Makefile intl/Makefile po/Makefile.in m4/Makefile)

Note, gettextize adds <m4/Makefile> to the AC_OUTPUT macro in configure.in, and the m4 subdirectory to the SUBDIRS variable in Makefile.am. These are not really necessary since nothing gets compiled in the m4 subdirectory; many maintainers remove them but don't worry about it in this example.

If you haven't suppressed the <intl> subdirectory, you need to add the GNU config.guess and config.sub files to your package. They're needed because the <intl> directory has platform dependent support for determining the locale's character encoding, and these files are needed to identify the platform. You can obtain the newest version of config.guess and config.sub from ftp://ftp.gnu.org/pub/gnu/config. Less recent versions are also contained in the GNU automake and GNU libtool packages. You don't have to worry about adding these files to HelloWorld because the latest files are already in the <tests/project> subdirectory.

Normally, config.guess and config.sub are put in the top level directory of your package. Alternatively, you can put them in a separate <config> subdirectory, together with the other configuration support files like install-sh, ltconfig, ltmain.sh, mkinstalldirs and missing. All you need to do is to add the following line to your configure.in script:

AC_CONFIG_AUX_DIR(config)

But don't add it to your configure.in; for this example we won't worry about it.

Next, you need to make some changes to the HelloWorld sources. Insert the following line at the beginning of <src/main.cc>, so the main function can use the preprocessor macros PACKAGE and LOCALEDIR:

#include <config.h>

Remember config.h is listed in configure.in above. You will create config.h later with the Autoheader program.

Next you need to initialize the locale data. This is done by adding the following two lines to the main function, before the call to init():


i18n::set_text_domain_dir(PACKAGE, LOCALEDIR);
i18n::set_text_domain(PACKAGE);

i18n is the Inti internationalization namespace. The set_text_domain_dir() method sets the locale directory for the specified domain. The set_text_domain() method sets the translation domain for your package.

After making the above changes, the <src/main.cc> file should look like this:

#include <config.h>
#include "helloworld.h"

int main (int argc, char *argv[])
{
    using namespace Main;

    i18n::set_text_domain_dir(PACKAGE, LOCALEDIR);
    i18n::set_text_domain(PACKAGE);

    init(&argc, &argv);

    HelloWorld window;
    window.sig_destroy().connect(slot(&Inti::Main::quit));
    window.show();

    run();
    return 0;
}

Inti provides a convenient C++ wrapper for the GNU gettext interface in the header <inti/i18n.h>. This is the only internationalization header that you need to #include in your program.

Add the following #include to <src/helloworld.h>:

#include <inti/i18n.h>

Now you have to mark the translatable strings in the sources. In HelloWorld only one file will contain translatable strings: <src/helloworld.cc>.

Change line 8 in <src/helloworld.cc> to read:

Gtk::Button *button = new Gtk::Button(_("Click Me"));

and change line 21 to read:

std::cout << _("The button was clicked.") << std::endl;

In the lines above, the calls _("Click Me") and _("The button was clicked.") mark both those strings for translation.

Now you're ready to call gettextize, so execute the following shell command:

$ gettextize --copy --force --intl

The --copy option copies the files into the source tree instead of using symbolic links. The --intl option copies the libintl sources into a subdirectory named <intl> for use on systems that don't provide gettext(). The --force option overwrites existing files.

Next you need to make a few changes and add a new file.

First, add the <po> subdirectory to the SUBDIRS variable in the top-level Makefile.am, so that it reads:

SUBDIRS = intl m4 po src

You could remove m4 because it's not really needed, but don't worry about it here.

In the <po> subdirectory change the name of the file Makevars.template to Makevars. Also in the <po> subdirectory create the text file POTFILES.in and add the following lines to it and save the file:

# List of source files containing translatable strings.

src/helloworld.cc

Not much left to go! Now you need to call Autoheader to create config.h, and then you need to rerun aclocal to add the contents of the <m4> directory to aclocal.m4.

Execute the following two shell commands:

$ autoheader
$ aclocal -I m4

Now rerun Autoconf. Then run configure, make and make install to check that HelloWorld compiles and installs correctly.

$ autoconf
$ ./configure
$ make
$ make install

Remember in the previous section you created an autogen.sh file to regenerate the project's output files after editing any input files. You can now add gettextize to this file so that the internationalization files also get updated.

Your autogen.sh file should now look like this:

#! /bin/sh

aclocal \
&& automake --add-missing \
&& autoconf \
&& gettextize --copy --force --intl

Creating the PO Template File

After preparing your sources by marking all translatable strings you need to create a PO template file, using the xgettext program. xgettext creates a file named domainname.po. You need to change its name to domainname.pot. Why doesn't xgettext create it under the name domainname.pot right away? The answer is: for historical reasons. When xgettext was specified, the distinction between a PO file and PO file template was fuzzy, and the suffix .pot wasn't in use at that time.

Before you create the PO template file there is one thing you need to do first. I don't know why, but when POTFILES is created automatically from POTFILES.in it inserts whitespace at the beginning of each line, before the file name. xgettext doesn't skip over this whitespace, and so looks for a file name that includes the whitespace. Of course xgettext doesn't find it and so it reports an error. You will have to manually remove all the whitespace from the beginning of each line in POTFILES before running xgettext.
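
If you would rather not fix the file by hand each time, the cleanup is easy to script. The following is a minimal sketch of the idea in C++; strip_leading_whitespace() is a made-up helper that works on the text of the file, and reading and writing POTFILES itself is left out.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Remove leading spaces and tabs from every line of a POTFILES-style listing.
std::string strip_leading_whitespace(const std::string& text)
{
    std::istringstream input(text);
    std::string line;
    std::string result;
    while (std::getline(input, line)) {
        const auto first = line.find_first_not_of(" \t");
        // A line made up only of whitespace becomes an empty line.
        const std::string stripped =
            first == std::string::npos ? std::string() : line.substr(first);
        assert(stripped.empty() || (stripped[0] != ' ' && stripped[0] != '\t'));
        result += stripped;
        result += '\n';
    }
    return result;
}
```

Each file name then starts in the first column, which is what xgettext expects.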

There are a lot of options that can be passed to xgettext, so I suggest you read the GNU gettext documentation; it's very thorough. If you invoke xgettext from the <po> subdirectory the command line is simplified somewhat.

Execute the following shell command from the <po> subdirectory:

xgettext --files-from=POTFILES --default-domain=helloworld --keyword=_

xgettext parses the specified input file POTFILES, and creates the output file helloworld.po. If it can't find any translatable strings in the sources no PO file will be created. You can specify the --force-po option to force xgettext to create an empty PO file when no translatable strings are found.

The --default-domain option specifies the default translation domain for the package, in this case helloworld. Remember, you specified the domain name in the main function with a call to i18n::set_text_domain().

The --keyword option is important. It specifies that an alternate keyword is being used to mark translatable strings. In Inti this should always be an underscore.
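
To get a feel for what the keyword does, here is a toy extractor that collects the string literals passed to _() in a piece of source text. It is only a sketch of the idea and certainly not how xgettext is implemented: it ignores comments, concatenated literals and escape decoding, all of which the real tool handles. extract_marked_strings() is a hypothetical name.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Collect the string literals passed to the _() keyword in source text.
std::vector<std::string> extract_marked_strings(const std::string& source)
{
    // Match _("...") where _ is not the tail of a longer identifier such as N_.
    static const std::regex marked(R"RX(\b_\(\s*"((?:[^"\\]|\\.)*)"\s*\))RX");
    std::vector<std::string> found;
    for (std::sregex_iterator it(source.begin(), source.end(), marked), end;
         it != end; ++it) {
        // Group 1 (the literal's contents) takes part in every successful match.
        assert((*it)[1].matched);
        found.push_back((*it)[1].str());
    }
    return found;
}
```

Fed the two modified lines from helloworld.cc, it returns "Click Me" and "The button was clicked.", the same two strings you should find in the template file.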

Before doing anything else rename helloworld.po to helloworld.pot. This POT file is your project's PO template file. When starting a new translation, the translator creates a file called LANG.po, as a copy of the domainname.pot template file. For example, de.po for a German translation or fr.po for a French translation (or c3.po for a cyborg translation).
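
If you open the template you'll find one entry per marked string: a reference comment naming the source location, the original string as msgid, and an empty msgstr for the translator to fill in. The sketch below builds such an entry. po_entry() is a hypothetical helper, and it assumes the string contains no characters that need escaping.

```cpp
#include <cassert>
#include <string>

// Build one untranslated PO template entry for a marked string.
std::string po_entry(const std::string& reference, const std::string& msgid)
{
    // Quotes, backslashes and newlines would need escaping; not handled here.
    assert(msgid.find_first_of("\"\\\n") == std::string::npos);
    return "#: " + reference + "\n"
           "msgid \"" + msgid + "\"\n"
           "msgstr \"\"\n";
}
```

For example, po_entry("src/helloworld.cc:8", "Click Me") produces a "#: src/helloworld.cc:8" line followed by msgid "Click Me" and an empty msgstr. In de.po the translator would replace the empty msgstr with the German text.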

The GNOME Translation Project

The GNOME Translation Project is a project devoted to helping you with your translations. The way it works is that you contact the gnome-i18n mailing list to find out how the translators can access your <po> subdirectory, and to add your project to the big status tables. Then you update the POTFILES.in file in your <po> subdirectory so that the translators always have access to updated domainname.pot files. Then, simply freeze the strings at least a couple of days before you make a new release, and announce it on gnome-i18n. Depending on the number of translatable strings in your program, and how popular it is, translations will then start to appear in your <po> subdirectory as LANG.po files.

It's not easy to get translation work done before your package gets internationalized and available! Since the cycle has to start somewhere, the easiest thing to do is start with absolutely no PO files, and wait until various translator teams get interested in your package, and submit PO files. Most language teams only consist of 1-3 persons, so if your program contains a lot of strings, it might take a while before anyone has the time to look at it. Also, most translators don't want to waste their time on unstable and poorly maintained packages, so they may decide to spend their time on some other project.

For the Translation Project to work smoothly, it is important that project maintainers do not get involved in translation concerns, and that translators be kept as free as possible of programming concerns. The only concern maintainers should have is marking new strings as translatable when they should be; they need not worry about those strings being translated, as this will come in due course.

Also, it's important for translators and maintainers to understand that package translation is a continuous process over the lifetime of a package, and not something which is done once and for all at the start. After an initial burst of translation activity for a given package, interventions are needed once in a while, because here and there, translated entries become obsolete, and new untranslated entries appear, needing translation.

Some Helpful Links

There are a couple of sections you should look at in the GNU gettext documentation. Section 3: "Preparing Program Sources" covers the ins and outs of marking translatable strings very well. You should also look at section 12.6: "Integrating with CVS".
