Internationalization and Localization
- Preparing Program Sources
- Invoking the gettextize Program
- Internationalizing an Inti Package
- Creating the PO Template File
- The GNOME Translation Project
- Some Helpful Links
This section takes you step-by-step through the process of adding
international support to the GNU project you built in the previous
section, Building an Autotools Project. If you haven't worked through
that section yet, do so first and come back to this section later. Most
of the information presented here is from the GNU gettext documentation.
The first two sections provide you with some important background
information. The remaining sections show you how to internationalize an
Inti package, and tell you how to get help with your translations.
To be useful, a program must present its messages in a language that
the user can understand.
Internationalization
is the process of making your software support a range of languages.
Localization is the process of
modifying a program so that it can display its messages in an
appropriately translated form. These terms are often abbreviated to
i18n and l10n respectively, after the number of letters between the
first and last letters of each word.
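The numbers come from simple arithmetic: "internationalization" has 20 letters, so 18 lie between the first and the last; "localization" has 12, leaving 10. The sketch below spells out the rule in C++; the function name numeronym is our own illustration and is not part of Inti or gettext.

```cpp
#include <string>

// Hypothetical helper: build a numeronym such as "i18n" by keeping the
// first and last letters and replacing the letters between them with
// their count.
std::string numeronym(const std::string& word) {
    if (word.size() < 3) {
        return word;  // nothing in between to abbreviate
    }
    return word.front() + std::to_string(word.size() - 2) + word.back();
}
```

For example, numeronym("localization") yields "l10n".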
For localization, GTK+/GNOME uses the GNU gettext interface.
gettext works by using the strings in
the original language (usually English) as the keys by which the
translations are looked up. All the strings marked as needing
translation are extracted from the source code with a helper program.
Human translators then translate the strings into each target language.
The locale is the set of settings for the user's country and/or
language. It is usually specified by a string like "en_GB". The first
two letters identify the language (en, English), and the two letters
after the underscore identify the country (GB, the United Kingdom).
Included in the locale is information about things like the currency for
the country and how numbers are formatted but, more importantly, it
describes the characters used for the language. The character set is the
set of characters used to display the language. When storing characters
in memory or on disk, a given character set may be stored in different
ways; the way it is stored is termed the encoding.
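On GNU systems a full locale name has the form language[_territory][.codeset][@modifier], so the encoding can appear in the name itself, as in "en_GB.UTF-8". The following sketch only splits such a string into its parts to show the structure; it does not touch the process locale (the C library's setlocale() does the real work), and the LocaleName type and parse_locale() function are hypothetical.

```cpp
#include <string>

// Hypothetical structure holding the parts of a locale name of the form
// language[_territory][.codeset][@modifier], e.g. "en_GB.UTF-8".
struct LocaleName {
    std::string language;   // e.g. "en"
    std::string territory;  // e.g. "GB"
    std::string codeset;    // e.g. "UTF-8" (the encoding)
    std::string modifier;   // e.g. "euro" in "de_DE@euro"
};

LocaleName parse_locale(const std::string& name) {
    LocaleName result;
    std::string rest = name;

    // Split off the trailing @modifier first, then the .codeset.
    std::string::size_type at = rest.find('@');
    if (at != std::string::npos) {
        result.modifier = rest.substr(at + 1);
        rest.erase(at);
    }
    std::string::size_type dot = rest.find('.');
    if (dot != std::string::npos) {
        result.codeset = rest.substr(dot + 1);
        rest.erase(dot);
    }

    // What remains is language[_territory].
    std::string::size_type underscore = rest.find('_');
    result.language = rest.substr(0, underscore);
    if (underscore != std::string::npos) {
        result.territory = rest.substr(underscore + 1);
    }
    return result;
}
```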
Preparing Program Sources
For the programmer, changes to the
source code fall into three categories. First, you have to make the
localization functions known to all modules that need strings
translated. Second, you should properly trigger the operation of GNU
gettext when your program initializes, usually from within the main
function. Last, you should identify and especially mark all constant
strings in your program that need translation.
Presuming that your program, or package, has been adjusted so all
needed GNU gettext files are available, and your Makefile files have
been updated (see later), each Inti module having translatable strings
should contain the line:
#include <inti/i18n.h>
This header contains the Inti C++ interface to GNU gettext, and is the
only internationalization header you need to #include in your sources.
The initialization of locale data should be done with more or less the
same code in every program, as demonstrated below:
int main (int argc, char *argv[])
{
    using namespace Main;
    i18n::set_text_domain_dir(PACKAGE, LOCALEDIR);
    i18n::set_text_domain(PACKAGE);
    init(&argc, &argv);
    // ...
    return 0;
}
i18n is the Inti internationalization namespace. The
set_text_domain_dir() function sets the locale directory for the
specified domain, and the set_text_domain() function sets the
translation domain for your program. PACKAGE and LOCALEDIR are
preprocessor macros, and should be provided either by config.h or by
the Makefile (see later).
You don't need to set the locale explicitly, because GTK+ sets it for
you when you call Main::init(). If you want to set the locale yourself,
see the GTK+ documentation for gtk_disable_setlocale().
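For comparison, here is roughly the same initialization written against the plain GNU gettext C API in <libintl.h> (part of glibc on GNU/Linux). This sketch assumes that Inti's i18n::set_text_domain_dir() and i18n::set_text_domain() are thin wrappers around bindtextdomain() and textdomain(); check <inti/i18n.h> to confirm. Because there is no GTK+ here, the sketch calls setlocale() itself, which Main::init() would otherwise do for you. The init_translations() name is hypothetical.

```cpp
#include <clocale>
#include <string>
#include <libintl.h>

// Hypothetical helper doing, with the plain gettext C API, what the two
// i18n:: calls in main() are assumed to do.
void init_translations(const std::string& domain, const std::string& localedir) {
    // Select the user's locale from the environment (LANG, LC_ALL, ...).
    // In an Inti/GTK+ program, Main::init() does this for you.
    std::setlocale(LC_ALL, "");
    // Tell gettext where the catalogs for this domain are installed.
    bindtextdomain(domain.c_str(), localedir.c_str());
    // Make this domain the default for subsequent gettext() lookups.
    textdomain(domain.c_str());
}
```

In a program without Inti you would call init_translations(PACKAGE, LOCALEDIR) at the top of main(). When no catalog is found, gettext() simply returns the original string, so an untranslated program keeps working.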
You should modify your source files, marking all the translatable
strings. Translatable strings should follow these guidelines:
- They should use good English style.
- Each string should be an entire sentence.
- Each string should be on a single line.
- A string should not use slang or abbreviations; translators might
not understand the message and may produce inappropriate translations.
- A string should be limited to one paragraph; don't let a single
message be longer than ten lines. It's easier to maintain the
translation that way.
- Use format strings instead of string concatenation; the
translator might need to swap the format arguments around in the
translation.
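To see why format strings matter, consider positional conversion specifications such as %1$s and %2$s. They are a POSIX extension to printf (supported by glibc, not required by ISO C++) that lets a translated format reorder the arguments without any change to the code. In the sketch below, format2() is a hypothetical helper and the German string is our own illustrative translation; in real code the format would come from _("Copied %1$s to %2$s.").

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: format two string arguments with a format string
// that may come from a translation catalog. Positional specifications
// (%1$s, %2$s) let the translated format reorder the arguments.
// Output longer than the buffer is truncated.
std::string format2(const char* fmt, const char* first, const char* second) {
    char buffer[256];
    std::snprintf(buffer, sizeof buffer, fmt, first, second);
    return buffer;
}
```

With concatenation ("Copied " + a + " to " + b) the word order would be fixed in the source; with a format string the translator controls it.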
Marking strings as translatable has two goals. First, it's the trigger
for retrieving the translation at run time, and second, it helps xgettext to properly extract all
translatable strings when it scans a set of program sources and produces
PO template files.
The canonical keyword for marking translatable strings is gettext. At run
time, a gettext() call returns the proper translation, as far as possible.
Rather than litter sources with gettext, many programmers use a simple
underscore as a keyword, and write: _("Translatable string") instead of
gettext("Translatable string"). This reduces the textual overhead per
translatable string to only three characters: the underscore and the two
parentheses.
Most strings are found in executable positions, that is, attached to
variables or given as parameters to functions. However, a special case
occurs where a function call to gettext() is not allowed, such as in an
array initializer. In this case N_() (N stands for no-op) is used to
mark a string for translation but no translation actually occurs; it's
just a marker that leaves the string unchanged. Eventually you have to
call gettext() on the string to actually fetch the translation.
In C both _() and N_() are macros. In Inti, _() is a function call
declared in <inti/i18n.h> and N_() is a macro.
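In plain C or C++ without Inti, the two markers are commonly defined as macros over the GNU gettext C API, as in this sketch. The menu labels and the menu_label() function are hypothetical. N_() only marks the strings in the array initializer; the translation happens later, when gettext() is called on each element. Note that xgettext does not recognize _ or N_ by default, so both need a --keyword option (--keyword=_ and --keyword=N_).

```cpp
#include <cstddef>
#include <libintl.h>

// Common gettext conventions: _() translates at run time, N_() is a
// no-op that only marks a string so xgettext can extract it.
#define _(String) gettext(String)
#define N_(String) (String)

// In C a function call is not allowed in a static initializer; in C++ it
// would compile, but would run before main() has set up the locale, so
// the strings are only marked here (hypothetical menu labels).
static const char* const menu_labels[] = {
    N_("Open"),
    N_("Save"),
    N_("Quit"),
};

// Fetch the translation when the string is actually used.
const char* menu_label(std::size_t index) {
    return _(menu_labels[index]);
}
```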
Invoking the gettextize Program
Before using gettextize you should
ensure that you have recent versions of GNU m4, GNU Autoconf and GNU
gettext installed on your system. Most recent Linux distributions come
with these programs already installed, if you installed the development
packages. Also, your project should use Autoconf and have a configure.in file.
The gettextize program is an
interactive tool that helps the maintainer of a package
internationalized through GNU gettext. It is used for two purposes:
- As a wizard, when a package is modified to use GNU gettext for
the first time.
- As a migration tool, for upgrading the GNU gettext support in a
package from a previous to a newer version of GNU gettext.
gettextize performs the following tasks:
- It copies into the package some files that every package
internationalized with GNU gettext needs.
- It performs as many tasks as it can automatically.
- It removes obsolete files and idioms from previous GNU gettext
versions, to conform to the recommendations for the current GNU gettext
version.
- It prints a summary of the tasks that ought to be done manually
and could not be done automatically by gettextize.
It can be invoked as follows:
gettextize [ option... ] [ directory ]
and accepts the following options:
- --copy: Copy the needed files instead of making symbolic links.
- --force: Force the replacement of files which already exist.
- --no-changelog: Don't update or create a ChangeLog file.
- --dry-run: Print modifications to standard output but don't perform
them.
- --help: Display the help text and exit.
- --version: Output version information and exit.
- --intl: Install the libintl sources in a subdirectory named <intl>.
This libintl will be used to provide internationalization on systems
that don't have GNU libintl installed. If this option is omitted, the
call to AM_GNU_GETTEXT in configure.in should read:
AM_GNU_GETTEXT([external]), and internationalization will not be enabled
on systems lacking GNU gettext.
If directory is given, it
should be the top level directory of the package to prepare for using
GNU gettext. If not given, it's assumed that the current directory is
the top level directory.
A usual invocation for gettextize would be:
$ gettextize --copy --force --intl
gettextize provides the following files and carries out the following
tasks:
- The ABOUT-NLS file is copied into the top-level directory of the
package. This file provides information on how to install and use the
Native Language Support features of your program.
- A <po> directory is created that will eventually hold all
translation files, but initially only contains the file
<po/Makefile.in.in> from the GNU gettext distribution and a few
auxiliary files.
- Only if --intl has been specified will an <intl>
directory be created and filled with most of the files originally in the
<intl> directory of the GNU gettext distribution. If the --force
option was given, the <intl> directory is emptied first.
- The files config.rpath and mkinstalldirs are
copied into the (top-level) directory containing the configuration
support files. These files are needed by the AM_GNU_GETTEXT autoconf
macro.
- If the project is using GNU automake, a set of autoconf macro files is
copied into the package's autoconf macro repository, usually a directory
called <m4>.
- If your package uses symbolic links, using the -h option while
creating the tar archive for your distribution will resolve each link
and copy the file to the distribution archive.
- gettextize will update all Makefile.am files in each affected
directory, as well as the top level configure.in.
- No existing file is replaced unless the --force option
is specified.
One distinction between <intl>
and the two other directories (m4, po) is that <intl> is meant to
be identical in all packages using GNU gettext, whereas the other two
directories contain mostly package dependent files.
The gettextize program makes backup files for all files it replaces or
changes, and also writes ChangeLog entries about these changes. This
way, the careful maintainer can check after running gettextize whether
its changes are acceptable, and possibly adjust them. An exception to
this rule is the <intl> directory, which is added or replaced or
removed as a whole.
Internationalizing an Inti Package
With all that information on board we
can now start to internationalize the HelloWorld project you built in
the previous section. For this example you will need to use the files
you created in the <tests/project> directory. The first thing you
need to do is add a few lines to configure.in.
configure.in is the input file from which Autoconf generates the
configure script.
Add the following line to configure.in, just below the AC_INIT macro:
AC_CONFIG_HEADER(config.h)
The AC_CONFIG_HEADER macro indicates
that you want to use a config header to define all the C preprocessor
macros, and that the name of the header should be config.h.
Next, you need to enable gettext support by adding the following lines
to configure.in, between the AC_PROG_CXX and AC_OUTPUT macros:
ALL_LINGUAS=""
AM_GNU_GETTEXT
AC_DEFINE_UNQUOTED(LOCALEDIR, "${prefix}/${DATADIRNAME}/locale", [Name of gettext locale directory])
The ALL_LINGUAS variable lists all the available translations in your
package. It's a whitespace-separated quoted string, such as
"de es fr hu". Initially there are no translations, so it's just an
empty string.
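Each language code in ALL_LINGUAS is expected to have a matching LANG.po file in the <po> subdirectory. Purely as an illustration (the real mapping is done by the gettext build machinery, and po_files_for() is a hypothetical name), splitting the whitespace-separated list gives the expected file names:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: turn an ALL_LINGUAS value such as "de es fr hu"
// into the names of the translation files expected in <po>.
std::vector<std::string> po_files_for(const std::string& all_linguas) {
    std::istringstream words(all_linguas);
    std::vector<std::string> files;
    std::string lang;
    while (words >> lang) {  // operator>> skips any run of whitespace
        files.push_back(lang + ".po");
    }
    return files;
}
```

An empty ALL_LINGUAS, as in this example, yields no files at all.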
The AM_GNU_GETTEXT macro checks for internationalization support. If
you didn't pass the --intl option to gettextize, this macro should
instead read:
AM_GNU_GETTEXT([external])
The AC_DEFINE_UNQUOTED macro defines the preprocessor macro LOCALEDIR
in config.h, and computes its value.
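At run time gettext looks for compiled message catalogs (.mo files built from the PO files) under LOCALEDIR, in one subdirectory per language and category. The sketch below shows how the PACKAGE and LOCALEDIR macros combine into that path. The fallback values are hypothetical placeholders so the sketch compiles without config.h; in the real project both come from configure.

```cpp
#include <string>

// In the real project these come from config.h. The fallbacks are
// hypothetical placeholders for illustration only.
#ifndef PACKAGE
#define PACKAGE "helloworld"
#endif
#ifndef LOCALEDIR
#define LOCALEDIR "/usr/local/share/locale"
#endif

// Where gettext looks for the compiled catalog of one language:
// LOCALEDIR/<lang>/LC_MESSAGES/<PACKAGE>.mo
std::string catalog_path(const std::string& lang) {
    return std::string(LOCALEDIR) + "/" + lang + "/LC_MESSAGES/" + PACKAGE + ".mo";
}
```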
gettextize adds <intl/Makefile>
and <po/Makefile.in> to the AC_OUTPUT macro at the end of
configure.in. If the macro and its arguments are all on the same line,
you won't need to modify the additions. If the macro runs over several
lines, you will need to check that the syntax is still correct after the
additions. For the purposes of this example leave the AC_OUTPUT macro
and its arguments on one line. If you didn't pass the --intl
option to gettextize, then you don't need to add <intl/Makefile>
to the AC_OUTPUT line.
After making the above additions, your configure.in script should look
like this:
AC_INIT(src/main.cc)
AC_CONFIG_HEADER(config.h)
PACKAGE=helloworld
VERSION=0.1.0
AM_INIT_AUTOMAKE($PACKAGE, $VERSION)
INTI_REQUIRED_VERSION=1.0.6
PKG_CHECK_MODULES(INTI, inti-1.0 >= $INTI_REQUIRED_VERSION)
AC_SUBST(INTI_CFLAGS)
AC_SUBST(INTI_LIBS)
AC_PROG_CXX
ALL_LINGUAS=""
AM_GNU_GETTEXT
AC_DEFINE_UNQUOTED(LOCALEDIR, "${prefix}/${DATADIRNAME}/locale", [Name of gettext locale directory])
AC_OUTPUT(Makefile src/Makefile intl/Makefile po/Makefile.in m4/Makefile)
Note that gettextize adds <m4/Makefile> to the AC_OUTPUT macro in
configure.in, and the m4 subdirectory to the SUBDIRS variable in
Makefile.am. These are not really necessary since nothing gets compiled
in the m4 subdirectory; many maintainers remove them, but don't worry
about it in this example.
If you haven't suppressed the <intl> subdirectory, you need to add the
GNU config.guess and config.sub files to your package. They're needed
because the <intl> directory has platform-dependent support for
determining the locale's character encoding, and these files are needed
to identify the platform. You can obtain the newest versions of
config.guess and config.sub from
ftp://ftp.gnu.org/pub/gnu/config.
Less recent versions are also contained in the GNU automake and GNU
libtool packages. You don't have to worry about adding these files
to HelloWorld because the latest files are already in the
<tests/project> subdirectory.
Normally, config.guess and config.sub are put in the top level
directory of your package. Alternatively, you can put them in a separate
<config> subdirectory, together with the other configuration
support files like
install-sh, ltconfig, ltmain.sh,
mkinstalldirs and
missing. All you need to do is to add
the following line to your configure.in script:
AC_CONFIG_AUX_DIR(config)
But don't add it to your configure.in; for this example we won't worry
about it.
Next, you need to make some changes to the HelloWorld sources. Insert
the following line at the beginning of <src/main.cc>, so the main
function can use the preprocessor macros PACKAGE and LOCALEDIR:
#include <config.h>
Remember that config.h is listed in configure.in above. The configure
script generates config.h from a template, config.h.in, which you will
create later with the Autoheader program.
Next you need to initialize the locale data. This is done by adding the
following two lines to the main function, before the call to init():
i18n::set_text_domain_dir(PACKAGE, LOCALEDIR);
i18n::set_text_domain(PACKAGE);
i18n is the Inti internationalization namespace. The
set_text_domain_dir() function sets the locale directory for the
specified domain, and the set_text_domain() function sets the
translation domain for your package.
After making the above changes, the <src/main.cc> file should
look like this:
#include <config.h>
#include "helloworld.h"

int main (int argc, char *argv[])
{
    using namespace Main;
    i18n::set_text_domain_dir(PACKAGE, LOCALEDIR);
    i18n::set_text_domain(PACKAGE);
    init(&argc, &argv);
    HelloWorld window;
    window.sig_destroy().connect(slot(&Inti::Main::quit));
    window.show();
    run();
    return 0;
}
Inti provides a convenient C++ wrapper for the GNU gettext interface in
the header <inti/i18n.h>. This is the only internationalization
header that you need to #include in your program.
Add the following #include to <src/helloworld.h>:
#include <inti/i18n.h>
Now you have to mark the translatable strings in the sources. In
HelloWorld only one file will contain translatable strings:
<src/helloworld.cc>.
Change line 8 in <src/helloworld.cc> to read:
Gtk::Button *button = new Gtk::Button(_("Click Me"));
and change line 21 to read:
std::cout << _("The button was clicked.") << std::endl;
In the lines above, the calls _("Click Me") and _("The button was
clicked.") mark both strings for translation.
Now you're ready to run gettextize, so execute the following shell
command:
$ gettextize --copy --force --intl
The --copy option copies the files into the source tree instead of
using symbolic links. The --intl option copies the libintl sources into
a subdirectory named <intl> for use on systems that don't provide
gettext(). The --force option overwrites existing files.
Next you need to make a few changes and add a new file.
First, add the <po> subdirectory to the SUBDIRS variable in the
top-level Makefile.am, so that it reads:
You could remove m4 because it's not really needed, but don't worry
about it here.
In the <po> subdirectory, rename the file Makevars.template to
Makevars. Also in the <po> subdirectory, create the text file
POTFILES.in, add the following lines to it, and save the file:
# List of source files containing translatable strings.
src/helloworld.cc
Not much left to go! Now you need to rerun aclocal to add the contents
of the <m4> directory to aclocal.m4, and then call Autoheader to create
config.h.in, the template from which configure generates config.h.
Execute the following two shell commands:
$ aclocal -I m4
$ autoheader
Now rerun Autoconf. Then run configure, make and make install to check
that HelloWorld compiles and installs correctly:
$ autoconf
$ ./configure
$ make
$ make install
Remember that in the previous section you created an autogen.sh file to
regenerate the project's output files after editing any input files. You
can now add gettextize to this file so that the internationalization
files also get updated. Running gettextize first lets the later steps
pick up the files it installs. Your autogen.sh file should now look like
this:
#! /bin/sh
gettextize --copy --force --intl \
&& aclocal -I m4 \
&& autoheader \
&& automake --add-missing \
&& autoconf
Creating the PO Template File
After preparing your sources by
marking all translatable strings you need to create a PO template file,
using the xgettext program.
xgettext creates a file named domainname.po.
You need to change its name to domainname.pot.
Why doesn't xgettext create it under the name domainname.pot right
away? The answer is: for historical reasons. When xgettext was
specified, the distinction between a PO file and PO file template was
fuzzy, and the suffix .pot wasn't in use at that time.
Before you create the PO template file there is one thing you need to
do first. When POTFILES is created automatically from POTFILES.in,
whitespace is inserted at the beginning of each line, before the file
name. xgettext doesn't skip over this whitespace, so it looks for a file
name that includes the whitespace, doesn't find it, and reports an
error. You will have to manually remove all the whitespace from the
beginning of each line in POTFILES before running xgettext.
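If you would rather not edit POTFILES by hand, a small filter can produce a clean list of file names. This is a hypothetical helper, not part of gettext: it trims surrounding whitespace, drops a trailing backslash line continuation in case your generated POTFILES contains one (an assumption; check your own file), and skips blank and comment lines.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Trim spaces, tabs and carriage returns from both ends of a string.
static std::string trim(const std::string& s) {
    const char* ws = " \t\r";
    std::string::size_type begin = s.find_first_not_of(ws);
    if (begin == std::string::npos) {
        return "";
    }
    std::string::size_type end = s.find_last_not_of(ws);
    return s.substr(begin, end - begin + 1);
}

// Hypothetical helper: turn the text of a POTFILES-style file into bare
// file names suitable for xgettext --files-from.
std::vector<std::string> clean_potfiles(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> files;
    std::string line;
    while (std::getline(in, line)) {
        std::string name = trim(line);
        if (!name.empty() && name.back() == '\\') {
            name = trim(name.substr(0, name.size() - 1));  // drop " \" continuation
        }
        if (name.empty() || name[0] == '#') {
            continue;  // blank or comment line
        }
        files.push_back(name);
    }
    return files;
}
```

You would read POTFILES into a string, write the returned names back one per line, and pass that file to xgettext.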
There are a lot of options that can be passed to xgettext, so I suggest
you read the GNU gettext documentation; it's very thorough. If you
invoke xgettext from the <po> subdirectory, the command line is
simplified somewhat.
Execute the following shell command from the <po> subdirectory:
$ xgettext --files-from=POTFILES --default-domain=helloworld --keyword=_
xgettext reads the list of source files from POTFILES, scans those
files, and creates the output file helloworld.po. If it can't find any
translatable strings in the sources, no PO file will be created. You can
specify the --force-po option to force xgettext to create an empty PO
file when no translatable strings are found.
The --default-domain option specifies the default translation
domain for the package, in this case helloworld. Remember, you specified
the domain name in the main function with a call to i18n::set_text_domain().
The --keyword option is important. It specifies that an alternate
keyword is being used to mark translatable strings. In Inti this should
always be an underscore.
Before doing anything else, rename helloworld.po to helloworld.pot. This POT file is
your project's PO template file. When starting a new translation, the
translator creates a file called LANG.po, as a copy of the
domainname.pot template file. For example, de.po for a German
translation or fr.po for a French translation (or c3.po for a
cyborg translation).
The GNOME Translation Project
The GNOME Translation Project is a
project devoted to helping you with your translations. The way it works
is that you contact the
gnome-i18n
mailing list to find out how the translators can access your <po>
subdirectory, and to add your project to the big
status tables.
Then you update the POTFILES.in file in your <po> subdirectory so
that the translators always have access to updated domainname.pot files.
Then, simply freeze the strings at least a couple of days before you
make a new release, and announce it on gnome-i18n. Depending on the
number of translatable strings in your program, and how popular it is,
translations will then start to appear in your <po> subdirectory
as LANG.po files.
It's not easy to get translation work done before your package gets
internationalized and available! Since the cycle has to start somewhere,
the easiest thing to do is start with absolutely no PO files, and wait
until various translator teams get interested in your package, and
submit PO files. Most language teams only consist of 1-3 persons, so if
your program contains a lot of strings, it might take a while before
anyone has the time to look at it. Also, most translators don't want to
waste their time on unstable and poorly maintained packages, so they may
decide to spend their time on some other project.
For the Translation Project to work smoothly, it is important that
project maintainers do not get involved in translation concerns, and
that translators be kept as free as possible of programming concerns.
The only concern maintainers should have is marking new strings as
translatable when they should be, without worrying about them being
translated, as this will come in due course.
Also, it's important for translators and maintainers to understand that
package translation is a continuous process over the lifetime of a
package, and not something which is done once and for all at the start.
After an initial burst of translation activity for a given package,
interventions are needed once in a while, because here and there,
translated entries become obsolete, and new untranslated entries appear,
needing translation.
Some Helpful Links
There are a couple of sections you
should look at in the
GNU gettext
documentation. Section 3: "Preparing Program Sources" covers the
ins and outs of marking translatable strings very well. You should also
look at section 12.6: "Integrating with CVS".