perllocale - Perl locale handling (internationalization and localization)
Perl supports language-specific notions of data such as "is this a letter",
"what is the uppercase equivalent of this letter", and "which of these letters
comes first". These are important issues, especially for languages other than
English--but also for English: it would be naïve to imagine that A-Za-z defines
all the "letters" needed to write in English. Perl is also aware that some character
other than '.' may be preferred as a decimal point, and that output date representations may
be language-specific. The process of making an application take account of its users'
preferences in such matters is called internationalization (often abbreviated as i18n);
telling such an application about a particular set of preferences is known as localization
(l10n).
Perl can understand language-specific data via the standardized (ISO C, XPG4, POSIX 1.c)
method called "the locale system". The locale system is controlled per application
using one pragma, one function call, and several environment variables.
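As a rough illustration of those three controls working together, the sketch below reads the environment, queries the result, and enables the pragma in one block. It assumes the POSIX module is available and that the system's C library supports setlocale(); the variable names are illustrative only.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_ALL LC_CTYPE);

# The environment variables (LC_ALL, LC_CTYPE, LANG, ...) are consulted
# when setlocale() is called with an empty string as the locale name.
my $chosen = setlocale(LC_ALL, "");
warn "Locale requested by the environment was not accepted\n"
    unless defined $chosen;

# Calling setlocale() with only a category queries the current setting.
my $ctype = setlocale(LC_CTYPE);

{
    use locale;    # the pragma: locale rules apply inside this block only
    print "LC_CTYPE is ", (defined $ctype ? $ctype : "unknown"), "\n";
    print "uc('abc') under use locale: ", uc("abc"), "\n";
}
```

Keeping use locale inside a small block limits locale-dependent behaviour to the code that actually needs it.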
NOTE: This feature is new in Perl 5.004, and does not apply unless an application
specifically requests it--see Backward compatibility.
The one exception is that write() now always uses the current locale - see "NOTES".
If Perl applications are to understand and present your data correctly according to a locale
of your choice, all of the following must be true:
- Your operating system must support the locale system. If it does, you should find
that the setlocale() function is a documented part of its C library.
- Definitions for locales that you use must be installed. You, or your system
administrator, must make sure that this is the case. The available locales, the location
in which they are kept, and the manner in which they are installed all vary from system to
system. Some systems provide only a few, hard-wired locales and do not allow more to be
added. Others allow you to add "canned" locales provided by the system supplier.
Still others allow you or the system administrator to define and add arbitrary locales.
(You may have to ask your supplier to provide canned locales that are not delivered with
your operating system.) Read your system documentation for further illumination.
- Perl must believe that the locale system is supported. If it does,
perl -V:d_setlocale
will say that the value for d_setlocale is define.
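The same check can be made from inside a program. The following is a minimal sketch using the standard Config module, which exposes the values recorded when perl was built; the helper name locale_supported is hypothetical.

```perl
use strict;
use warnings;
use Config;

# %Config holds the build-time configuration of this perl;
# d_setlocale is the same key that "perl -V:d_setlocale" reports.
sub locale_supported {
    return defined $Config{d_setlocale} && $Config{d_setlocale} eq 'define';
}

print "setlocale() support compiled in: ",
    (locale_supported() ? "yes" : "no"), "\n";
```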
If you want a Perl application to process and present your data according to a particular
locale, the application code should include the use locale pragma (see The use locale pragma) where appropriate, and at least
one of the following must be true:
- The locale-determining environment variables (see "ENVIRONMENT")
must be correctly set up at the time the application is started, either by yourself or
by whoever set up your system account.
- The application must set its own locale using the method described in The setlocale function.
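For the second case, an application that sets its own locale might look something like the sketch below. The helper pick_locale and the French locale names are assumptions made for illustration; which names are installed varies from system to system, so the list falls back to "C".

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_ALL);

# Try each candidate name in turn and return the first one accepted.
# setlocale() returns undef when the requested locale cannot be set.
sub pick_locale {
    my @candidates = @_;
    for my $name (@candidates) {
        my $got = setlocale(LC_ALL, $name);
        return $got if defined $got;
    }
    return;
}

# Illustrative names only; "C" is the standard fallback locale.
my $active = pick_locale("fr_FR.UTF-8", "fr_FR", "C");
print "Running under locale: $active\n";
```

Checking the return value matters: a failed setlocale() call leaves the previous locale in force without raising an error.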
Versions of Perl prior to 5.004 mostly ignored locale information, generally
behaving as if something similar to the "C" locale were always in
force, even if the program environment suggested otherwise (see The setlocale function). By default, Perl still behaves
this way for backward compatibility. If you want a Perl application to pay attention to locale
information, you must use the use locale pragma (see The use locale pragma) to instruct it to do so.
Versions of Perl from 5.002 to 5.003 did use the LC_CTYPE information if
available; that is, \w did understand what were the letters according to the
locale environment variables. The problem was that the user had no control over the feature:
if the C library supported locales, Perl used them.
In versions of Perl prior to 5.004, per-locale collation was possible using the I18N::Collate
library module. This module is now mildly obsolete and should be avoided in new applications.
The LC_COLLATE functionality is now integrated into the Perl core language: One
can use locale-specific scalar data completely normally with use locale, so there
is no longer any need to juggle with the scalar references of I18N::Collate.
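A small sketch of what this looks like in practice: ordinary strings are sorted with cmp under use locale, and, for comparison, with an explicit call to POSIX::strcoll(). The word list is invented for illustration, and the resulting order depends on whichever LC_COLLATE locale is in force, so no particular output is claimed.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_COLLATE strcoll);

setlocale(LC_COLLATE, "");    # adopt the user's collation settings, if any

my @words = ("zebra", "Apple", "apple", "Zulu");

my @by_locale;
{
    use locale;               # cmp and sort follow LC_COLLATE in this block
    @by_locale = sort { $a cmp $b } @words;
}

# Outside the pragma cmp compares by character code, but strcoll()
# can still be called explicitly to compare by locale rules.
my @by_strcoll = sort { strcoll($a, $b) } @words;

print "use locale: @by_locale\n";
print "strcoll():  @by_strcoll\n";
```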
Comparing and sorting by locale is usually slower than the default sorting; slow-downs of
two to four times have been observed. It will also consume more memory: once a Perl scalar
variable has participated in any string comparison or sorting operation obeying the locale
collation rules, it will take 3-15 times more memory than before. (The exact multiplier
depends on the string's contents, the operating system and the locale.) These downsides are
dictated more by the operating system's implementation of the locale system than by Perl.
Formats are the only part of Perl that unconditionally uses information from a program's
locale; if a program's environment specifies an LC_NUMERIC locale, it is always used to
specify the decimal point character in formatted output. Formatted output cannot be controlled
by use locale because the pragma is tied to the block structure of the program,
and, for historical reasons, formats exist outside that block structure.
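To see which decimal point character the current LC_NUMERIC locale would supply, a program can ask POSIX::localeconv(). This is only a sketch for inspecting the setting; it does not itself demonstrate format output, and the fallback to "." is an assumption for systems that report nothing.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_NUMERIC localeconv);

setlocale(LC_NUMERIC, "");    # adopt LC_NUMERIC from the environment

# localeconv() returns a hash reference describing numeric conventions.
my $conv  = localeconv();
my $point = $conv->{decimal_point};
$point = "." unless defined $point && length $point;

print "Decimal point in the current locale: '$point'\n";
```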
There is a large collection of locale definitions at ftp://dkuug.dk/i18n/WG15-collection .
You should be aware that it is unsupported, and is not claimed to be fit for any purpose. If
your system allows installation of arbitrary locales, you may find the definitions useful as
they are, or as a basis for the development of your own locales.
"Internationalization" is often abbreviated as i18n because its first and
last letters are separated by eighteen others. (You may guess why the internalin ...
internaliti ... i18n tends to get abbreviated.) In the same way, "localization" is
often abbreviated to l10n.
Internationalization, as defined in the C and POSIX standards, can be criticized as
incomplete, ungainly, and having too large a granularity. (Locales apply to a whole process,
when it would arguably be more useful to have them apply to a single thread, window group, or
whatever.) They also have a tendency, like standards groups, to divide the world into nations,
when we all know that the world can equally well be divided into bankers, bikers, gamers, and
so on. But, for now, it's the only standard we've got. This may be construed as a bug.
Unicode support is new as of Perl version 5.6 and is more fully implemented in
version 5.8. See perluniintro and perlunicode for more details.
Usually locale settings and Unicode do not affect each other, but there are exceptions;
see perlunicode/Locales for examples.
In certain systems, the operating system's locale support is broken and cannot be fixed or
used by Perl. Such deficiencies can and will result in mysterious hangs and/or Perl core dumps
when use locale is in effect. When confronted with such a system, please
report in excruciating detail to <perlbug@perl.org>, and complain to your vendor:
bug fixes may exist for these problems in your operating system. Sometimes such bug fixes are
called an operating system upgrade.
I18N::Langinfo, perluniintro, perlunicode, open, POSIX/isalnum, POSIX/isalpha, POSIX/isdigit, POSIX/isgraph, POSIX/islower, POSIX/isprint, POSIX/ispunct, POSIX/isspace, POSIX/isupper, POSIX/isxdigit, POSIX/localeconv, POSIX/setlocale, POSIX/strcoll, POSIX/strftime, POSIX/strtod, POSIX/strxfrm.
Jarkko Hietaniemi's original perli18n.pod heavily hacked by Dominic Dunlop, assisted
by the perl5-porters. Prose worked over a bit by Tom Christiansen.
Last update: Thu Jun 11 08:44:13 MDT 1998