
LOCALE CATEGORIES

The following subsections describe basic locale categories. Beyond these, some combination categories allow manipulation of more than one basic category at a time. See "ENVIRONMENT" for a discussion of these.

Category LC_COLLATE: Collation

In the scope of use locale, Perl looks to the LC_COLLATE environment variable to determine the application's notions on collation (ordering) of characters. For example, 'b' follows 'a' in Latin alphabets, but where do 'á' and 'å' belong? And while 'color' follows 'chocolate' in English, what about in Spanish?

The following collations all make sense and you may meet any of them if you "use locale".

 
	A B C D E a b c d e
	A a B b C c D d E e
	a A b B c C d D e E
	a b c d e A B C D E  

Here is a code snippet to tell what "word" characters are in the current locale, in that locale's order:

 
        use locale;
        print +(sort grep /\w/, map { chr } 0..255), "\n";  

Compare this with the characters that you see and their order if you state explicitly that the locale should be ignored:

 
        no locale;
        print +(sort grep /\w/, map { chr } 0..255), "\n";  

This machine-native collation (which is what you get unless use locale has appeared earlier in the same block) must be used for sorting raw binary data, whereas the locale-dependent collation of the first example is useful for natural text.
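The contrast between the two collations can be sketched as follows. This is an illustrative example only: the word list is invented, and the locale-dependent result is printed rather than asserted because it depends on the LC_COLLATE setting of the machine running it.

```perl
use strict;
use warnings;

# Invented sample data for illustration.
my @words = qw(banana Apple cherry apple Banana);

my @byte_order;
{
    no locale;
    @byte_order = sort @words;      # machine-native order: uppercase ASCII sorts first
}

my @locale_order;
{
    use locale;
    @locale_order = sort @words;    # order is decided by the LC_COLLATE locale
}

print "native: @byte_order\n";
print "locale: @locale_order\n";
```

Because use locale is lexically scoped, each bare block keeps its collation choice from leaking into the rest of the program.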

As noted in USING LOCALES, cmp compares according to the current collation locale when use locale is in effect, but falls back to a char-by-char comparison for strings that the locale says are equal. You can use POSIX::strcoll() if you don't want this fall-back:

 
        use POSIX qw(strcoll);
        $equal_in_locale =
            !strcoll("space and case ignored", "SpaceAndCaseIgnored");  

$equal_in_locale will be true if the collation locale specifies a dictionary-like ordering that ignores space characters completely and which folds case.

If you have a single string that you want to check for "equality in locale" against several others, you might think you could gain a little efficiency by using POSIX::strxfrm() in conjunction with eq:

 
        use POSIX qw(strxfrm);
        $xfrm_string = strxfrm("Mixed-case string");
        print "locale collation ignores spaces\n"
            if $xfrm_string eq strxfrm("Mixed-casestring");
        print "locale collation ignores hyphens\n"
            if $xfrm_string eq strxfrm("Mixedcase string");
        print "locale collation ignores case\n"
            if $xfrm_string eq strxfrm("mixed-case string");  

strxfrm() takes a string and maps it into a transformed string for use in char-by-char comparisons against other transformed strings during collation. "Under the hood", locale-affected Perl comparison operators call strxfrm() for both operands, then do a char-by-char comparison of the transformed strings. By calling strxfrm() explicitly and using a non-locale-affected comparison, the example attempts to save a couple of transformations. But in fact, it doesn't save anything: Perl magic (see perlguts/Magic Variables) creates the transformed version of a string the first time it's needed in a comparison, then keeps this version around in case it's needed again. An example rewritten the easy way with cmp runs just about as fast. It also copes with null characters embedded in strings; if you call strxfrm() directly, it treats the first null it finds as a terminator. Also, don't expect the transformed strings it produces to be portable across systems--or even from one revision of your operating system to the next. In short, don't call strxfrm() directly: let Perl do it for you.

Note: use locale isn't shown in some of these examples because it isn't needed: strcoll() and strxfrm() exist only to generate locale-dependent results, and so always obey the current LC_COLLATE locale.

Category LC_CTYPE: Character Types

In the scope of use locale, Perl obeys the LC_CTYPE locale setting. This controls the application's notion of which characters are alphabetic. This affects Perl's \w regular expression metanotation, which stands for "word" characters--that is, alphabetic and numeric characters plus the underscore. (Consult perlre for more information about regular expressions.) Thanks to LC_CTYPE, depending on your locale setting, characters like 'æ', 'ð', 'ß', and 'ø' may be understood as \w characters.

The LC_CTYPE locale also provides the map used in transliterating characters between lower and uppercase. This affects the case-mapping functions--lc(), lcfirst(), uc(), and ucfirst(); case-mapping interpolation with \l, \L, \u, or \U in double-quoted strings and s/// substitutions; and case-independent regular expression pattern matching using the i modifier.
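A small sketch of the function form and the escape form side by side may help. The sample string is invented and deliberately restricted to ASCII letters, on the assumption that their case mapping is the same in the locales most readers will meet; non-ASCII characters are where LC_CTYPE actually changes the outcome.

```perl
use strict;
use warnings;
use locale;

# Invented sample text for illustration.
my $raw = "pERL LOCALE";

my $by_function = ucfirst(lc($raw));    # case-mapping functions
my $by_escape   = "\u\L$raw\E";         # the same mapping via string escapes

print "$by_function\n";
print "$by_escape\n";

# Case-independent matching consults the same LC_CTYPE map.
print "matches\n" if $raw =~ /\Aperl locale\z/i;
```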

Finally, LC_CTYPE affects the POSIX character-class test functions--isalpha(), islower(), and so on. For example, if you move from the "C" locale to a 7-bit Scandinavian one, you may find--possibly to your surprise--that "|" moves from the ispunct() class to isalpha().

Note: A broken or malicious LC_CTYPE locale definition may result in clearly ineligible characters being considered to be alphanumeric by your application. For strict matching of (mundane) letters and digits--for example, in command strings--locale-aware applications should use \w inside a no locale block. See "SECURITY".
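The advice in the note above might be applied along these lines. The helper name is_plain_word is hypothetical, and the sketch assumes the input is an ordinary byte string; an application that must accept only ASCII regardless of how the string is encoded may prefer an explicit [A-Za-z0-9_] class.

```perl
use strict;
use warnings;

# Hypothetical validator: accepts a command word made only of \w characters,
# with LC_CTYPE deliberately ignored for the match.
sub is_plain_word {
    my ($text) = @_;
    no locale;
    return $text =~ /\A\w+\z/ ? 1 : 0;
}

print is_plain_word("restart_2") ? "accepted\n" : "rejected\n";
print is_plain_word("rm -rf")    ? "accepted\n" : "rejected\n";
```

Anchoring with \A and \z rather than ^ and $ keeps a trailing newline from slipping through the check.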

Category LC_NUMERIC: Numeric Formatting

In the scope of use locale, Perl obeys the LC_NUMERIC locale information, which controls an application's idea of how numbers should be formatted for human readability by the printf(), sprintf(), and write() functions. String-to-numeric conversion by the POSIX::strtod() function is also affected. In most implementations the only effect is to change the character used for the decimal point--perhaps from '.' to ','. These functions aren't aware of such niceties as thousands separation and so on. (See The localeconv function if you care about these things.)

Output produced by print() is also affected by the current locale: it depends on whether use locale or no locale is in effect. Under use locale it corresponds to what you'd get from printf() in the current LC_NUMERIC locale; under no locale it corresponds to what you'd get from printf() in the "C" locale. The same is true for Perl's internal conversions between numeric and string formats:

 
        use POSIX qw(strtod);
        use locale;

        $n = 5/2;   # Assign numeric 2.5 to $n

        $a = " $n"; # Locale-dependent conversion to string

        print "half five is $n\n";       # Locale-dependent output

        printf "half five is %g\n", $n;  # Locale-dependent output

        print "DECIMAL POINT IS COMMA\n"
            if $n == (strtod("2,5"))[0]; # Locale-dependent conversion  

See also I18N::Langinfo and RADIXCHAR.
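As a rough illustration of the RADIXCHAR lookup, the sketch below asks for the current decimal-point character. The helper name decimal_point_char is made up for this example, the code assumes a platform where I18N::Langinfo is available, and how closely the answer tracks the LC_NUMERIC setting may vary between Perl versions.

```perl
use strict;
use warnings;
use I18N::Langinfo qw(langinfo RADIXCHAR);

# Hypothetical helper: the decimal-point character reported for the
# current locale.
sub decimal_point_char {
    return langinfo(RADIXCHAR);
}

printf "decimal point is '%s'\n", decimal_point_char();
```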

Category LC_MONETARY: Formatting of monetary amounts

The C standard defines the LC_MONETARY category, but no function that is affected by its contents. (Those with experience of standards committees will recognize that the working group decided to punt on the issue.) Consequently, Perl takes no notice of it. If you really want to use LC_MONETARY, you can query its contents--see The localeconv function--and use the information that it returns in your application's own formatting of currency amounts. However, you may well find that the information, voluminous and complex though it may be, still does not quite meet your requirements: currency formatting is a hard nut to crack.

See also I18N::Langinfo and CRNCYSTR.
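A minimal sketch of doing your own currency formatting with localeconv() could look like this. The helper names are hypothetical, and the sketch ignores grouping, sign handling, and symbol placement, all of which a real formatter would need to take from the other LC_MONETARY fields. It assumes that a field may be absent or empty when the locale does not define it, so every lookup has a fallback.

```perl
use strict;
use warnings;
use POSIX qw(localeconv);

# Hypothetical helper: fetch a string field, falling back when the locale
# leaves it undefined or empty.
sub monetary_field {
    my ($lc, $key, $default) = @_;
    my $value = $lc->{$key};
    return (defined $value && length $value) ? $value : $default;
}

# Hypothetical helper: format a non-negative amount with the locale's
# currency symbol, monetary decimal point, and fractional digit count.
sub format_amount {
    my ($amount) = @_;
    my $lc = localeconv();

    my $symbol = monetary_field($lc, 'currency_symbol',   '');
    my $point  = monetary_field($lc, 'mon_decimal_point', '.');

    my $digits = $lc->{frac_digits};
    $digits = 2 unless defined $digits && $digits =~ /\A\d+\z/ && $digits <= 20;

    # No "use locale" here, so sprintf is expected to emit '.' as the radix.
    my $text = sprintf("%.*f", $digits, $amount);
    $text =~ s/\./$point/;
    return $symbol . $text;
}

print format_amount(1234.5), "\n";
```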

Category LC_TIME: Date and Time Formatting

Output produced by POSIX::strftime(), which builds a formatted human-readable date/time string, is affected by the current LC_TIME locale. Thus, in a French locale, the output produced by the %B format element (full month name) for the first month of the year would be "janvier". Here's how to get a list of long month names in the current locale:

 
        use POSIX qw(strftime);
        for (0..11) {
            $long_month_name[$_] =
                strftime("%B", 0, 0, 0, 1, $_, 96);
        }  

Note: use locale isn't needed in this example: as a function that exists only to generate locale-dependent results, strftime() always obeys the current LC_TIME locale.

See also I18N::Langinfo and ABDAY_1..ABDAY_7, DAY_1..DAY_7, ABMON_1..ABMON_12, and MON_1..MON_12.
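The I18N::Langinfo constants can be queried directly, as in this sketch. The helper name first_time_names is invented for the example, and the code assumes a platform where I18N::Langinfo is available. Only the first entry of each table is fetched to keep it short.

```perl
use strict;
use warnings;
use I18N::Langinfo qw(langinfo DAY_1 ABDAY_1 MON_1 ABMON_1);

# Hypothetical helper: the first weekday and first month of the current
# LC_TIME locale, in long and abbreviated forms. DAY_1 is Sunday and
# MON_1 is January.
sub first_time_names {
    return (
        day        => langinfo(DAY_1),
        day_abbr   => langinfo(ABDAY_1),
        month      => langinfo(MON_1),
        month_abbr => langinfo(ABMON_1),
    );
}

my %names = first_time_names();
print "$_: $names{$_}\n" for sort keys %names;
```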

Other categories

The remaining locale category, LC_MESSAGES (possibly supplemented by others in particular implementations), is not currently used by Perl--except possibly to affect the behavior of library functions called by extensions outside the standard Perl distribution and by the operating system and its utilities. Note especially that the string value of $! and the error messages given by external utilities may be changed by LC_MESSAGES. If you want to have portable error codes, use %!. See Errno.
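A short sketch of testing %! instead of the text of $! follows. The path is a made-up one that is assumed not to exist on the machine running the example.

```perl
use strict;
use warnings;

# Hypothetical path, assumed not to exist.
my $path = "/nonexistent-dir-$$/missing.txt";

my $reason = 'opened';
if (!open(my $fh, '<', $path)) {
    # The string form of $! may be translated by LC_MESSAGES;
    # the keys of %! are symbolic error names and are not.
    $reason = $!{ENOENT} ? 'missing' : 'other';
}

print "result: $reason\n";
```

Branching on $!{ENOENT} keeps the program's logic independent of whatever language the error message happens to be in.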

 

  

 


 

 
 
 

Disclaimer: This documentation is provided only for the benefits of our web hosting customers.
For authoritative source of the documentation, please refer to http://www.perldoc.com