perlebcdic - Considerations for running Perl on EBCDIC platforms
An exploration of some of the issues facing Perl programmers on EBCDIC-based computers. We
do not cover localization, internationalization, or multibyte character set issues other than
some discussion of UTF-8 and UTF-EBCDIC.
Portions that are still incomplete are marked with XXX.
Most socket programming assumes ASCII character encodings in network byte order. Exceptions
include CGI scripts running under a host web server, where the server may take care of the
translation for you. Most host web servers convert EBCDIC data to ISO-8859-1 or Unicode on
output.
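As a rough illustration of the translation such a server performs, the following minimal sketch uses the core Encode module to turn an ASCII string into the bytes an EBCDIC peer would expect, then decodes them back. It assumes code page 1047 (`cp1047`). Your host may instead use CCSID 0037 (`cp37`) or another page. It is meant to be run on an ASCII-based platform.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Text as it would arrive from an ASCII/Latin-1 host.
my $text = "Hello";

# Translate to EBCDIC bytes; CP 1047 is an assumed code page.
my $ebcdic_bytes = encode('cp1047', $text);

# Hex dump of the translated bytes: c885939396
my $hex = unpack('H*', $ebcdic_bytes);
print "$hex\n";

# Translate the CP 1047 bytes back into a Perl character string.
my $round_trip = decode('cp1047', $ebcdic_bytes);
print "$round_trip\n";
```

In EBCDIC the letters fall in non-contiguous ranges: 'H' is 0xC8, 'e' is 0x85, 'l' is 0x93 and 'o' is 0x96. Sending such bytes unchanged to an ASCII peer therefore produces unreadable text.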
To the extent that it is possible to write code that depends on hashing order, there may be
differences between hashes as stored on an ASCII-based machine and hashes stored on an
EBCDIC-based machine. XXX
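The usual defence is to avoid relying on the order of `keys` at all and to sort explicitly. Even then, `sort` compares strings in the native character set, so sorted output can still differ between platforms. The minimal sketch below shows this on an ASCII platform. It emulates the collation an EBCDIC host would produce by comparing CP 1047 bytes. The choice of `cp1047` is an assumption about the target code page.

```perl
use strict;
use warnings;
use Encode qw(encode);

my %h = (a => 1, B => 2, 1 => 3);

# keys() order is unspecified; sort explicitly instead of relying on it.
my @native_sorted = sort keys %h;

# On an ASCII platform digits < uppercase < lowercase, giving 1,B,a
print join(',', @native_sorted), "\n";

# Comparing CP 1047 bytes mimics EBCDIC collation:
# lowercase (0x81..) < uppercase (0xC1..) < digits (0xF0..), giving a,B,1
my @ebcdic_sorted =
    sort { encode('cp1047', $a) cmp encode('cp1047', $b) } keys %h;
print join(',', @ebcdic_sorted), "\n";
```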
Internationalization (I18N) and localization (L10N) are supported, at least in principle, even
on EBCDIC machines. The details are system-dependent and are discussed in the OS ISSUES
section below.
Perl may work with an internal UTF-EBCDIC encoding form for wide characters on EBCDIC
platforms, in a manner analogous to the way it works with the UTF-8 internal encoding form
on ASCII-based platforms.
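The following minimal sketch shows two core `utf8::` functions that help keep such code portable. `utf8::native_to_unicode` maps a native code point to its Unicode value. `utf8::encode` converts a character string into Perl's internal encoding as bytes. The byte values shown in the comments apply only to an ASCII-based platform. On an EBCDIC platform the same call would yield UTF-EBCDIC bytes instead.

```perl
use strict;
use warnings;

# ord() returns the native code point: 65 for 'A' on ASCII platforms,
# 193 on EBCDIC ones. native_to_unicode() yields 65 on both.
my $unicode_cp = utf8::native_to_unicode(ord 'A');
print "$unicode_cp\n";

# utf8::encode() turns characters into the internal encoding form as
# bytes: UTF-8 on ASCII platforms, UTF-EBCDIC on EBCDIC platforms.
my $smiley = "\x{263A}";    # WHITE SMILING FACE
utf8::encode($smiley);
my $hex = unpack('H*', $smiley);
print "$hex\n";             # e298ba on an ASCII platform
```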
Legacy multibyte EBCDIC code pages: XXX.
This pod document contains literal Latin-1 characters and may encounter translation
difficulties. In particular, one popular nroff implementation was known to strip accented
characters to their unaccented counterparts when this document was viewed through the
pod2man program (for example, you may see a plain y rather than one with a
diaeresis, as in ÿ). Another nroff truncated the resulting manpage at the first occurrence of
an 8-bit character.
Not all shells will allow multiple -e string arguments to perl to be
concatenated together properly, as recipes 0, 2, 4, 5, and 6 might seem to imply.
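When the shell is the obstacle, one possible workaround is to start perl without a shell at all. This is a suggestion, not one of the recipes mentioned above. The minimal sketch below uses the list form of a piped `open`, so each -e argument reaches perl unchanged. It assumes a Unix-like system, such as z/OS UNIX System Services, where list-form pipe opens are supported. Perl joins multiple -e arguments with newlines, so each fragment ends in a semicolon.

```perl
use strict;
use warnings;

# List-form open runs $^X (the current perl) directly, with no shell,
# so the -e arguments are passed exactly as written.
open(my $fh, '-|', $^X,
     '-e', 'print "first ";',
     '-e', 'print "second\n";')
    or die "cannot run $^X: $!";

# Slurp everything the child perl printed.
my $out = do { local $/; <$fh> };
close($fh) or die "child perl failed: status $?";

print $out;    # prints "first second"
```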
perllocale, perlfunc, perlunicode, utf8.
15 April 2001: added UTF-8 and UTF-EBCDIC to main table, pvhp.
Peter Prymmer (pvhp@best.com) wrote this in 1999 and 2000, with CCSID 0819 and 0037 help from
Chris Leach and André Pirard (A.Pirard@ulg.ac.be), as well as POSIX-BC help from Thomas Dorner
(Thomas.Dorner@start.de). Thanks also to Vickie Cooper, Philip Newton, William Raffloer, and Joe
Smith. Trademarks, registered trademarks, service marks, and registered service marks used in
this document are the property of their respective owners.