
TRANSFORMATION FORMATS

Data can be transformed with an intra-character-set mapping in a variety of ways, each serving a different purpose. Sorting was discussed in the previous section; a few of the other popular mapping techniques are discussed next.

URL decoding and encoding

Some URLs contain hexadecimal ASCII code points in an attempt to overcome character or protocol limitations. For example, the tilde character is not on every keyboard, hence a URL of the form:

 
    http://www.pvhp.com/~pvhp/  

may also be expressed as either of:

 
    http://www.pvhp.com/%7Epvhp/

    http://www.pvhp.com/%7epvhp/  

where 7E is the hexadecimal ASCII code point for '~'. Here is an example of decoding such a URL under CCSID 1047:

 
    $url = 'http://www.pvhp.com/%7Epvhp/';
    # this array assumes code page 1047
    my @a2e_1047 = (
          0,  1,  2,  3, 55, 45, 46, 47, 22,  5, 21, 11, 12, 13, 14, 15,
         16, 17, 18, 19, 60, 61, 50, 38, 24, 25, 63, 39, 28, 29, 30, 31,
         64, 90,127,123, 91,108, 80,125, 77, 93, 92, 78,107, 96, 75, 97,
        240,241,242,243,244,245,246,247,248,249,122, 94, 76,126,110,111,
        124,193,194,195,196,197,198,199,200,201,209,210,211,212,213,214,
        215,216,217,226,227,228,229,230,231,232,233,173,224,189, 95,109,
        121,129,130,131,132,133,134,135,136,137,145,146,147,148,149,150,
        151,152,153,162,163,164,165,166,167,168,169,192, 79,208,161,  7,
         32, 33, 34, 35, 36, 37,  6, 23, 40, 41, 42, 43, 44,  9, 10, 27,
         48, 49, 26, 51, 52, 53, 54,  8, 56, 57, 58, 59,  4, 20, 62,255,
         65,170, 74,177,159,178,106,181,187,180,154,138,176,202,175,188,
        144,143,234,250,190,160,182,179,157,218,155,139,183,184,185,171,
        100,101, 98,102, 99,103,158,104,116,113,114,115,120,117,118,119,
        172,105,237,238,235,239,236,191,128,253,254,251,252,186,174, 89,
         68, 69, 66, 70, 67, 71,156, 72, 84, 81, 82, 83, 88, 85, 86, 87,
        140, 73,205,206,203,207,204,225,112,221,222,219,220,141,142,223
    );
    # "C" packs an unsigned char, so table values above 127 do not wrap
    $url =~ s/%([0-9a-fA-F]{2})/pack("C",$a2e_1047[hex($1)])/ge;  

Conversely, here is a partial solution for the task of encoding such a URL under the 1047 code page:

 
    $url = 'http://www.pvhp.com/~pvhp/';
    # this array assumes code page 1047
    my @e2a_1047 = (
          0,  1,  2,  3,156,  9,134,127,151,141,142, 11, 12, 13, 14, 15,
         16, 17, 18, 19,157, 10,  8,135, 24, 25,146,143, 28, 29, 30, 31,
        128,129,130,131,132,133, 23, 27,136,137,138,139,140,  5,  6,  7,
        144,145, 22,147,148,149,150,  4,152,153,154,155, 20, 21,158, 26,
         32,160,226,228,224,225,227,229,231,241,162, 46, 60, 40, 43,124,
         38,233,234,235,232,237,238,239,236,223, 33, 36, 42, 41, 59, 94,
         45, 47,194,196,192,193,195,197,199,209,166, 44, 37, 95, 62, 63,
        248,201,202,203,200,205,206,207,204, 96, 58, 35, 64, 39, 61, 34,
        216, 97, 98, 99,100,101,102,103,104,105,171,187,240,253,254,177,
        176,106,107,108,109,110,111,112,113,114,170,186,230,184,198,164,
        181,126,115,116,117,118,119,120,121,122,161,191,208, 91,222,174,
        172,163,165,183,169,167,182,188,189,190,221,168,175, 93,180,215,
        123, 65, 66, 67, 68, 69, 70, 71, 72, 73,173,244,246,242,243,245,
        125, 74, 75, 76, 77, 78, 79, 80, 81, 82,185,251,252,249,250,255,
         92,247, 83, 84, 85, 86, 87, 88, 89, 90,178,212,214,210,211,213,
         48, 49, 50, 51, 52, 53, 54, 55, 56, 57,179,219,220,217,218,159
    );
    # The following regular expression does not address the 
    # mappings for: ('.' => '%2E', '/' => '%2F', ':' => '%3A') 
    $url =~ s/([\t "#%&\(\),;<=>\?\@\[\\\]^`{|}~])/sprintf("%%%02X",$e2a_1047[ord($1)])/ge;  

where a more complete solution would split the URL into components and apply a full s/// substitution only to the appropriate parts.
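
As a rough illustration of that component-wise approach, the sketch below keeps the scheme and authority untouched and escapes only the path segments, using the same character class as the partial encoder above. The helper names encode_segment and encode_url_path are hypothetical, the URL is assumed to have the shape scheme://authority/path with no query or fragment, and the identity table stands in for @e2a_1047 so that the example runs on an ASCII machine.

```perl
use strict;
use warnings;

# Characters to escape, copied from the partial encoder above.
my $unsafe = qr/[\t "#%&\(\),;<=>\?\@\[\\\]^`{|}~]/;

# Hypothetical helper: percent-encode a single path segment.
# $e2a is a reference to a native-to-ASCII table.
sub encode_segment {
    my ($segment, $e2a) = @_;
    $segment =~ s/($unsafe)/sprintf("%%%02X", $e2a->[ord($1)])/ge;
    return $segment;
}

# Hypothetical helper: split off scheme://authority and encode
# only the path that follows it. URLs of any other shape are
# returned unchanged.
sub encode_url_path {
    my ($url, $e2a) = @_;
    my ($prefix, $path) =
        $url =~ m{^([A-Za-z][A-Za-z0-9+.\-]*://[^/]*)(.*)$}
        or return $url;
    # The -1 limit keeps a trailing empty segment, and so a trailing slash.
    my @segments = split m{/}, $path, -1;
    return $prefix . join('/', map { encode_segment($_, $e2a) } @segments);
}

# Assumption: ASCII platform, so the identity map is the right table.
# On an EBCDIC platform pass \@e2a_1047 (or the table for your code page).
my @e2a = (0 .. 255);

print encode_url_path('http://www.pvhp.com/~pvhp/', \@e2a), "\n";
print encode_url_path('http://user@www.pvhp.com/~pvhp/my file', \@e2a), "\n";
```

With the whole-string substitution the '@' in the authority of the second URL would have been escaped as well; splitting first leaves it alone.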

In the remaining examples a @e2a or @a2e array may be employed but the assignment will not be shown explicitly. For code page 1047 you could use the @a2e_1047 or @e2a_1047 arrays just shown.
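
If only one of the two tables is at hand, the other can be derived from it, since each is meant to be the inverse permutation of the other. The following sketch copies the @a2e_1047 table from the URL decoding example and builds the reverse table with an array slice assignment; it also checks that the round trip holds for every code point.

```perl
use strict;
use warnings;

# ASCII-to-EBCDIC table for code page 1047, as shown earlier.
my @a2e_1047 = (
      0,  1,  2,  3, 55, 45, 46, 47, 22,  5, 21, 11, 12, 13, 14, 15,
     16, 17, 18, 19, 60, 61, 50, 38, 24, 25, 63, 39, 28, 29, 30, 31,
     64, 90,127,123, 91,108, 80,125, 77, 93, 92, 78,107, 96, 75, 97,
    240,241,242,243,244,245,246,247,248,249,122, 94, 76,126,110,111,
    124,193,194,195,196,197,198,199,200,201,209,210,211,212,213,214,
    215,216,217,226,227,228,229,230,231,232,233,173,224,189, 95,109,
    121,129,130,131,132,133,134,135,136,137,145,146,147,148,149,150,
    151,152,153,162,163,164,165,166,167,168,169,192, 79,208,161,  7,
     32, 33, 34, 35, 36, 37,  6, 23, 40, 41, 42, 43, 44,  9, 10, 27,
     48, 49, 26, 51, 52, 53, 54,  8, 56, 57, 58, 59,  4, 20, 62,255,
     65,170, 74,177,159,178,106,181,187,180,154,138,176,202,175,188,
    144,143,234,250,190,160,182,179,157,218,155,139,183,184,185,171,
    100,101, 98,102, 99,103,158,104,116,113,114,115,120,117,118,119,
    172,105,237,238,235,239,236,191,128,253,254,251,252,186,174, 89,
     68, 69, 66, 70, 67, 71,156, 72, 84, 81, 82, 83, 88, 85, 86, 87,
    140, 73,205,206,203,207,204,225,112,221,222,219,220,141,142,223
);

# Invert the table: for each ASCII code point $i, store $i at
# index $a2e_1047[$i] of the new array.
my @e2a_1047;
@e2a_1047[@a2e_1047] = (0 .. 255);

# Count code points that do not survive the round trip.
my $mismatches = grep { $e2a_1047[ $a2e_1047[$_] ] != $_ } 0 .. 255;
print "round-trip mismatches: $mismatches\n";

printf "ASCII 0x7E ('~') maps to EBCDIC 0x%02X\n", $a2e_1047[0x7E];
```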

uu encoding and decoding

The u template to pack() or unpack() renders EBCDIC data in the EBCDIC characters that are equivalent to their ASCII counterparts. For example, the following will print "Yes indeed\n" on either an ASCII or an EBCDIC computer:

 
    $all_byte_chrs = '';
    for (0..255) { $all_byte_chrs .= chr($_); }
    $uuencode_byte_chrs = pack('u', $all_byte_chrs);
    ($uu = <<'ENDOFHEREDOC') =~ s/^\s*//gm;
    M``$"`P0%!@<("0H+#`T.#Q`1$A,4%187&!D:&QP='A\@(2(C)"4F)R@I*BLL
    M+2XO,#$R,S0U-C<X.3H[/#T^/T!!0D-$149'2$E*2TQ-3D]045)35%565UA9
    M6EM<75Y?8&%B8V1E9F=H:6IK;&UN;W!Q<G-T=79W>'EZ>WQ]?G^`@8*#A(6&
    MAXB)BHN,C8Z/D)&2DY25EI>8F9J;G)V>GZ"AHJ.DI::GJ*FJJZRMKJ^PL;*S
    MM+6VM[BYNKN\O;Z_P,'"P\3%QL?(R<K+S,W.S]#1TM/4U=;7V-G:V]S=WM_@
    ?X>+CY.7FY^CIZNOL[>[O\/'R\_3U]O?X^?K[_/W^_P``
    ENDOFHEREDOC
    if ($uuencode_byte_chrs eq $uu) {
        print "Yes ";
    }
    $uudecode_byte_chrs = unpack('u', $uuencode_byte_chrs);
    if ($uudecode_byte_chrs eq $all_byte_chrs) {
        print "indeed\n";
    }  
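
A much smaller case may make the format easier to see. The sketch below encodes three explicit byte values (which happen to spell "Cat" in ASCII) and decodes them again. Byte values are used rather than a string literal because a literal would contain different bytes on an EBCDIC machine.

```perl
use strict;
use warnings;

# Three bytes given by value, so the input is the same on any platform.
my $bytes = pack('C3', 0x43, 0x61, 0x74);

my $encoded = pack('u', $bytes);
print $encoded;                       # prints: #0V%T

# The first character carries the payload length: '#' stands for 3.
# The remaining four characters each carry six bits of the 24-bit input.
my $decoded = unpack('u', $encoded);
print "round trip ok\n" if $decoded eq $bytes;
```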

Here is a very spartan uudecoder that will work on EBCDIC provided that the @e2a array is filled in appropriately:

 
    #!/usr/local/bin/perl
    @e2a = ( # this must be filled in
           );
    $_ = <> until ($mode,$file) = /^begin\s*(\d*)\s*(\S*)/;
    open(OUT, "> $file") if $file ne "";
    while(<>) {
        last if /^end/;
        next if /[a-z]/;
        next unless int(((($e2a[ord()] - 32 ) & 077) + 2) / 3) ==
            int(length() / 4);
        print OUT unpack("u", $_);
    }
    close(OUT);
    chmod oct($mode), $file;
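
The arithmetic in the decoder's "next unless" test compares the payload length announced by the first character of a line with the number of encoded characters actually present. The sketch below isolates that test in a hypothetical helper, uu_line_ok, and uses the identity table on the assumption of an ASCII machine.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the decoder's sanity check.
# $e2a is a reference to a native-to-ASCII table.
sub uu_line_ok {
    my ($line, $e2a) = @_;
    # Payload byte count announced by the first character (0 .. 63).
    my $declared = ($e2a->[ord($line)] - 32) & 077;
    # Every 3 payload bytes occupy 4 encoded characters.
    return int(($declared + 2) / 3) == int(length($line) / 4);
}

# Assumption: ASCII platform, so the identity map is the right table.
my @e2a = (0 .. 255);

# '#' announces 3 bytes, and one group of 4 characters follows.
print uu_line_ok("#0V%T\n", \@e2a) ? "accepted\n" : "skipped\n";

# 'M' announces 45 bytes (60 characters), but only 4 follow.
print uu_line_ok("M0V%T\n", \@e2a) ? "accepted\n" : "skipped\n";
```

The check is deliberately coarse: it rejects lines whose length is clearly inconsistent, not every malformed line.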
  

Quoted-Printable encoding and decoding

On ASCII encoded machines it is possible to encode the '=' character and the characters outside of the printable set using:

 
    # This QP encoder works on ASCII only
    $qp_string =~ s/([=\x00-\x1F\x80-\xFF])/sprintf("=%02X",ord($1))/ge;  
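
As a small worked case, and again assuming an ASCII machine, the following applies that substitution to a string containing one high-bit byte and one literal '=':

```perl
use strict;
use warnings;

# "\xE9" is a single byte outside the printable ASCII range.
my $qp_string = "caf\xE9 = 100%";

# ASCII-only encoder from above.
$qp_string =~ s/([=\x00-\x1F\x80-\xFF])/sprintf("=%02X",ord($1))/ge;

print "$qp_string\n";    # prints: caf=E9 =3D 100%
```

The '=' itself is encoded as =3D so that a decoder can tell it apart from the escape character.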

Whereas a QP encoder that works on both ASCII and EBCDIC machines would look somewhat like the following (where the EBCDIC branch @e2a array is omitted for brevity):

 
    if (ord('A') == 65) {    # ASCII
        $delete = "\x7F";    # ASCII
        @e2a = (0 .. 255)    # ASCII to ASCII identity map
    }
    else {                   # EBCDIC
        $delete = "\x07";    # EBCDIC
        @e2a =               # EBCDIC to ASCII map (as shown above)
    }
    $qp_string =~
      s/([^ !"\#\$%&'()*+,\-.\/0-9:;<>?\@A-Z[\\\]^_`a-z{|}~$delete])/sprintf("=%02X",$e2a[ord($1)])/ge;  

(although in production code the substitutions might be done in the EBCDIC branch with the @e2a array and separately in the ASCII branch without the expense of the identity map).

Such QP strings can be decoded with:

 
    # This QP decoder is limited to ASCII only
    $string =~ s/=([0-9A-Fa-f][0-9A-Fa-f])/chr hex $1/ge;
    $string =~ s/=[\n\r]+$//;  

Whereas a QP decoder that works on both ASCII and EBCDIC machines would look somewhat like the following (where the @a2e array is omitted for brevity):

 
    $string =~ s/=([0-9A-Fa-f][0-9A-Fa-f])/chr $a2e[hex $1]/ge;
    $string =~ s/=[\n\r]+$//;  
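
To see the decoder at work, the sketch below fills in @a2e with the identity map, which is only correct on an ASCII machine (on EBCDIC an ASCII-to-EBCDIC table such as @a2e_1047 would take its place). The input ends with a soft line break, which the second substitution removes.

```perl
use strict;
use warnings;

# Assumption: ASCII platform, so the identity map is the right table.
my @a2e = (0 .. 255);

# QP text with two escapes and a trailing soft line break ("=\n").
my $string = "caf=E9 =3D 100%=\n";

$string =~ s/=([0-9A-Fa-f][0-9A-Fa-f])/chr $a2e[hex $1]/ge;
$string =~ s/=[\n\r]+$//;

print "decoded ok\n" if $string eq "caf\xE9 = 100%";
```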

Caesarian ciphers

The practice of shifting an alphabet one or more characters for encipherment dates back thousands of years and was explicitly detailed by Gaius Julius Caesar in his Gallic Wars text. A single alphabet shift is sometimes referred to as a rotation, and the shift amount is given as a number $n after the string 'rot', that is "rot$n". Rot0 and rot26 designate identity maps on the 26 letter English version of the Latin alphabet. Rot13 has the interesting property that applying it twice in succession yields the identity map (thus rot13 is its own non-trivial inverse in the group of 26 alphabet rotations). Hence the following is a rot13 encoder and decoder that will work on ASCII and EBCDIC machines:

 
    #!/usr/local/bin/perl

    while(<>){
        tr/n-za-mN-ZA-M/a-zA-Z/;
        print;
    }  

In one-liner form:

 
    perl -ne 'tr/n-za-mN-ZA-M/a-zA-Z/;print'
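
Because tr/// does not interpolate variables, a general "rot$n" needs a different construction. The following is one possible sketch, with a hypothetical helper named rot_n that builds a letter-to-letter hash for the requested shift. It refers to letters by name rather than by code point, so it should behave the same on EBCDIC, although it has only been reasoned through for ASCII here.

```perl
use strict;
use warnings;

# Hypothetical helper: rotate the 26 English letters by $n places,
# leaving every other character untouched.
sub rot_n {
    my ($n, $text) = @_;
    $n %= 26;                         # rot26 behaves like rot0
    my @lower = ('a' .. 'z');
    my @upper = ('A' .. 'Z');
    my %map;
    # Pair each letter with the one $n places further on, wrapping around.
    @map{@lower} = (@lower[$n .. 25], @lower[0 .. $n - 1]);
    @map{@upper} = (@upper[$n .. 25], @upper[0 .. $n - 1]);
    $text =~ s/([A-Za-z])/$map{$1}/g;
    return $text;
}

print rot_n(13, "Hello, World!"), "\n";             # Uryyb, Jbeyq!
print rot_n(13, rot_n(13, "Hello, World!")), "\n";  # Hello, World!
print rot_n(3, "abc xyz"), "\n";                    # def abc
```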
  

 

  

 


Disclaimer: This documentation is provided only for the benefits of our web hosting customers.
For authoritative source of the documentation, please refer to http://www.perldoc.com