Website hosting service by Active-Venture.com
  

 Back to Index

Packing Numbers

So much for textual data. Let's get onto the meaty stuff that pack and unpack are best at: handling binary formats for numbers. There is, of course, not just one binary format - life would be too simple - but Perl will do all the finicky labor for you.

Integers

Packing and unpacking numbers implies conversion to and from some specific binary representation. Leaving floating point numbers aside for the moment, the salient properties of any such representation are:

  • the number of bytes used for storing the integer,
  • whether the contents are interpreted as a signed or unsigned number,
  • the byte ordering: whether the first byte is the least or most significant byte (or: little-endian or big-endian, respectively).

So, for instance, to pack 20302 to a signed 16 bit integer in your computer's representation you write

 
   my $ps = pack( 's', 20302 );  

Again, the result is a string, now containing 2 bytes. If you print this string (which is, generally, not recommended) you might see ON or NO (depending on your system's byte ordering) - or something entirely different if your computer doesn't use ASCII character encoding. Unpacking $ps with the same template returns the original integer value:

 
   my( $s ) = unpack( 's', $ps );  

This is true for all numeric template codes. But don't expect miracles: if the packed value exceeds the allotted byte capacity, high order bits are silently discarded, and unpack certainly won't be able to pull them back out of some magic hat. And, when you pack using a signed template code such as s, an excess value may result in the sign bit getting set, and unpacking this will smartly return a negative value.

16 bits won't get you too far with integers, but there is l and L for signed and unsigned 32-bit integers. And if this is not enough and your system supports 64 bit integers you can push the limits much closer to infinity with pack codes q and Q. A notable exception is provided by pack codes i and I for signed and unsigned integers of the "local custom" variety: Such an integer will take up as many bytes as a local C compiler returns for sizeof(int), but it'll use at least 32 bits.

Each of the integer pack codes sSlLqQ results in a fixed number of bytes, no matter where you execute your program. This may be useful for some applications, but it does not provide for a portable way to pass data structures between Perl and C programs (bound to happen when you call XS extensions or the Perl function syscall), or when you read or write binary files. What you'll need in this case are template codes that depend on what your local C compiler compiles when you code short or unsigned long, for instance. These codes and their corresponding byte lengths are shown in the table below. Since the C standard leaves much leeway with respect to the relative sizes of these data types, actual values may vary, and that's why the values are given as expressions in C and Perl. (If you'd like to use values from %Config in your program you have to import it with use Config.)

 
   signed unsigned  byte length in C   byte length in Perl       
     s!     S!      sizeof(short)      $Config{shortsize}
     i!     I!      sizeof(int)        $Config{intsize}
     l!     L!      sizeof(long)       $Config{longsize}
     q!     Q!      sizeof(longlong)   $Config{longlongsize}  

The i! and I! codes aren't different from i and I; they are tolerated for completeness' sake.

Unpacking a Stack Frame

Requesting a particular byte ordering may be necessary when you work with binary data coming from some specific architecture whereas your program could run on a totally different system. As an example, assume you have 24 bytes containing a stack frame as it happens on an Intel 8086:

 
      +---------+        +----+----+               +---------+
 TOS: |   IP    |  TOS+4:| FL | FH | FLAGS  TOS+14:|   SI    |
      +---------+        +----+----+               +---------+
      |   CS    |        | AL | AH | AX            |   DI    |
      +---------+        +----+----+               +---------+
                         | BL | BH | BX            |   BP    |
                         +----+----+               +---------+
                         | CL | CH | CX            |   DS    |
                         +----+----+               +---------+
                         | DL | DH | DX            |   ES    |
                         +----+----+               +---------+  

First, we note that this time-honored 16-bit CPU uses little-endian order, and that's why the low order byte is stored at the lower address. To unpack such a (signed) short we'll have to use code v. A repeat count unpacks all 12 shorts:

 
   my( $ip, $cs, $flags, $ax, $bx, $cd, $dx, $si, $di, $bp, $ds, $es ) =
     unpack( 'v12', $frame );  

Alternatively, we could have used C to unpack the individually accessible byte registers FL, FH, AL, AH, etc.:

 
   my( $fl, $fh, $al, $ah, $bl, $bh, $cl, $ch, $dl, $dh ) =
     unpack( 'C10', substr( $frame, 4, 10 ) );  

It would be nice if we could do this in one fell swoop: unpack a short, back up a little, and then unpack 2 bytes. Since Perl is nice, it proffers the template code X to back up one byte. Putting this all together, we may now write:

 
   my( $ip, $cs,
       $flags,$fl,$fh,
       $ax,$al,$ah, $bx,$bl,$bh, $cx,$cl,$ch, $dx,$dl,$dh, 
       $si, $di, $bp, $ds, $es ) =
   unpack( 'v2' . ('vXXCC' x 5) . 'v5', $frame );  

We've taken some pains to construct the template so that it matches the contents of our frame buffer. Otherwise we'd either get undefined values, or unpack could not unpack all. If pack runs out of items, it will supply null strings (which are coerced into zeroes whenever the pack code says so).

How to Eat an Egg on a Net

The pack code for big-endian (high order byte at the lowest address) is n for 16 bit and N for 32 bit integers. You use these codes if you know that your data comes from a compliant architecture, but, surprisingly enough, you should also use these pack codes if you exchange binary data, across the network, with some system that you know next to nothing about. The simple reason is that this order has been chosen as the network order, and all standard-fearing programs ought to follow this convention. (This is, of course, a stern backing for one of the Lilliputian parties and may well influence the political development there.) So, if the protocol expects you to send a message by sending the length first, followed by just so many bytes, you could write:

 
   my $buf = pack( 'N', length( $msg ) ) . $msg;  

or even:

 
   my $buf = pack( 'NA*', length( $msg ), $msg );  

and pass $buf to your send routine. Some protocols demand that the count should include the length of the count itself: then just add 4 to the data length. (But make sure to read "Lengths and Widths" before you really code this!)

Floating point Numbers

For packing floating point numbers you have the choice between the pack codes f and d which pack into (or unpack from) single-precision or double-precision representation as it is provided by your system. (There is no such thing as a network representation for reals, so if you want to send your real numbers across computer boundaries, you'd better stick to ASCII representation, unless you're absolutely sure what's on the other end of the line.)

 

 

 

Domain name registration service & domain search - 
Register cheap domain name from $7.95 and enjoy free domain services 
 

Cheap domain name search service -
Domain name services at just
$8.95/year only
 

Register domain name -
Buy domain name registration and cheap domain transfer at low, affordable price.

© 2002-2004 Active-Venture.com Web Site Hosting Service

 

[ All parts should go together without forcing. You must remember that the parts you are reassembling were disassembled by you. Therefore, if you can't get them together again, there must be a reason. By all means, do not use a hammer.   ]

 

 
 
 

Disclaimer: This documentation is provided only for the benefits of our web hosting customers.
For authoritative source of the documentation, please refer to http://www.perldoc.com