So much for textual data. Let's get onto the meaty stuff that pack and unpack
are best at: handling binary formats for numbers. There is, of course, not just one binary
format - life would be too simple - but Perl will do all the finicky labor for you.
Packing and unpacking numbers implies conversion to and from some specific binary
representation. Leaving floating point numbers aside for the moment, the salient properties of
any such representation are:
- the number of bytes used for storing the integer,
- whether the contents are interpreted as a signed or unsigned number,
- the byte ordering: whether the first byte is the least or most significant byte (or:
little-endian or big-endian, respectively).
So, for instance, to pack 20302 to a signed 16 bit integer in your computer's representation
you write
my $ps = pack( 's', 20302 );
Again, the result is a string, now containing 2 bytes. If you print this string (which is,
generally, not recommended) you might see ON or NO (depending on your
system's byte ordering) - or something entirely different if your computer doesn't use ASCII
character encoding. Unpacking $ps with the same template returns the original
integer value:
my( $s ) = unpack( 's', $ps );
This is true for all numeric template codes. But don't expect miracles: if the packed value
exceeds the allotted byte capacity, high order bits are silently discarded, and unpack certainly
won't be able to pull them back out of some magic hat. And, when you pack using a signed
template code such as s, an excess value may result in the sign bit getting set,
and unpacking this will smartly return a negative value.
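To see the round trip, the truncation, and the sign flip side by side, here is a minimal sketch. The sample values 70000 and 40000 are merely chosen for illustration, and the hex dump assumes nothing more than 8-bit bytes; its digit order depends on your machine's byte order.

```perl
use strict;
use warnings;

# Round trip: 20302 is 0x4F4E, i.e. the bytes 'O' (0x4F) and 'N' (0x4E).
my $ps  = pack( 's', 20302 );
my $hex = unpack( 'H*', $ps );   # '4e4f' little-endian, '4f4e' big-endian
my( $s ) = unpack( 's', $ps );

# Overflow: 70000 needs 17 bits, so only the low 16 bits survive.
my( $truncated ) = unpack( 'S', pack( 'S', 70000 ) );

# Sign bit: 40000 exceeds the signed 16-bit range, so bit 15 ends up set
# and the value comes back negative.
my( $wrapped ) = unpack( 's', pack( 's', 40000 ) );

print "$hex $s $truncated $wrapped\n";
```

Here 70000 comes back as 70000 - 65536 = 4464, and 40000 comes back as 40000 - 65536 = -25536.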
16 bits won't get you too far with integers, but there are l and L
for signed and unsigned 32-bit integers. And if this is not enough and your system supports
64-bit integers, you can push the limits much closer to infinity with pack codes q and Q.
A notable exception is provided by pack codes i and I for signed and
unsigned integers of the "local custom" variety: Such an integer will take up as many
bytes as a local C compiler returns for sizeof(int), but it'll use at least
32 bits.
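If you'd like to check these sizes on your own machine, a quick probe along the following lines should do. The variable names are made up for this sketch, and the eval guard assumes that a perl without 64-bit integer support raises an exception for q.

```perl
use strict;
use warnings;

my $len_s = length( pack( 's', 0 ) );   # fixed: 2 bytes
my $len_l = length( pack( 'l', 0 ) );   # fixed: 4 bytes
my $len_i = length( pack( 'i', 0 ) );   # sizeof(int), at least 4 bytes

# q and Q are not available everywhere, so probe for them inside eval.
my $len_q = eval { length( pack( 'q', 0 ) ) };

printf "s=%d l=%d i=%d q=%s\n",
    $len_s, $len_l, $len_i, defined( $len_q ) ? $len_q : 'unsupported';
```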
Each of the integer pack codes sSlLqQ results in a fixed number of bytes, no
matter where you execute your program. This may be useful for some applications, but it does not
provide for a portable way to pass data structures between Perl and C programs (bound to happen
when you call XS extensions or the Perl function syscall), or when you read or
write binary files. What you'll need in this case are template codes that depend on what your
local C compiler compiles when you code short or unsigned long, for
instance. These codes and their corresponding byte lengths are shown in the table below. Since
the C standard leaves much leeway with respect to the relative sizes of these data types, actual
values may vary, and that's why the values are given as expressions in C and Perl. (If you'd
like to use values from %Config in your program you have to import it with use
Config.)
signed unsigned  byte length in C   byte length in Perl
  s!     S!      sizeof(short)      $Config{shortsize}
  i!     I!      sizeof(int)        $Config{intsize}
  l!     L!      sizeof(long)       $Config{longsize}
  q!     Q!      sizeof(long long)  $Config{longlongsize}
The i! and I! codes aren't different from i and I;
they are tolerated for completeness' sake.
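The correspondence between the native codes and %Config can be checked with a short sketch like this one. The hash names %size_key and %native_len are invented for the example, and q! is left out because it may not be supported by your perl.

```perl
use strict;
use warnings;
use Config;

# Map each native template code to the %Config key holding its size.
my %size_key = ( 's!' => 'shortsize', 'i!' => 'intsize', 'l!' => 'longsize' );

my %native_len;
for my $code ( sort keys %size_key ) {
    $native_len{$code} = length( pack( $code, 0 ) );
    printf "%-2s packs to %d bytes, Config says %d\n",
        $code, $native_len{$code}, $Config{ $size_key{$code} };
}

# i and i! should produce identical bytes.
my $same_int = pack( 'i', -5 ) eq pack( 'i!', -5 );
print "i and i! agree\n" if $same_int;
```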
Requesting a particular byte ordering may be necessary when you work with binary data coming
from some specific architecture whereas your program could run on a totally different system. As
an example, assume you have 24 bytes containing a stack frame as it happens on an Intel 8086:
      +---------+        +----+----+               +---------+
 TOS: |   IP    |  TOS+4:| FL | FH | FLAGS  TOS+14:|   SI    |
      +---------+        +----+----+               +---------+
      |   CS    |        | AL | AH | AX            |   DI    |
      +---------+        +----+----+               +---------+
                         | BL | BH | BX            |   BP    |
                         +----+----+               +---------+
                         | CL | CH | CX            |   DS    |
                         +----+----+               +---------+
                         | DL | DH | DX            |   ES    |
                         +----+----+               +---------+
First, we note that this time-honored 16-bit CPU uses little-endian order, and that's why the
low order byte is stored at the lower address. To unpack such an (unsigned) short we'll have to
use code v. A repeat count unpacks all 12 shorts:
my( $ip, $cs, $flags, $ax, $bx, $cx, $dx, $si, $di, $bp, $ds, $es ) =
unpack( 'v12', $frame );
Alternatively, we could have used C to unpack the individually accessible byte
registers FL, FH, AL, AH, etc.:
my( $fl, $fh, $al, $ah, $bl, $bh, $cl, $ch, $dl, $dh ) =
unpack( 'C10', substr( $frame, 4, 10 ) );
It would be nice if we could do this in one fell swoop: unpack a short, back up a little, and
then unpack 2 bytes. Since Perl is nice, it proffers the template code X to
back up one byte. Putting this all together, we may now write:
my( $ip, $cs,
$flags,$fl,$fh,
$ax,$al,$ah, $bx,$bl,$bh, $cx,$cl,$ch, $dx,$dl,$dh,
$si, $di, $bp, $ds, $es ) =
unpack( 'v2' . ('vXXCC' x 5) . 'v5', $frame );
We've taken some pains to construct the template so that it matches the contents of our frame
buffer. Otherwise we'd either get undefined values, or unpack could not unpack everything.
If pack runs out of items, it will supply null strings (which are coerced into
zeroes whenever the pack code says so).
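To try the combined template without an 8086 at hand, you can fabricate a frame first. The register values in this sketch are invented purely for the test; only the layout follows the frame described above.

```perl
use strict;
use warnings;

# Made-up values in stack order: IP CS FLAGS AX BX CX DX SI DI BP DS ES
my @regs = ( 0x0100, 0x1000, 0x0246, 0x1234, 0x5678, 0x9ABC, 0xDEF0,
             0x0001, 0x0002, 0x0003, 0x0004, 0x0005 );
my $frame = pack( 'v12', @regs );   # 12 little-endian shorts = 24 bytes

# Each 'vXXCC' reads a short, backs up two bytes, and rereads them as bytes.
my( $ip, $cs,
    $flags,$fl,$fh,
    $ax,$al,$ah, $bx,$bl,$bh, $cx,$cl,$ch, $dx,$dl,$dh,
    $si, $di, $bp, $ds, $es ) =
  unpack( 'v2' . ('vXXCC' x 5) . 'v5', $frame );

printf "AX=%04X AL=%02X AH=%02X\n", $ax, $al, $ah;
```

Since the low order byte comes first, AL receives 0x34 and AH receives 0x12 for an AX of 0x1234.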
The pack code for big-endian (high order byte at the lowest address) is n for 16
bit and N for 32 bit integers. You use these codes if you know that your data comes
from a compliant architecture, but, surprisingly enough, you should also use these pack codes if
you exchange binary data, across the network, with some system that you know next to nothing
about. The simple reason is that this order has been chosen as the network order, and all
standard-fearing programs ought to follow this convention. (This is, of course, a stern backing
for one of the Lilliputian parties and may well influence the political development there.) So,
if the protocol expects you to send a message by sending the length first, followed by just so
many bytes, you could write:
my $buf = pack( 'N', length( $msg ) ) . $msg;
or even:
my $buf = pack( 'NA*', length( $msg ), $msg );
and pass $buf to your send routine. Some protocols demand that the count should
include the length of the count itself: then just add 4 to the data length. (But make sure to
read "Lengths and Widths" before you really code
this!)
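Both flavors of the length prefix, and the receiving side, might look like the sketch below. It assumes that $msg holds plain bytes, so that length counts bytes; the names $buf_incl and $payload are made up for the example.

```perl
use strict;
use warnings;

my $msg = 'Hello, world';

# Count excludes itself: 4-byte big-endian length, then the payload.
my $buf = pack( 'N', length( $msg ) ) . $msg;

# Count includes itself: add the 4 bytes occupied by the N field.
my $buf_incl = pack( 'N', length( $msg ) + 4 ) . $msg;

# Receiving side: read the count, then take that many bytes after it.
my( $len ) = unpack( 'N', $buf );
my $payload = substr( $buf, 4, $len );

# Show the 4 count bytes as hex, followed by the recovered payload.
print unpack( 'H8', $buf ), " $payload\n";
```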
For packing floating point numbers you have the choice between the pack codes f
and d which pack into (or unpack from) single-precision or double-precision
representation as it is provided by your system. (There is no such thing as a network
representation for reals, so if you want to send your real numbers across computer boundaries,
you'd better stick to ASCII representation, unless you're absolutely sure what's on the other
end of the line.)
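The difference between the two codes shows up as soon as you round-trip a value that has no exact binary representation. This sketch assumes the common case where Perl's own floating point numbers are C doubles; on other builds the d result may differ slightly too.

```perl
use strict;
use warnings;

my $x = 0.1;

my $packed_f = pack( 'f', $x );   # sizeof(float) bytes, typically 4
my $packed_d = pack( 'd', $x );   # sizeof(double) bytes, typically 8

my( $single ) = unpack( 'f', $packed_f );   # precision is lost here
my( $double ) = unpack( 'd', $packed_d );

printf "f: %d bytes, %.17g\n", length( $packed_f ), $single;
printf "d: %d bytes, %.17g\n", length( $packed_d ), $double;
```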