perlpacktut - tutorial on pack and unpack
pack and unpack are two functions for transforming data according
to a user-defined template, between the guarded way Perl stores values and some well-defined
representation as might be required in the environment of a Perl program. Unfortunately, they're
also two of the most misunderstood and most often overlooked functions that Perl provides. This
tutorial will demystify them for you.
Most programming languages don't shelter the memory where variables are stored. In C, for
instance, you can take the address of some variable, and the sizeof operator tells
you how many bytes are allocated to the variable. Using the address and the size, you may access
the storage to your heart's content.
In Perl, you just can't access memory at random, but the structural and representational
conversion provided by pack and unpack is an excellent alternative.
The pack function converts values to a byte sequence containing representations
according to a given specification, the so-called "template" argument. unpack
is the reverse process, deriving some values from the contents of a string of bytes. (Be
cautioned, however, that not all that has been packed together can be neatly unpacked - a very
common experience as seasoned travellers are likely to confirm.)
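A minimal round trip makes this concrete. The template below (a three-character string, a 16-bit unsigned integer in big-endian "network" order, and one unsigned byte) is an arbitrary choice for illustration. The last lines show one way the return trip can lose information: the A code pads with spaces when packing and strips trailing whitespace when unpacking.

```perl
use strict;
use warnings;

# Pack a 3-character string, a 16-bit big-endian unsigned short ('n')
# and an unsigned byte ('C') into one string of 3 + 2 + 1 = 6 bytes.
my $bytes = pack( 'A3 n C', 'abc', 258, 7 );

# Unpacking with the same template recovers the values.
my ( $str, $short, $byte ) = unpack( 'A3 n C', $bytes );
print length( $bytes ), " bytes: $str $short $byte\n";

# Not everything survives: 'A' strips trailing spaces on unpack,
# so the two trailing spaces of 'hi  ' are gone afterwards.
my $padded = pack( 'A5', 'hi  ' );
my ($back) = unpack( 'A5', $padded );
print "'$back'\n";
```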
Why, you may ask, would you need a chunk of memory containing some values in binary
representation? One good reason is input and output accessing some file, a device, or a network
connection, whereby this binary representation is either forced on you or will give you some
benefit in processing. Another cause is passing data to some system call that is not available
as a Perl function: syscall requires you to provide parameters stored in the way it
happens in a C program. Even text processing (as shown in the next section) may be simplified
with judicious usage of these two functions.
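As a sketch of the text-processing case, unpack can split a fixed-width record into its fields. The column layout used here (an 8-character date, a 6-character code, an 8-character amount, each padded on the right with spaces) is a made-up example; A extracts a fixed number of characters and drops trailing spaces, which is usually what padded columns need.

```perl
use strict;
use warnings;

# Hypothetical record: 8-char date, 6-char code, 8-char amount,
# each column padded on the right with spaces.
my $line = '20240115' . 'AB12  ' . '199.95  ';

# 'A<n>' takes n characters and strips trailing spaces from each field.
my ( $date, $code, $amount ) = unpack( 'A8 A6 A8', $line );

print "date=$date code=$code amount=$amount\n";
```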
To see how (un)packing works, we'll start with a simple template code where the conversion is
in low gear: between the contents of a byte sequence and a string of hexadecimal digits. Let's
use unpack, since this is likely to remind you of a dump program, or some desperate
last message unfortunate programs are wont to throw at you before they expire into the wild blue
yonder. Assuming that the variable $mem holds a sequence of bytes that we'd like to
inspect without assuming anything about its meaning, we can write
    my( $hex ) = unpack( 'H*', $mem );
    print "$hex\n";
whereupon we might see something like this, with each pair of hex digits corresponding to a
byte:
    41204d414e204120504c414e20412043414e414c2050414e414d41
What was in this chunk of memory? Numbers, characters, or a mixture of both? Assuming that
we're on a computer where ASCII (or some similar) encoding is used: hexadecimal values in the
range 0x41 to 0x5A indicate an uppercase letter, and 0x20
encodes a space. So we might assume it is a piece of text, which some are able to read like a
tabloid; but others will have to get hold of an ASCII table and relive that first-grader feeling.
Not caring too much about which way to read this, we note that unpack with the
template code H converts the contents of a sequence of bytes into the customary
hexadecimal notation. Since "a sequence of" is a pretty vague indication of quantity, H
has been defined to convert just a single hexadecimal digit unless it is followed by a repeat
count. An asterisk for the repeat count means to use whatever remains.
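Applying pack with the same H* code to the dump above recovers the bytes, which settles the question of what the memory held (an ASCII machine is assumed). The second half shows the repeat count at work on the two bytes of "AB"; the lowercase h code, which emits the low nybble of each byte first, is included only for comparison.

```perl
use strict;
use warnings;

# The dump shown earlier, as a string of hex digits.
my $hex = '41204d414e204120504c414e20412043414e414c2050414e414d41';

# pack 'H*' turns each pair of hex digits back into one byte, so the
# bytes can be read as text; unpack 'H*' reverses the conversion.
my $text    = pack( 'H*', $hex );
my ($again) = unpack( 'H*', $text );
print "$text\n";    # on an ASCII machine: A MAN A PLAN A CANAL PANAMA

# Repeat counts on H, applied to the bytes 0x41 0x42 ("AB"):
my ($one) = unpack( 'H',  'AB' );   # one digit: high nybble of first byte
my ($two) = unpack( 'H2', 'AB' );   # two digits: the whole first byte
my ($all) = unpack( 'H*', 'AB' );   # '*': every remaining byte
my ($low) = unpack( 'h*', 'AB' );   # 'h': low nybble of each byte first
```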
The inverse operation - packing byte contents from a string of hexadecimal digits - is just
as easily written. For instance:
    my $s = pack( 'H2' x 10, map { "3$_" } ( 0..9 ) );
    print "$s\n";
Since we feed a list of ten 2-digit hexadecimal strings to pack, the pack
template should contain ten pack codes. If this is run on a computer with ASCII character
coding, it will print 0123456789.
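Since 'H2' x 10 is ordinary string repetition, the template pack receives is simply ten H2 codes in a row. A single H* over the concatenated digits should produce the same bytes, as this sketch checks.

```perl
use strict;
use warnings;

my @digits = map { "3$_" } ( 0..9 );    # '30', '31', ..., '39'

# The x operator builds the template string before pack sees it.
my $template = 'H2' x 10;               # 'H2H2H2H2H2H2H2H2H2H2'

# Ten separate two-digit codes...
my $s1 = pack( $template, @digits );

# ...give the same bytes as one 'H*' over the joined digits.
my $s2 = pack( 'H*', join( '', @digits ) );

print "$s1\n$s2\n";                     # ASCII machines: 0123456789 twice
```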
Here is a collection of (possibly) useful canned recipes for pack and unpack:
    # Convert IP address for socket functions
    pack( "C4", split /\./, "123.4.5.6" );

    # Count the bits in a chunk of memory (e.g. a select vector)
    unpack( '%32b*', $mask );

    # Determine the endianness of your system
    $is_little_endian = unpack( 'c', pack( 's', 1 ) );
    $is_big_endian = unpack( 'xc', pack( 's', 1 ) );

    # Determine the number of bits in a native integer
    $bits = 8 * length( pack( 'I!', 0 ) );

    # Prepare argument for the nanosleep system call
    my $timespec = pack( 'L!L!', $secs, $nanosecs );
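A few of these recipes can be exercised directly. In this sketch the mask value is arbitrary, and the width of a native unsigned int is obtained by packing ~0 (all bits set) as I! and counting its 1 bits with the same %32b* checksum used for the select vector; the result should agree with eight times the byte length of a packed I!.

```perl
use strict;
use warnings;

# IP address: four unsigned bytes, then back to dotted-quad notation.
my $packed_ip = pack( "C4", split /\./, "123.4.5.6" );
my $dotted    = join '.', unpack( 'C4', $packed_ip );

# Bit count: '%32b*' sums the bits of every byte in the string.
my $mask = "\xFF\x01";                  # 8 + 1 bits set
my $set  = unpack( '%32b*', $mask );

# Endianness: exactly one of these is 1, the other 0.
my $is_little_endian = unpack( 'c', pack( 's', 1 ) );
my $is_big_endian    = unpack( 'xc', pack( 's', 1 ) );

# Width of a native unsigned int: count the 1 bits of ~0 packed as 'I!'.
my $bits = unpack( '%32b*', pack( 'I!', ~0 ) );

print "$dotted, $set bits set, ",
      ( $is_little_endian ? 'little' : 'big' ), "-endian, $bits-bit int\n";
```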
For a simple memory dump we unpack some bytes into just as many pairs of hex digits, and use map
to handle the traditional spacing - 16 bytes to a line:
    my $i;
    print map( ++$i % 16 ? "$_ " : "$_\n",
               unpack( 'H2' x length( $mem ), $mem ) ),
          length( $mem ) % 16 ? "\n" : '';
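The same idea extends to a reusable dump with an offset column. The hexdump sub below is a hypothetical helper, not part of Perl; it relies on the group syntax (a16)* (Perl 5.8 or later) to cut the string into 16-byte pieces, and it uses a rather than A so that trailing spaces and NUL bytes in binary data are not stripped.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hex dump of $mem, 16 bytes per line,
# each line prefixed with its byte offset in hex.
sub hexdump {
    my ($mem) = @_;
    my $out    = '';
    my $offset = 0;
    # '(a16)*' yields 16-byte chunks; the last one may be shorter.
    for my $chunk ( unpack( '(a16)*', $mem ) ) {
        $out .= sprintf( "%04x  %s\n", $offset,
                         join( ' ', unpack( '(H2)*', $chunk ) ) );
        $offset += length $chunk;
    }
    return $out;
}

print hexdump( "0123456789abcdefXYZ" );
```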
    # Pulling digits out of nowhere...
    print unpack( 'C', pack( 'x' ) ),
          unpack( '%B*', pack( 'A' ) ),
          unpack( 'H', pack( 'A' ) ),
          unpack( 'A', unpack( 'C', pack( 'A' ) ) ), "\n";

    # One for the road ;-)
    my $advice = pack( 'all u can in a van' );
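The digits can be traced one step at a time (an ASCII machine is assumed): pack('x') yields a single NUL byte, and pack('A') with no argument packs an empty string padded to one space (0x20). Each unpack then pulls one digit out of those two bytes.

```perl
use strict;
use warnings;

my $nul   = pack( 'x' );    # one NUL byte: "\0"
my $space = pack( 'A' );    # no argument: empty string padded to one space

my $d0 = unpack( 'C', $nul );                   # ordinal of NUL: 0
my $d1 = unpack( '%B*', $space );               # 1 bits in 0x20 (00100000): 1
my $d2 = unpack( 'H', $space );                 # high nybble of 0x20: '2'
my $d3 = unpack( 'A', unpack( 'C', $space ) );  # first character of '32': '3'

print "$d0$d1$d2$d3\n";                         # 0123
```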
Simon Cozens and Wolfgang Laun.