Packing and Unpacking C Structures

In previous sections we have seen how to pack numbers and character strings. If it were not for a couple of snags we could conclude this section right away with the terse remark that C structures don't contain anything else, and therefore you already know all there is to it. Sorry, no: read on, please.

The Alignment Pit

When trading speed against memory requirements, the balance has been tilted in favor of faster execution. This has influenced the way C compilers allocate memory for structures: on architectures where a 16-bit or 32-bit operand can be moved faster between places in memory, or to or from a CPU register, if it is aligned at an even, a multiple-of-four, or even a multiple-of-eight address, a C compiler will give you this speed benefit by stuffing extra bytes into structures. If you don't cross the C shoreline this is not likely to cause you any grief, although you should care when you design large data structures or want your code to be portable between architectures (you do want that, don't you?).

To see how this affects pack and unpack, we'll compare these two C structures:

 
   typedef struct {
     char     c1;
     short    s;
     char     c2;
     long     l;
   } gappy_t;

   typedef struct {
     long     l;
     short    s;
     char     c1;
     char     c2;
   } dense_t;  

On a platform with 2-byte shorts and 4-byte longs, a C compiler typically allocates 12 bytes to a gappy_t variable, but requires only 8 bytes for a dense_t. After investigating this further, we can draw memory maps, showing where the extra 4 bytes are hidden:

 
   0           +4          +8          +12
   +--+--+--+--+--+--+--+--+--+--+--+--+
   |c1|xx|  s  |c2|xx|xx|xx|     l     |    xx = fill byte
   +--+--+--+--+--+--+--+--+--+--+--+--+
   gappy_t

   0           +4          +8
   +--+--+--+--+--+--+--+--+
   |     l     |  s  |c1|c2|
   +--+--+--+--+--+--+--+--+
   dense_t  
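
The maps above can be checked against a particular compiler with sizeof and offsetof. What follows is a minimal sketch, not a definitive tool: the helper names gappy_fill, dense_fill and print_layouts are made up for this illustration, and the numbers depend on the platform (with 8-byte longs, for instance, both structures typically come out larger than in the maps).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef struct {
  char     c1;
  short    s;
  char     c2;
  long     l;
} gappy_t;

typedef struct {
  long     l;
  short    s;
  char     c1;
  char     c2;
} dense_t;

/* Both structures hold the same members, so the payload size is shared. */
#define PAYLOAD (sizeof(char) + sizeof(short) + sizeof(char) + sizeof(long))

/* Fill bytes (hypothetical helpers): total size minus the payload. */
size_t gappy_fill(void) { return sizeof(gappy_t) - PAYLOAD; }
size_t dense_fill(void) { return sizeof(dense_t) - PAYLOAD; }

/* Print the offset of every member plus the size and fill of each type. */
void print_layouts(void) {
  /* The first member of a structure always sits at offset 0. */
  assert(offsetof(gappy_t, c1) == 0 && offsetof(dense_t, l) == 0);
  printf("gappy_t: c1@%zu s@%zu c2@%zu l@%zu size=%zu fill=%zu\n",
         offsetof(gappy_t, c1), offsetof(gappy_t, s),
         offsetof(gappy_t, c2), offsetof(gappy_t, l),
         sizeof(gappy_t), gappy_fill());
  printf("dense_t: l@%zu s@%zu c1@%zu c2@%zu size=%zu fill=%zu\n",
         offsetof(dense_t, l), offsetof(dense_t, s),
         offsetof(dense_t, c1), offsetof(dense_t, c2),
         sizeof(dense_t), dense_fill());
}
```

Calling print_layouts() from a main function prints one line per structure; the offsets are the positions where the fields start in the memory maps.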

And that's where the first quirk strikes: pack and unpack templates have to be stuffed with x codes to get those extra fill bytes.

The natural question: "Why can't Perl compensate for the gaps?" warrants an answer. One good reason is that C compilers might provide (non-ANSI) extensions permitting all sorts of fancy control over the way structures are aligned, even at the level of an individual structure field. And, if this were not enough, there is an insidious thing called union where the amount of fill bytes cannot be derived from the alignment of the next item alone.
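
The union remark can be made concrete in C. In the following sketch (the type names payload_t, tagged_t and two_chars_t and both helper functions are hypothetical) the union is aligned for its strictest member, so the fill after tag is the same whether the union currently holds a char or a double. A template derived only from the type of the item actually stored would get the offset wrong.

```c
#include <assert.h>
#include <stddef.h>

typedef union {
  char   c;
  double d;
} payload_t;

typedef struct {
  char      tag;
  payload_t u;
} tagged_t;

/* Fill bytes between tag and the union, whatever the union holds. */
size_t fill_before_union(void) {
  assert(offsetof(tagged_t, u) >= sizeof(char));
  return offsetof(tagged_t, u) - sizeof(char);
}

/* Fill bytes a lone char would need after tag, for comparison. */
size_t fill_before_char(void) {
  typedef struct { char tag; char c; } two_chars_t;
  return offsetof(two_chars_t, c) - sizeof(char);
}
```

On common platforms fill_before_union() returns 3 or 7 while fill_before_char() returns 0, although the first byte of the union may well be a char.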

OK, so let's bite the bullet. Here's one way to get the alignment right by inserting template codes x, which don't take a corresponding item from the list:

 
  my $gappy = pack( 'cxs cxxx l!', $c1, $s, $c2, $l );  

Note the ! after l: We want to make sure that we pack a long integer as it is compiled by our C compiler. And even now, it will only work for the platforms where the compiler aligns things as above. And somebody somewhere has a platform where it doesn't. [Probably a Cray, where shorts, ints and longs are all 8 bytes. :-)]

Counting bytes and watching alignments in lengthy structures is bound to be a drag. Isn't there a way we can create the template with a simple program? Here's a C program that does the trick:

 
   #include <stdio.h>
   #include <stddef.h>

   typedef struct {
     char     fc1;
     short    fs;
     char     fc2;
     long     fl;
   } gappy_t;

   /* offsetof yields a size_t, so cast it to match the %d conversion */
   #define Pt(struct,field,tchar) \
     printf( "@%d%s ", (int) offsetof(struct,field), # tchar );

   int main(){
     Pt( gappy_t, fc1, c  );
     Pt( gappy_t, fs,  s! );
     Pt( gappy_t, fc2, c  );
     Pt( gappy_t, fl,  l! );
     printf( "\n" );
   }  

The output line can be used as a template in a pack or unpack call:

 
  my $gappy = pack( '@0c @2s! @4c @8l!', $c1, $s, $c2, $l );  

Gee, yet another template code - as if we hadn't plenty. But @ saves our day by enabling us to specify the offset from the beginning of the pack buffer to the next item: This is just the value the offsetof macro (defined in <stddef.h>) returns when given a struct type and one of its field names ("member-designator" in C standardese).

Alignment, Take 2

I'm afraid that we're not quite through with the alignment catch yet. The hydra raises another ugly head when you pack arrays of structures:

 
   typedef struct {
     short    count;
     char     glyph;
   } cell_t;

   typedef cell_t buffer_t[BUFLEN];  

Where's the catch? Padding is neither required before the first field count, nor between this and the next field glyph, so why can't we simply pack like this:

 
   # something goes wrong here:
   pack( 's!a' x @buffer,
         map{ ( $_->{count}, $_->{glyph} ) } @buffer );  

This packs 3*@buffer bytes, but it turns out that the size of buffer_t is four times BUFLEN! The moral of the story is that the required alignment of a structure or array is propagated to the next higher level where we have to consider padding at the end of each component as well. Thus the correct template is:

 
   pack( 's!ax' x @buffer,
         map{ ( $_->{count}, $_->{glyph} ) } @buffer );  
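
The trailing fill can be measured rather than guessed. The sketch below assumes an arbitrary BUFLEN of 16, and the helper names cell_tail_fill and cell_stride are invented for this illustration. It computes how many x codes each element needs and confirms that consecutive array elements are sizeof(cell_t) bytes apart.

```c
#include <assert.h>
#include <stddef.h>

#define BUFLEN 16   /* arbitrary length chosen for this sketch */

typedef struct {
  short    count;
  char     glyph;
} cell_t;

typedef cell_t buffer_t[BUFLEN];

/* Fill bytes after glyph, i.e. the number of x codes per element. */
size_t cell_tail_fill(void) {
  return sizeof(cell_t) - (offsetof(cell_t, glyph) + sizeof(char));
}

/* Distance in bytes between consecutive elements of a buffer_t. */
size_t cell_stride(void) {
  buffer_t buf;
  /* An array never adds padding of its own between elements. */
  assert(sizeof(buffer_t) == BUFLEN * sizeof(cell_t));
  return (size_t)((char *)&buf[1] - (char *)&buf[0]);
}
```

Where shorts are 2 bytes and aligned on even addresses, cell_tail_fill() returns 1, which is the single x in the corrected template.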

Alignment, Take 3

And even if you take all the above into account, ANSI still lets this:

 
   typedef struct {
     char     foo[2];
   } foo_t;  

vary in size. The alignment constraint of the structure can be greater than that of any of its elements. [And if you think that this doesn't affect anything common, dismember the next cellphone that you see. Many have ARM cores, and the ARM structure rules make sizeof (foo_t) == 4]
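
Since the size is up to the compiler, the only reliable answer comes from asking it. A short sketch, with the helper names foo_size and foo_tail_fill made up for the purpose:

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
  char     foo[2];
} foo_t;

/* Size of foo_t as this compiler lays it out: at least 2, possibly more. */
size_t foo_size(void) {
  assert(sizeof(foo_t) >= sizeof(char[2]));
  return sizeof(foo_t);
}

/* Tail fill to reproduce with x codes after the two characters. */
size_t foo_tail_fill(void) {
  return sizeof(foo_t) - sizeof(char[2]);
}
```

A compiler that does not pad the structure gives a tail fill of 0; one that rounds the structure up to 4 bytes gives 2.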

Pointers for How to Use Them

The title of this section indicates the second problem you may run into sooner or later when you pack C structures. If the function you intend to call expects a, say, void * value, you cannot simply take a reference to a Perl variable. (Although that value certainly is a memory address, it's not the address where the variable's contents are stored.)

Template code P promises to pack a "pointer to a fixed length string". Isn't this what we want? Let's try:

 
    # allocate some storage and pack a pointer to it
    my $memory = "\x00" x $size;
    my $memptr = pack( 'P', $memory );  

But wait: doesn't pack just return a sequence of bytes? How can we pass this string of bytes to some C code expecting a pointer which is, after all, nothing but a number? The answer is simple: We have to obtain the numeric address from the bytes returned by pack.

 
    my $ptr = unpack( 'L!', $memptr );  

Obviously this assumes that it is possible to typecast a pointer to an unsigned long and vice versa, which frequently works but should not be taken as a universal law. Now that we have this pointer, the next question is: how can we put it to good use? We need a call to some C function where a pointer is expected. The read(2) system call comes to mind:

 
    ssize_t read(int fd, void *buf, size_t count);  

After reading perlfunc explaining how to use syscall we can write this Perl function copying a file to standard output:

 
    require 'syscall.ph';
    sub cat($){
        my $path = shift();
        my $size = -s $path;
        my $memory = "\x00" x $size;  # allocate some memory
        # 'L!' is a native unsigned long; plain 'L' is always 32 bits
        my $ptr = unpack( 'L!', pack( 'P', $memory ) );
        open( F, $path ) || die( "$path: cannot open ($!)\n" );
        my $fd = fileno(F);
        my $res = syscall( &SYS_read, $fd, $ptr, $size );
        print $memory;
        close( F );
    }  

This is neither a specimen of simplicity nor a paragon of portability but it illustrates the point: We are able to sneak behind the scenes and access Perl's otherwise well-guarded memory! (Important note: Perl's syscall does not require you to construct pointers in this roundabout way. You simply pass a string variable, and Perl forwards the address.)
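
Seen from the C side, the trick amounts to carrying a pointer around as an integer and converting it back just before the call. The sketch below is an illustration under assumptions, not what Perl does internally: the helper names address_of and read_at_address are invented, a POSIX system is assumed, and uintptr_t is used because unsigned long is too narrow on platforms where pointers are wider than long (64-bit Windows, for example).

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Convert a pointer to an integer type wide enough to hold it. */
uintptr_t address_of(void *p) {
  return (uintptr_t)p;
}

/* Call read(2) with the buffer given as an integer address. */
ssize_t read_at_address(int fd, uintptr_t addr, size_t count) {
  void *buf = (void *)addr;         /* integer back to pointer */
  assert(address_of(buf) == addr);  /* the round trip must be lossless */
  return read(fd, buf, count);
}
```

The caller remains responsible for making sure that addr still refers to at least count bytes of live memory when the call is made.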

How does unpack with P work? Imagine some pointer in the buffer about to be unpacked: If it isn't the null pointer (which will smartly produce the undef value) we have a start address - but then what? Perl has no way of knowing how long this "fixed length string" is, so it's up to you to specify the actual size as an explicit length after P.

 
   my $mem = "abcdefghijklmn";
   print unpack( 'P5', pack( 'P', $mem ) ); # prints "abcde"  

As a consequence, pack ignores any number or * after P.

Now that we have seen P at work, we might as well give p a whirl. Why do we need a second template code for packing pointers at all? The answer lies behind the simple fact that an unpack with p promises a null-terminated string starting at the address taken from the buffer, and that implies a length for the data item to be returned:

 
   my $buf = pack( 'p', "abc\x00efhijklmn" );
   print unpack( 'p', $buf );    # prints "abc"

  

This is apt to be confusing, though: because the length is implied by the string's terminating null, a number after pack code p is a repeat count, not a length as after P.
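
In C terms the difference between the two codes is the difference between memcpy with an explicit length and strlen. The functions fetch_fixed and fetch_cstring below are hypothetical stand-ins written for this comparison; they are not part of Perl.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Like unpack 'P<n>': copy exactly n bytes, embedded nulls included. */
size_t fetch_fixed(const char *addr, size_t n, char *out) {
  memcpy(out, addr, n);
  return n;
}

/* Like unpack 'p': copy up to, but not including, the terminating null. */
size_t fetch_cstring(const char *addr, char *out, size_t cap) {
  size_t len = strlen(addr);
  assert(len < cap);           /* caller must supply a large enough buffer */
  memcpy(out, addr, len + 1);  /* copy the terminator as well */
  return len;
}
```

Applied to the bytes "abc\0efhijklmn", fetch_cstring stops after "abc", whereas fetch_fixed with a length of 5 returns all five bytes, null included.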

Calling pack(..., $x) with P or p to get the address where $x is actually stored requires circumspection. Perl's internal machinery considers the relation between a variable and that address as its very own private matter and doesn't really care that we have obtained a copy. Therefore:

  • Do not use pack with p or P to obtain the address of a variable that's bound to go out of scope (thereby freeing its memory) before you are done using the memory at that address.
  • Be very careful with Perl operations that change the value of the variable. Appending something to the variable, for instance, might require reallocation of its storage, leaving you with a pointer into no-man's land.
  • Don't think that you can get the address of a Perl variable when it is stored as an integer or double number! pack('P', $x) will force the variable's internal representation to string, just as if you had written something like $x .= ''.

It's safe, however, to P- or p-pack a string literal, because Perl simply allocates an anonymous variable.
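
The reallocation hazard from the second item has a direct analogue in plain C. The sketch below is an analogy only: strbuf_t, strbuf_append and strbuf_at are hypothetical and say nothing about Perl's actual allocator. It shows why an address saved before an append may be stale afterwards, and how recomputing the address from an offset avoids the problem.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
  char  *data;   /* may move whenever the buffer grows */
  size_t len;    /* bytes in use, not counting the terminating null */
} strbuf_t;

/* Append n bytes; returns 0 on success, -1 if out of memory. */
int strbuf_append(strbuf_t *sb, const char *src, size_t n) {
  char *grown = realloc(sb->data, sb->len + n + 1);
  if (grown == NULL) return -1;
  memcpy(grown + sb->len, src, n);
  sb->len += n;
  grown[sb->len] = '\0';
  sb->data = grown;   /* any address saved earlier may now be stale */
  return 0;
}

/* Safe access: recompute the address from an offset on every use. */
const char *strbuf_at(const strbuf_t *sb, size_t offset) {
  assert(offset <= sb->len);
  return sb->data + offset;
}
```

Starting from a strbuf_t initialized to { NULL, 0 }, every call to strbuf_append may move the storage, so only the result of a fresh strbuf_at call should be dereferenced.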

 

 

 

© 2002-2004 Active-Venture.com Web Site Hosting Service

 


 

 
 
 

Disclaimer: This documentation is provided only for the benefits of our web hosting customers.
For authoritative source of the documentation, please refer to http://www.perldoc.com