|
perlfaq7 - General Perl Language Issues
This section deals with general Perl language issues that don't clearly fit into any of the
other sections.
There is no BNF for Perl, but you can paw your way through the yacc grammar in perly.y in the source
distribution if you're particularly brave. The grammar relies on very smart tokenizing code, so
be prepared to venture into toke.c as well.
In the words of Chaim Frenkel: "Perl's grammar can not be reduced to BNF. The work of
parsing perl is distributed between yacc, the lexer, smoke and mirrors."
The punctuation signs $, @, %, &, and * are type specifiers, as detailed in perldata:
$ for scalar values (number, string or reference)
@ for arrays
% for hashes (associative arrays)
& for subroutines (aka functions, procedures, methods)
* for all types of that symbol name. In version 4 you used them like
pointers, but in modern perls you can just use references.
|
|
There are a couple of other symbols that you're likely to encounter but that aren't really type
specifiers:
<> are used for inputting a record from a filehandle.
\ takes a reference to something.
|
|
Note that <FILE> is neither the type specifier for files nor the name of the
handle. It is the <> operator applied to the handle FILE. It reads one line
(well, record--see perlvar/$/)
from the handle FILE in scalar context, or all lines in list context. When performing
open, close, or any other operation besides <> on files, or even when talking
about the handle, do not use the brackets. These are correct: eof(FH), seek(FH,
0, 2) and "copying from STDIN to FILE".
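Here is a minimal sketch of the scalar/list difference. It assumes Perl 5.8 or later, because it reads from an in-memory filehandle (a string opened as if it were a file) instead of a real file; $data and $fh are illustrative names.

```perl
use strict;
use warnings;

# An in-memory filehandle (Perl 5.8+) stands in for a real file here.
my $data = "first\nsecond\nthird\n";
open(my $fh, '<', \$data) or die "Can't open in-memory file: $!";

my $line = <$fh>;    # scalar context: reads one record ("first\n")
my @rest = <$fh>;    # list context: reads all remaining records
close($fh);

chomp($line);
chomp(@rest);
print "first record: $line\n";
print "remaining:    @rest\n";
```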
Normally, a bareword doesn't need to be quoted, but in most cases probably should be (and
must be under use strict). But a hash key consisting of a simple word (that isn't
the name of a defined subroutine) and the left-hand operand to the => operator
both count as though they were quoted:
This is       like this
------------  ---------------
$foo{line}    $foo{"line"}
bar => stuff  "bar" => stuff
|
|
The final semicolon in a block is optional, as is the final comma in a list. Good style (see perlstyle) says to put them in
except for one-liners:
if ($whoops) { exit 1 }
@nums = (1, 2, 3);
if ($whoops) {
exit 1;
}
@lines = (
"There Beren came from mountains cold",
"And lost he wandered under leaves",
);
|
|
One way to skip unwanted return values is to treat them as a list and index into it:
$dir = (getpwnam($user))[7];
|
|
Another way is to use undef as an element on the left-hand-side:
($dev, $ino, undef, undef, $uid, $gid) = stat($file);
|
|
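Both techniques in one short sketch. parts() is a hypothetical function standing in for something like getpwnam() or stat(); negative indices in a list slice count from the end of the list.

```perl
use strict;
use warnings;

# A hypothetical function that returns a four-element list.
sub parts { return ('alpha', 'beta', 'gamma', 'delta') }

# List slice: pick elements by index (-1 is the last element).
my ($second, $last) = (parts())[1, -1];

# undef placeholders on the left-hand side discard unwanted values.
my ($first, undef, $third) = parts();

print "$first $second $third $last\n";   # alpha beta gamma delta
```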
If you are running Perl 5.6.0 or better, the use warnings pragma allows fine
control of what warnings are produced. See perllexwarn for more details.
{
no warnings; # temporarily turn off warnings
$a = $b + $c; # I know these might be undef
}
|
|
If you have an older version of Perl, the $^W variable (documented in perlvar) controls runtime warnings
for a block:
{
local $^W = 0; # temporarily turn off warnings
$a = $b + $c; # I know these might be undef
}
|
|
Note that like all the punctuation variables, you cannot currently use my() on $^W,
only local().
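The fine control offered by the warnings pragma extends to individual warning categories. This sketch (assuming Perl 5.6 or later) silences only the uninitialized category inside one block and installs a __WARN__ handler to count what gets through; @caught, $x, and $y are illustrative names.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them, so they can be counted.
my @caught;
$SIG{__WARN__} = sub { push @caught, $_[0] };

my ($x, $y);                       # both deliberately left undef
my $sum;
{
    no warnings 'uninitialized';   # silence just this category, just here
    $sum = $x + $y;                # no "Use of uninitialized value" warning
}
my $loud = $x + 1;                 # outside the block, the warning is back

print scalar(@caught), " warning(s) caught\n";   # 1 warning(s) caught
```

Because no warnings is lexically scoped, the rest of the program keeps its normal warnings.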
An extension is a way of calling compiled C code from Perl. Reading perlxstut is a good place to
learn more about extensions.
Perl operators don't actually have different precedence from C operators. All C operators that Perl copies have the same precedence in Perl as
they do in C. The problem is with operators that C doesn't have, especially functions that give
a list context to everything on their right, e.g. print, chmod, exec, and so on. Such functions
are called "list operators" and appear as such in the precedence table in perlop.
A common mistake is to write:
unlink $file || die "snafu";
|
|
This gets interpreted as:
unlink ($file || die "snafu");
|
|
To avoid this problem, either put in extra parentheses or use the super low precedence or
operator:
(unlink $file) || die "snafu";
unlink $file or die "snafu";
|
|
The "English" operators (and, or, xor, and not)
deliberately have precedence lower than that of list operators for just such situations as the
one above.
Another operator with surprising precedence is exponentiation. It binds more tightly even
than unary minus, making -2**2 produce a negative four, not a positive four. It is also
right-associative, meaning that 2**3**2 is two raised to the ninth power, not eight
squared.
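A quick way to check these rules for yourself; the comments show the value each expression produces.

```perl
use strict;
use warnings;

my $neg   = -2**2;      # ** binds tighter than unary minus: -(2**2)
my $pow   = 2**3**2;    # right-associative: 2**(3**2), i.e. 2**9
my $sq    = (2**3)**2;  # explicit parentheses: eight squared
my $posit = (-2)**2;    # parenthesize the minus to square a negative number

print "$neg $pow $sq $posit\n";   # -4 512 64 4
```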
Although it has the same precedence as in C, Perl's ?: operator produces an
lvalue. This assigns $x to either $a or $b, depending on the trueness of $maybe:
($maybe ? $a : $b) = $x;
In general, you don't "declare" a structure. Just use a (probably anonymous) hash
reference. See perlref and perldsc for details. Here's an
example:
$person = {}; # new anonymous hash
$person->{AGE} = 24; # set field AGE to 24
$person->{NAME} = "Nat"; # set field NAME to "Nat"
|
|
If you're looking for something a bit more rigorous, try perltoot.
A module is a package that lives in a file of the same name. For example, the Hello::There
module would live in Hello/There.pm. For details, read perlmod. You'll also find Exporter helpful. If you're
writing a C or mixed-language module with both C and Perl, then you should study perlxstut.
The h2xs program will create stubs for all the important stuff for you:
% h2xs -XA -n My::Module
The -X switch tells h2xs that you are not using XS
extension code. The -A switch tells h2xs that you are not using the
AutoLoader, and the -n switch specifies the name of the module. See h2xs for more details.
See perltoot for an
introduction to classes and objects, as well as perlobj and perlbot.
You can use the tainted() function of the Scalar::Util module, available from CPAN (or
included with Perl since release 5.8.0). See also perlsec/"Laundering
and Detecting Tainted Data".
Closures are documented in perlref.
Closure is a computer science term with a precise but hard-to-explain meaning.
Closures are implemented in Perl as anonymous subroutines with lasting references to lexical
variables outside their own scopes. These lexicals magically refer to the variables that were
around when the subroutine was defined (deep binding).
Closures make sense in any programming language where you can have the return value of a
function be itself a function, as you can in Perl. Note that some languages provide anonymous
functions but are not capable of providing proper closures: the Python language, for example.
For more information on closures, check out any textbook on functional programming. Scheme is a
language that not only supports but encourages closures.
Here's a classic function-generating function:
sub add_function_generator {
return sub { shift() + shift() };
}
$add_sub = add_function_generator();
$sum = $add_sub->(4,5); # $sum is 9 now.
|
|
The closure works as a function template with some customization slots left out to be
filled later. The anonymous subroutine returned by add_function_generator() isn't technically a
closure because it refers to no lexicals outside its own scope.
Contrast this with the following make_adder() function, in which the returned anonymous
function contains a reference to a lexical variable outside the scope of that function itself.
Such a reference requires that Perl return a proper closure, thus locking in for all time the
value that the lexical had when the function was created.
sub make_adder {
my $addpiece = shift;
return sub { shift() + $addpiece };
}
$f1 = make_adder(20);
$f2 = make_adder(555);
|
|
Now &$f1($n) is always 20 plus whatever $n you pass in, whereas &$f2($n)
is always 555 plus whatever $n you pass in. The $addpiece in the closure sticks around.
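make_adder() captures a value that never changes, but a closure can also carry mutable state from one call to the next. In this sketch make_counter() is a hypothetical function; each call to it creates a fresh lexical $count, so the two counters never interfere with each other.

```perl
use strict;
use warnings;

# Each call creates a new lexical $count; the returned anonymous sub
# keeps that particular $count alive (a closure).
sub make_counter {
    my $count = shift;              # starting value
    return sub { return $count++ };
}

my $c1 = make_counter(10);
my $c2 = make_counter(100);

my @seen = ($c1->(), $c1->(), $c2->(), $c1->());
print "@seen\n";                    # 10 11 100 12
```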
Closures are often used for less esoteric purposes. For example, when you want to pass in a
bit of code into a function:
my $line;
timeout( 30, sub { $line = <STDIN> } );
|
|
If the code to execute had been passed in as a string, '$line = <STDIN>',
there would have been no way for the hypothetical timeout() function to access the lexical
variable $line back in its caller's scope.
Variable suicide is when you (temporarily or permanently) lose the value of a variable. It is
caused by scoping through my() and local() interacting with either closures or aliased foreach()
iterator variables and subroutine arguments. It used to be easy to inadvertently lose a
variable's value this way, but now it's much harder. Take this code:
my $f = "foo";
sub T {
while ($i++ < 3) { my $f = $f; $f .= "bar"; print $f, "\n" }
}
T;
print "Finally $f\n";
|
|
The $f that has "bar" added to it three times should be a new $f (my
$f should create a new lexical variable each time through the loop). In older versions of Perl,
however, it wasn't. This was a bug, now fixed in the latest releases (tested against 5.004_05,
5.005_03, and 5.005_56).
With the exception of regexes, you need to pass references to functions, filehandles, arrays,
and hashes. See perlsub/"Pass by Reference" for this particular question, and perlref for
information on references.
See "Passing Regexes", below, for information on passing regular expressions.
- Passing Variables and Functions
-
Regular variables and functions are quite easy to pass: just pass in a reference to an
existing or anonymous variable or function:
func( \$some_scalar );
func( \@some_array );
func( [ 1 .. 10 ] );
func( \%some_hash );
func( { this => 10, that => 20 } );
func( \&some_func );
func( sub { $_[0] ** $_[1] } );
|
|
- Passing Filehandles
-
To pass filehandles to subroutines, use the *FH or \*FH
notations. These are "typeglobs"--see perldata/"Typeglobs
and Filehandles" and especially perlsub/"Pass
by Reference" for more information.
Here's an excerpt from perlsub:
If you're passing around filehandles, you could usually just use the bare typeglob, like
*STDOUT, but typeglob references would be better because they'll still work properly under use
strict 'refs'. For example:
splutter(\*STDOUT);
sub splutter {
my $fh = shift;
print $fh "her um well a hmmm\n";
}
$rec = get_rec(\*STDIN);
sub get_rec {
my $fh = shift;
return scalar <$fh>;
}
|
|
If you're planning on generating new filehandles, you could do this:
sub openit {
my $path = shift;
local *FH;
return open (FH, $path) ? *FH : undef;
}
$fh = openit('< /etc/motd');
print <$fh>;
|
|
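Since Perl 5.6, open() can also create a filehandle inside an undefined lexical scalar, which can then be passed around like any other reference with no typeglobs involved. A sketch, assuming Perl 5.8 or later for the in-memory file; first_line() is a hypothetical helper, and for a real file you would write open(my $fh, '<', $path) instead.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first line read from any filehandle.
sub first_line {
    my $fh   = shift;
    my $line = <$fh>;
    chomp $line if defined $line;
    return $line;
}

# open() autovivifies a lexical filehandle into $fh; the three-argument
# form keeps the mode separate from the file name (here, a string).
my $text = "hello there\nsecond line\n";
open(my $fh, '<', \$text) or die "Can't open: $!";
my $got = first_line($fh);
close($fh);

print "$got\n";   # hello there
```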
- Passing Regexes
-
To pass regexes around, you'll need to be using a release of Perl recent enough
to support the qr// construct, pass around strings and use an
exception-trapping eval, or else be very, very clever.
Here's an example of how to pass in a string to be regex compared using qr//:
sub compare($$) {
my ($val1, $regex) = @_;
my $retval = $val1 =~ /$regex/;
return $retval;
}
$match = compare("old McDonald", qr/d.*D/i);
|
|
Notice how qr// allows flags at the end. That pattern was compiled at
compile time, although it was executed later. The nifty qr// notation wasn't
introduced until the 5.005 release. Before that, you had to approach this problem much less
intuitively. For example, here it is again if you don't have qr//:
sub compare($$) {
my ($val1, $regex) = @_;
my $retval = eval { $val1 =~ /$regex/ };
die if $@;
return $retval;
}
$match = compare("old McDonald", q/(?i)d.*D/);
|
|
Make sure you never say something like this:
return eval "\$val =~ /$regex/"; # WRONG
|
|
or someone can sneak shell escapes into the regex due to the double interpolation of the
eval and the double-quoted string. For example:
$pattern_of_evil = 'danger ${ system("rm -rf * &") } danger';
eval "\$string =~ /$pattern_of_evil/";
|
|
Those preferring to be very, very clever might see the O'Reilly book, Mastering
Regular Expressions, by Jeffrey Friedl. Page 273's Build_MatchMany_Function() is
particularly interesting. A complete citation of this book is given in perlfaq2.
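Because qr// produces an ordinary scalar, compiled patterns can also be stored in hashes or arrays and passed around as a group. A sketch, assuming Perl 5.005 or later; matching_names() and the pattern names are hypothetical.

```perl
use strict;
use warnings;

# Precompiled patterns live happily in an ordinary hash.
my %patterns = (
    farmer => qr/d.*D/i,
    digits => qr/^\d+$/,
);

# Hypothetical helper: returns the names of all patterns that match.
sub matching_names {
    my ($string, $pats) = @_;
    my @names = grep { $string =~ $pats->{$_} } keys %$pats;
    return sort @names;
}

my @hits = matching_names("old McDonald", \%patterns);
print "@hits\n";   # farmer
```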
- Passing Methods
-
To pass an object method into a subroutine, you can do this:
call_a_lot(10, $some_obj, "methname");
sub call_a_lot {
my ($count, $widget, $trick) = @_;
for (my $i = 0; $i < $count; $i++) {
$widget->$trick();
}
}
|
|
Or, you can use a closure to bundle up the object, its method call, and arguments:
my $whatnot = sub { $some_obj->obfuscate(@args) };
func($whatnot);
sub func {
my $code = shift;
&$code();
}
|
|
You could also investigate the can() method in the UNIVERSAL class (part of the standard
perl distribution).
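A minimal sketch of can(): it returns a code reference to the named method (searching inheritance) or a false value if there is none, so you can test for a method before calling it. The Widget class and its methods are hypothetical.

```perl
use strict;
use warnings;

# A hypothetical class used only for this illustration.
package Widget;
sub new        { my $class = shift; return bless { name => shift }, $class }
sub frobnicate { my $self = shift; return "$self->{name} frobnicated" }

package main;

my $obj = Widget->new("gizmo");

my @results;
for my $method (qw(frobnicate explode)) {
    if (my $code = $obj->can($method)) {
        push @results, $code->($obj);    # call it, passing the invocant
    } else {
        push @results, "no $method method";
    }
}
print "$_\n" for @results;
```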
As with most things in Perl, TMTOWTDI. What is a "static variable" in other
languages could be either a function-private variable (visible only within a single function,
retaining its value between calls to that function), or a file-private variable (visible only to
functions within the file it was declared in) in Perl.
Here's code to implement a function-private variable:
BEGIN {
my $counter = 42;
sub prev_counter { return --$counter }
sub next_counter { return $counter++ }
}
|
|
Now prev_counter() and next_counter() share a private variable $counter that was initialized
at compile time.
To declare a file-private variable, you'll still use a my(), putting the declaration at the
outer scope level at the top of the file. Assume this is in file Pax.pm:
package Pax;
my $started = scalar(localtime(time()));
sub begun { return $started }
|
|
When use Pax or require Pax loads this module, the variable will be
initialized. It won't get garbage-collected the way most variables going out of scope do,
because the begun() function cares about it, but no one else can get it. It is not called $Pax::started
because its scope is unrelated to the package. It's scoped to the file. You could conceivably
have several packages in that same file all accessing the same private variable, but another
file with the same package couldn't get to it.
See perlsub/"Persistent
Private Variables" for details.
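For the function-private case, Perl 5.10 and later (newer than this FAQ) also offer state variables, which are initialized only once and keep their values between calls without a surrounding BEGIN block. A sketch; next_id() is a hypothetical name.

```perl
use strict;
use warnings;
use feature 'state';    # state variables need Perl 5.10 or later

sub next_id {
    state $id = 0;      # initialized once, keeps its value between calls
    return ++$id;
}

my @ids = map { next_id() } 1 .. 3;
print "@ids\n";         # 1 2 3
```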
local($x) saves away the old value of the global variable $x and
assigns a new value for the duration of the subroutine which is visible in other functions
called from that subroutine. This is done at run-time, so is called dynamic scoping. local()
always affects global variables, also called package variables or dynamic variables.
my($x) creates a new variable that is only visible in the current subroutine.
This is done at compile-time, so it is called lexical or static scoping. my() always affects
private variables, also called lexical variables or (improperly) static(ly scoped) variables.
For instance:
sub visible {
print "var has value $var\n";
}
sub dynamic {
local $var = 'local'; # new temporary value for the still-global
visible(); # variable called $var
}
sub lexical {
my $var = 'private'; # new private variable, $var
visible(); # (invisible outside of sub scope)
}
$var = 'global';
visible(); # prints global
dynamic(); # prints local
lexical(); # prints global
|
|
Notice how at no point does the value "private" get printed. That's because $var
only has that value within the block of the lexical() function, and it is hidden from any
subroutine called from there.
In summary, local() doesn't make what you think of as private, local variables. It gives a
global variable a temporary value. my() is what you're looking for if you want private
variables.
See perlsub/"Private
Variables via my()" and perlsub/"Temporary
Values via local()" for excruciating details.
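A common practical use of local() is to give a punctuation variable a temporary value. This sketch (assuming Perl 5.8 or later for the in-memory filehandle) undefines the input record separator $/ to read a whole file at once; the old value comes back automatically when the block exits.

```perl
use strict;
use warnings;

my $text = "line one\nline two\n";
open(my $fh, '<', \$text) or die "Can't open in-memory file: $!";

my $contents = do {
    local $/;      # $/ is now undef, so <$fh> reads everything ("slurp mode")
    <$fh>;
};                 # leaving the block restores $/ to its previous value
close($fh);

print length($contents), " characters read\n";   # 18 characters read
print "\$/ is back to a newline\n" if $/ eq "\n";
```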
To access a dynamic variable while a similarly named lexical is in scope, you can use symbolic
references, provided you haven't set use strict "refs". So instead of $var, use ${'var'}.
local $var = "global";
my $var = "lexical";
print "lexical is $var\n";
no strict 'refs';
print "global is ${'var'}\n";
|
|
If you know your package, you can just mention it explicitly, as in $Some_Pack::var. Note
that the notation $::var is not the dynamic $var in the current package, but rather the
one in the main package, as though you had written $main::var. Specifying the
package directly makes you hard-code its name, but it executes faster and avoids running afoul
of use strict "refs".
In deep binding, lexical variables mentioned in anonymous subroutines are the same ones that
were in scope when the subroutine was created. In shallow binding, they are whichever variables
with the same names happen to be in scope when the subroutine is called. Perl always uses deep
binding of lexical variables (i.e., those created with my()). However, dynamic variables (aka
global, local, or package variables) are effectively shallowly bound. Consider this just one
more reason not to use them. See the answer to "What's a
closure?".
With parentheses, my() and local() give list context to the right-hand side of =.
The <FH> read operation, like so many of Perl's functions and operators, can tell which
context it was called in and behaves appropriately. In general, the scalar() function can help.
This function does nothing to the data itself (contrary to popular myth) but rather tells its
argument to behave in its scalar fashion, whatever that is. If the function doesn't have a
defined scalar behavior (as with sort()), this of course doesn't help you.
To enforce scalar context in this particular case, however, you need merely omit the
parentheses:
local($foo) = <FILE>; # WRONG
local($foo) = scalar(<FILE>); # ok
local $foo = <FILE>; # right
|
|
You should probably be using lexical variables anyway, although the issue is the same here:
my($foo) = <FILE>; # WRONG
my $foo = <FILE>; # right
|
|
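The difference is easy to observe. In this sketch (Perl 5.8 or later, using an in-memory filehandle), the list assignment swallows every remaining line and keeps only the first, while the scalar assignment leaves the rest unread.

```perl
use strict;
use warnings;

my $text = "one\ntwo\nthree\n";

open(my $fh, '<', \$text) or die "Can't open: $!";
my ($list_first) = <$fh>;    # list context: reads ALL lines, keeps the first
my $after_list   = <$fh>;    # nothing is left to read, so this is undef
close($fh);

open($fh, '<', \$text) or die "Can't open: $!";
my $scalar_first = <$fh>;    # scalar context: reads just one line
my $after_scalar = <$fh>;    # the next line is still available
close($fh);
```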
Why would you want to redefine a builtin function, operator, or method? :-)
If you want to override a predefined function, such as open(), then you'll have to import the
new definition from a different module. See perlsub/"Overriding
Built-in Functions". There's also an example in perltoot/"Class::Template".
If you want to overload a Perl operator, such as + or **, then
you'll want to use the use overload pragma, documented in overload.
If you're talking about obscuring method calls in parent classes, see perltoot/"Overridden
Methods".
When you call a function as &foo, you allow that function access to your
current @_ values, and you bypass prototypes. The function doesn't get an empty @_--it gets
yours! While not strictly speaking a bug (it's documented that way in perlsub), it would be hard to
consider this a feature in most cases.
When you call your function as &foo(), then you do get a new @_, but
prototyping is still circumvented.
Normally, you want to call a function using foo(). You may only omit the
parentheses if the function is already known to the compiler because it already saw the
definition (use but not require), or via a forward reference or use
subs declaration. Even in this case, you get a clean @_ without any of the old values
leaking through where they don't belong.
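A small demonstration of the @_ sharing described above; count_args() and caller_demo() are hypothetical names.

```perl
use strict;
use warnings;

sub count_args { return scalar @_ }

sub caller_demo {
    return (
        &count_args,      # no parentheses: reuses caller_demo's own @_
        &count_args(),    # & with parentheses: a fresh, empty @_
        count_args(),     # normal call: a fresh, empty @_
    );
}

my @counts = caller_demo('a', 'b', 'c');
print "@counts\n";        # 3 0 0
```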
Why Perl has no switch statement is explained in more depth in perlsyn. Briefly, there's no
official case statement, because of the variety of tests possible in Perl (numeric comparison,
string comparison, glob comparison, regex matching, overloaded comparisons, ...). Larry couldn't
decide how best to do this, so he left it out, even though it's been on the wish list since
perl1.
Starting from Perl 5.8, you can get switch and case by using the Switch extension:
use Switch;
After that you have switch and case. It is not as fast as it could be because it's not really
part of the language (it's done using source filters), but it is available, and it's very
flexible.
But if one wants to use pure Perl, the general answer is to write a construct like this:
for ($variable_to_test) {
if (/pat1/) { } # do something
elsif (/pat2/) { } # do something else
elsif (/pat3/) { } # do something else
else { } # default
}
|
|
Here's a simple example of a switch based on pattern matching, this time lined up in a way to
make it look more like a switch statement. We'll do a multiway conditional based on the type of
reference stored in $whatchamacallit:
SWITCH: for (ref $whatchamacallit) {
/^$/ && die "not a reference";
/SCALAR/ && do {
print_scalar($$whatchamacallit);
last SWITCH;
};
/ARRAY/ && do {
print_array(@$whatchamacallit);
last SWITCH;
};
/HASH/ && do {
print_hash(%$whatchamacallit);
last SWITCH;
};
/CODE/ && do {
warn "can't print function ref";
last SWITCH;
};
# DEFAULT
warn "User defined type skipped";
}
|
|
See perlsyn/"Basic BLOCKs and Switch Statements" for many other
examples in this style.
Sometimes you should change the positions of the constant and the variable. For example,
let's say you wanted to test which of many answers you were given, but in a case-insensitive way
that also allows abbreviations. You can use the following technique if the strings all start
with different characters or if you want to arrange the matches so that one takes precedence
over another, as "SEND" has precedence over "STOP"
here:
chomp($answer = <>);
if ("SEND" =~ /^\Q$answer/i) { print "Action is send\n" }
elsif ("STOP" =~ /^\Q$answer/i) { print "Action is stop\n" }
elsif ("ABORT" =~ /^\Q$answer/i) { print "Action is abort\n" }
elsif ("LIST" =~ /^\Q$answer/i) { print "Action is list\n" }
elsif ("EDIT" =~ /^\Q$answer/i) { print "Action is edit\n" }
|
|
A totally different approach is to create a hash of function references.
my %commands = (
"happy" => \&joy,
"sad" => \&sullen,
"done" => sub { die "See ya!" },
"mad" => \&angry,
);
print "How are you? ";
chomp($string = <STDIN>);
if ($commands{$string}) {
$commands{$string}->();
} else {
print "No such command: $string\n";
}
|
|
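The hash-of-code-references approach also makes it easy to pass arguments to handlers and to report unknown commands. A sketch with hypothetical command names; run_command() is a hypothetical front end that splits a line on whitespace and dispatches on the first word.

```perl
use strict;
use warnings;

# Hypothetical command table: each handler receives the remaining words.
my %dispatch = (
    add   => sub { my $sum = 0; $sum += $_ for @_; return $sum },
    upper => sub { return join ' ', map { uc($_) } @_ },
);

sub run_command {
    my ($cmd, @args) = split ' ', shift;
    my $handler = $dispatch{$cmd}
        or return "No such command: $cmd";
    return $handler->(@args);
}

for my $line ("add 1 2 3", "upper hi there", "fly away") {
    my $result = run_command($line);
    print "$result\n";    # 6, then HI THERE, then No such command: fly
}
```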
The AUTOLOAD method, discussed in perlsub/"Autoloading"
and perltoot/"AUTOLOAD:
Proxy Methods", lets you capture calls to undefined functions and methods.
When it comes to undefined variables that would trigger a warning under -w, you
can use a handler to trap the pseudo-signal __WARN__ like this:
$SIG{__WARN__} = sub {
for ( $_[0] ) { # here's a switch statement
/Use of uninitialized value/ && do {
# promote warning to a fatal
die $_;
};
# other warning cases to catch could go here;
warn $_;
}
};
|
|
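Instead of a global __WARN__ handler, the warnings pragma (see perllexwarn) can promote a warning category to a fatal error for just one lexical scope. A sketch; eval is used here only to catch the resulting exception so it can be inspected.

```perl
use strict;
use warnings;

my $undefined;
my $ok = eval {
    use warnings FATAL => 'uninitialized';   # this category now dies
    my $n = $undefined + 1;                 # would normally only warn
    1;
};
my $error = $@;
print $ok ? "no error\n" : "caught: $error";
```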
If a method included in the same file can't be found, some possible reasons are that your
inheritance is getting confused, you've misspelled the method name, or the object is of the
wrong type. Check out perltoot for details about any of the above cases. You may also use
print ref($object) to find out the class $object was blessed into.
Another possible cause is that you've used the indirect object syntax (e.g., find
Guru "Samy") on a class name before Perl has seen that such a package exists.
It's wisest to make sure your packages are all defined before you start using them, which will
be taken care of if you use the use statement instead of require. If
not, make sure to use arrow notation (e.g., Guru->find("Samy"))
instead. Object notation is explained in perlobj.
Make sure to read about creating modules in perlmod and the perils of indirect
objects in perlobj/"Method
Invocation".
If you're just a random program, you can do this to find out what the currently compiled
package is:
my $packname = __PACKAGE__;
|
|
But, if you're a method and you want to print an error message that includes the kind of
object you were called on (which is not necessarily the same as the one in which you were
compiled):
sub amethod {
my $self = shift;
my $class = ref($self) || $self;
warn "called me from a $class object";
}
|
|
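To see how __PACKAGE__, ref(), and Scalar::Util's blessed() differ, consider a method inherited by a subclass. Animal and Dog are hypothetical classes; Scalar::Util is bundled with Perl from 5.8.0.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

package Animal;
sub new   { my $class = shift; return bless {}, $class }
sub where { return __PACKAGE__ }                            # compiled-in package
sub kind  { my $self = shift; return ref($self) || $self }  # invocant's class

package Dog;
our @ISA = ('Animal');

package main;

my $dog  = Dog->new;
my @info = ($dog->where, $dog->kind, Dog->kind, blessed($dog), __PACKAGE__);
print "@info\n";    # Animal Dog Dog Dog main
```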
To comment out a large block of Perl code, you can use embedded POD to discard it. For POD
formatters, the =for directive lasts until the next paragraph (two consecutive newlines). Perl
itself skips everything from a pod directive up to the next =cut line, so end each block with =cut.
# program is here

=for nobody
This paragraph is commented out

=cut

# program continues
The =begin and =end directives can contain multiple paragraphs.
=begin comment text

all of this stuff

here will be ignored
by everyone

=end comment text

=cut
|
|
The pod directives cannot go just anywhere. You must put a pod directive where the parser is
expecting a new statement, not just in the middle of an expression or some other arbitrary
grammar production.
See perlpod for more details.
To clear a package, use this code, provided by Mark-Jason Dominus:
sub scrub_package {
no strict 'refs';
my $pack = shift;
die "Shouldn't delete main package"
if $pack eq "" || $pack eq "main";
my $stash = *{$pack . '::'}{HASH};
my $name;
foreach $name (keys %$stash) {
my $fullname = $pack . '::' . $name;
# Get rid of everything with that name.
undef $$fullname;
undef @$fullname;
undef %$fullname;
undef &$fullname;
undef *$fullname;
}
}
|
|
Or, if you're using a recent release of Perl, you can just use the Symbol::delete_package()
function instead.
Beginners often think they want to have a variable contain the name of a variable.
$fred = 23;
$varname = "fred";
++$$varname; # $fred now 24
|
|
This works sometimes, but it is a very bad idea for two reasons.
The first reason is that this technique only works on global variables. That means
that if $fred is a lexical variable created with my() in the above example, the code wouldn't
work at all: you'd accidentally access the global and skip right over the private lexical
altogether. Global variables are bad because they can easily collide accidentally and in general
make for non-scalable and confusing code.
Symbolic references are forbidden under the use strict pragma. They are not true
references and consequently are not reference counted or garbage collected.
The other reason why using a variable to hold the name of another variable is a bad idea is
that the question often stems from a lack of understanding of Perl data structures, particularly
hashes. By using symbolic references, you are just using the package's symbol-table hash (like %main::)
instead of a user-defined hash. The solution is to use your own hash or a real reference
instead.
$fred = 23;
$varname = "fred";
$USER_VARS{$varname}++; # not $$varname++
|
|
There we're using the %USER_VARS hash instead of symbolic references. Sometimes this comes up
in reading strings from the user with variable references and wanting to expand them to the
values of your perl program's variables. This is also a bad idea because it conflates the
program-addressable namespace and the user-addressable one. Instead of reading a string and
expanding it to the actual contents of your program's own variables:
$str = 'this has a $fred and $barney in it';
$str =~ s/(\$\w+)/$1/eeg; # need double eval
|
|
it would be better to keep a hash around like %USER_VARS and have variable references
actually refer to entries in that hash:
$str =~ s/\$(\w+)/$USER_VARS{$1}/g; # no /e here at all
|
|
That's faster, cleaner, and safer than the previous approach. Of course, you don't need to
use a dollar sign. You could use your own scheme to make it less confusing, like bracketed
percent symbols, etc.
$str = 'this has a %fred% and %barney% in it';
$str =~ s/%(\w+)%/$USER_VARS{$1}/g; # no /e here at all
|
|
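One refinement of the hash-based approach is to leave placeholders you don't recognize untouched, instead of replacing them with an empty string (which would also warn under -w). A sketch; the %USER_VARS contents are made up for illustration. The single /e evaluates only the replacement expression written here, never any text supplied by the user.

```perl
use strict;
use warnings;

# A user-visible namespace kept separate from the program's own variables.
my %USER_VARS = (fred => 'Fred Flintstone', barney => 'Barney Rubble');

my $str = 'this has a %fred% and %barney% and %dino% in it';

# Replace known names; keep unknown placeholders exactly as written.
(my $out = $str) =~ s/%(\w+)%/exists $USER_VARS{$1} ? $USER_VARS{$1} : "%$1%"/ge;

print "$out\n";
```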
Another reason that folks sometimes think they want a variable to contain the name of a
variable is because they don't know how to build proper data structures using hashes. For
example, let's say they wanted two hashes in their program: %fred and %barney, and that they
wanted to use another scalar variable to refer to those by name.
$name = "fred";
$$name{WIFE} = "wilma"; # set %fred
$name = "barney";
$$name{WIFE} = "betty"; # set %barney
|
|
This is still a symbolic reference, and is still saddled with the problems enumerated above.
It would be far better to write:
$folks{"fred"}{WIFE} = "wilma";
$folks{"barney"}{WIFE} = "betty";
|
|
And just use a multilevel hash to start with.
The only times that you absolutely must use symbolic references are when you really
must refer to the symbol table. This may be because it's something you can't take a real
reference to, such as a format name. Doing so may also be important for method calls, since
these always go through the symbol table for resolution.
In those cases, you would turn off strict 'refs' temporarily so you can play
around with the symbol table. For example:
@colors = qw(red blue green yellow orange purple violet);
for my $name (@colors) {
no strict 'refs'; # renege for the block
*$name = sub { "<FONT COLOR='$name'>@_</FONT>" };
}
|
|
All those functions (red(), blue(), green(), etc.) appear to be separate, but the real code
in the closure actually was compiled only once.
So, sometimes you might want to use symbolic references to directly manipulate the symbol
table. This doesn't matter for formats, handles, and subroutines, because they are always
global--you can't use my() on them. For scalars, arrays, and hashes, though--and usually for
subroutines--you probably only want to use hard references.
Copyright (c) 1997-2002 Tom Christiansen and Nathan Torkington. All rights reserved.
This documentation is free; you can redistribute it and/or modify it under the same terms as
Perl itself.
Irrespective of its distribution, all code examples in this file are hereby placed into the
public domain. You are permitted and encouraged to use this code in your own programs for fun or
for profit as you see fit. A simple comment in the code giving credit would be courteous but is
not required.
|