Speaking of debugging, there are several pragmas available to control and debug regexps in
Perl. We have already encountered one pragma in the previous section, use re 'eval';, which
allows variable interpolation and code expressions to coexist in a regexp. The other
pragmas are:
use re 'taint';
$tainted = <>;
@parts = ($tainted =~ /(\w+)\s+(\w+)/); # @parts is now tainted
The taint pragma causes any substrings from a match with a tainted variable to
be tainted as well. This is not normally the case, as regexps are often used to extract the safe
bits from a tainted variable. Use taint when you are not extracting safe bits, but
are performing some other processing. Both taint and eval pragmas are
lexically scoped, which means they are in effect only until the end of the block enclosing the
pragmas.
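The lexical scoping described above can be illustrated with use re 'eval'. The following is only a minimal sketch, not part of the original text: the names $code_pat, $inside, $outside and $error are hypothetical, and it assumes the script is run without taint mode. Inside the block the interpolated code expression is accepted; once the block ends the pragma is no longer in effect, and perl refuses the same interpolation at run time.

```perl
use strict;
use warnings;

# Hypothetical pattern text held in a variable; it contains a code expression.
my $code_pat = 'b(?{ 1 })';

my $inside;
{
    use re 'eval';    # in effect only until the end of this block
    $inside = "abc" =~ /a$code_pat/ ? 1 : 0;
}

# Outside the block the pragma is off, so interpolating the same variable
# raises a run-time exception; eval {} traps it and leaves $outside undefined.
my $outside = eval { "abc" =~ /a$code_pat/ ? 1 : 0 };
my $error   = $@;

print "inside block:  matched=$inside\n";
print "outside block: ",
    ( defined $outside ? "matched=$outside\n" : "refused: $error" );
```

Keeping the pragma inside the smallest block that needs it limits where interpolated code expressions can run.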
use re 'debug';
/^(.*)$/s; # output debugging info
use re 'debugcolor';
/^(.*)$/s; # output debugging info in living color
The global debug and debugcolor pragmas allow one to get detailed
debugging info about regexp compilation and execution. debugcolor is the same as
debug, except the debugging information is displayed in color on terminals that can display
termcap color sequences. Here is example output:
% perl -e 'use re "debug"; "abc" =~ /a*b+c/;'
Compiling REx `a*b+c'
size 9 first at 1
1: STAR(4)
2: EXACT <a>(0)
4: PLUS(7)
5: EXACT <b>(0)
7: EXACT <c>(9)
9: END(0)
floating `bc' at 0..2147483647 (checking floating) minlen 2
Guessing start of match, REx `a*b+c' against `abc'...
Found floating substr `bc' at offset 1...
Guessed: match at offset 0
Matching REx `a*b+c' against `abc'
Setting an EVAL scope, savestack=3
0 <> <abc> | 1: STAR
EXACT <a> can match 1 times out of 32767...
Setting an EVAL scope, savestack=3
1 <a> <bc> | 4: PLUS
EXACT <b> can match 1 times out of 32767...
Setting an EVAL scope, savestack=3
2 <ab> <c> | 7: EXACT <c>
3 <abc> <> | 9: END
Match successful!
Freeing REx: `a*b+c'
If you have gotten this far into the tutorial, you can probably guess what the different
parts of the debugging output tell you. The first part
Compiling REx `a*b+c'
size 9 first at 1
1: STAR(4)
2: EXACT <a>(0)
4: PLUS(7)
5: EXACT <b>(0)
7: EXACT <c>(9)
9: END(0)
describes the compilation stage. STAR(4) means that there is a starred object,
in this case 'a', and if it matches, go to line 4, i.e., PLUS(7). The
middle lines describe some heuristics and optimizations performed before a match:
floating `bc' at 0..2147483647 (checking floating) minlen 2
Guessing start of match, REx `a*b+c' against `abc'...
Found floating substr `bc' at offset 1...
Guessed: match at offset 0
Then the match is executed and the remaining lines describe the process:
Matching REx `a*b+c' against `abc'
Setting an EVAL scope, savestack=3
0 <> <abc> | 1: STAR
EXACT <a> can match 1 times out of 32767...
Setting an EVAL scope, savestack=3
1 <a> <bc> | 4: PLUS
EXACT <b> can match 1 times out of 32767...
Setting an EVAL scope, savestack=3
2 <ab> <c> | 7: EXACT <c>
3 <abc> <> | 9: END
Match successful!
Freeing REx: `a*b+c'
Each step is of the form n <x> <y>, with <x>
the part of the string matched and <y> the part not yet matched. The | 1: STAR
says that perl is at line number 1 in the compilation list above. See perldebguts/"Debugging
regular expressions" for much more detail.
An alternative method of debugging regexps is to embed print statements within
the regexp. This provides a blow-by-blow account of the backtracking in an alternation:
"that this" =~ m@(?{print "Start at position ", pos, "\n";})
t(?{print "t1\n";})
h(?{print "h1\n";})
i(?{print "i1\n";})
s(?{print "s1\n";})
|
t(?{print "t2\n";})
h(?{print "h2\n";})
a(?{print "a2\n";})
t(?{print "t2\n";})
(?{print "Done at position ", pos, "\n";})
@x;
prints
Start at position 0
t1
h1
t2
h2
a2
t2
Done at position 4
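The same embedding technique can approximate the n <x> <y> lines of the debugging output shown earlier. What follows is a sketch under stated assumptions, not the way the debug pragma itself works: record_step and @steps are hypothetical names introduced here, and the code relies on $_ holding the target string and pos() giving the current position inside a code expression. It records the matched and not-yet-matched parts of "abc" after each piece of a*b+c succeeds.

```perl
use strict;
use warnings;

# Package variable, so the embedded code expressions can reach it directly.
our @steps;

# Hypothetical helper: store "n <matched> <unmatched>" for position $p in $str.
sub record_step {
    my ($str, $p) = @_;
    push @steps,
        sprintf "%d <%s> <%s>", $p, substr($str, 0, $p), substr($str, $p);
}

# Inside (?{ ... }), $_ is the string being matched and pos() is the
# current match position; both are passed to the helper explicitly.
"abc" =~ /a* (?{ record_step($_, pos()) })
          b+ (?{ record_step($_, pos()) })
          c  (?{ record_step($_, pos()) })/x;

print "$_\n" for @steps;
# prints
# 1 <a> <bc>
# 2 <ab> <c>
# 3 <abc> <>
```

Collecting the steps in an array rather than printing them directly makes the trace available for later inspection.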