  


Pragmas and debugging

Speaking of debugging, there are several pragmas available to control and debug regexps in Perl. We have already encountered one pragma in the previous section, use re 'eval';, which allows variable interpolation and code expressions to coexist in a regexp. The other pragmas are

 
    use re 'taint';
    $tainted = <>;
    @parts = ($tainted =~ /(\w+)\s+(\w+)/); # @parts is now tainted

The taint pragma causes any substrings from a match with a tainted variable to be tainted as well. This is not normally the case, as regexps are often used to extract the safe bits from a tainted variable. Use taint when you are not extracting safe bits, but are performing some other processing. Both taint and eval pragmas are lexically scoped, which means they are in effect only until the end of the block enclosing the pragmas.
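
As a rough illustration of the difference, the sketch below matches the same string twice, once outside and once inside a block that enables use re 'taint'. It assumes the script is run as perl -T; without -T nothing is tainted and both lines report "clean". The variable names are made up for the example, and the substr($^X, 0, 0) idiom is only a convenient way to obtain a tainted string without reading input.

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Assumption: run with `perl -T`. Under -T, $^X is tainted, and appending a
# zero-length piece of it taints the whole string without changing its text.
my $input = "alpha beta" . substr($^X, 0, 0);

# Default behaviour: captured substrings come back untainted.
my @plain = ($input =~ /(\w+)\s+(\w+)/);

# Inside this block only, captures inherit the taint of the matched string.
my @kept;
{
    use re 'taint';
    @kept = ($input =~ /(\w+)\s+(\w+)/);
}

printf "taint mode:         %s\n", ${^TAINT} ? "on" : "off";
printf "without re 'taint': %s\n", tainted($plain[0]) ? "tainted" : "clean";
printf "with re 'taint':    %s\n", tainted($kept[0])  ? "tainted" : "clean";
```

Because the pragma is lexically scoped, the first match is unaffected even though it is textually close to the block that enables taint.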

 
    use re 'debug';
    /^(.*)$/s;       # output debugging info

    use re 'debugcolor';
    /^(.*)$/s;       # output debugging info in living color  

The global debug and debugcolor pragmas allow one to get detailed debugging info about regexp compilation and execution. debugcolor is the same as debug, except that the debugging information is displayed in color on terminals that can display termcap color sequences. Here is some example output:

 
    % perl -e 'use re "debug"; "abc" =~ /a*b+c/;'
    Compiling REx `a*b+c'
    size 9 first at 1
       1: STAR(4)
       2:   EXACT <a>(0)
       4: PLUS(7)
       5:   EXACT <b>(0)
       7: EXACT <c>(9)
       9: END(0)
    floating `bc' at 0..2147483647 (checking floating) minlen 2
    Guessing start of match, REx `a*b+c' against `abc'...
    Found floating substr `bc' at offset 1...
    Guessed: match at offset 0
    Matching REx `a*b+c' against `abc'
      Setting an EVAL scope, savestack=3
       0 <> <abc>             |  1:  STAR
                               EXACT <a> can match 1 times out of 32767...
      Setting an EVAL scope, savestack=3
       1 <a> <bc>             |  4:    PLUS
                               EXACT <b> can match 1 times out of 32767...
      Setting an EVAL scope, savestack=3
       2 <ab> <c>             |  7:      EXACT <c>
       3 <abc> <>             |  9:      END
    Match successful!
    Freeing REx: `a*b+c'  

If you have gotten this far into the tutorial, you can probably guess what the different parts of the debugging output tell you. The first part

 
    Compiling REx `a*b+c'
    size 9 first at 1
       1: STAR(4)
       2:   EXACT <a>(0)
       4: PLUS(7)
       5:   EXACT <b>(0)
       7: EXACT <c>(9)
       9: END(0)  

describes the compilation stage. STAR(4) means that there is a starred object, in this case 'a', and that if it matches, execution continues at line 4, i.e., PLUS(7). The middle lines describe some heuristics and optimizations performed before a match:
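
To make the role of the number in parentheses concrete, here is a small sketch that copies the compilation list into a hash and follows the "next line" numbers starting from line 1. This is only a model of the printed dump, not of how perl stores a compiled regexp internally; the hash layout and variable names are assumptions made for the example.

```perl
use strict;
use warnings;

# Node table copied from the dump above: line number => [ operation, next line ].
my %node = (
    1 => [ 'STAR',      4 ],
    2 => [ 'EXACT <a>', 0 ],
    4 => [ 'PLUS',      7 ],
    5 => [ 'EXACT <b>', 0 ],
    7 => [ 'EXACT <c>', 9 ],
    9 => [ 'END',       0 ],
);

# Follow the number in parentheses from line 1 until it reaches 0.
my @path;
for (my $id = 1; $id != 0; $id = $node{$id}[1]) {
    push @path, sprintf("%d: %s", $id, $node{$id}[0]);
}

print join(" -> ", @path), "\n";
# prints: 1: STAR -> 4: PLUS -> 7: EXACT <c> -> 9: END
```

Lines 2 and 5 are not on this path because they are the operands of STAR and PLUS rather than steps that follow them.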

 
    floating `bc' at 0..2147483647 (checking floating) minlen 2
    Guessing start of match, REx `a*b+c' against `abc'...
    Found floating substr `bc' at offset 1...
    Guessed: match at offset 0  

Then the match is executed and the remaining lines describe the process:

 
    Matching REx `a*b+c' against `abc'
      Setting an EVAL scope, savestack=3
       0 <> <abc>             |  1:  STAR
                               EXACT <a> can match 1 times out of 32767...
      Setting an EVAL scope, savestack=3
       1 <a> <bc>             |  4:    PLUS
                               EXACT <b> can match 1 times out of 32767...
      Setting an EVAL scope, savestack=3
       2 <ab> <c>             |  7:      EXACT <c>
       3 <abc> <>             |  9:      END
    Match successful!
    Freeing REx: `a*b+c'  

Each step is of the form n <x> <y>, with <x> the part of the string matched and <y> the part not yet matched. The |  1:  STAR says that perl is at line number 1 in the compilation list above. See perldebguts/"Debugging regular expressions" for much more detail.
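
The n <x> <y> notation is simply the string split at offset n. The following sketch reproduces the left-hand column of the trace above by splitting "abc" at every offset with substr; it does not run the regexp engine at all, and the variable names are chosen only for illustration.

```perl
use strict;
use warnings;

my $str = "abc";

# For each offset n, <x> is the text before n and <y> is the text from n on.
my @steps;
for my $n (0 .. length $str) {
    push @steps, sprintf("%d <%s> <%s>", $n, substr($str, 0, $n), substr($str, $n));
}

print "$_\n" for @steps;
# prints:
# 0 <> <abc>
# 1 <a> <bc>
# 2 <ab> <c>
# 3 <abc> <>
```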

An alternative method of debugging regexps is to embed print statements within the regexp. This provides a blow-by-blow account of the backtracking in an alternation:

 
    "that this" =~ m@(?{print "Start at position ", pos, "\n";})
                     t(?{print "t1\n";})
                     h(?{print "h1\n";})
                     i(?{print "i1\n";})
                     s(?{print "s1\n";})
                         |
                     t(?{print "t2\n";})
                     h(?{print "h2\n";})
                     a(?{print "a2\n";})
                     t(?{print "t2\n";})
                     (?{print "Done at position ", pos, "\n";})
                    @x;  

prints

 
    Start at position 0
    t1
    h1
    t2
    h2
    a2
    t2
    Done at position 4  
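
If printing from inside the regexp is inconvenient, the same idea can be used to collect the trace in an array and inspect it afterwards. The sketch below is a variation on the example above, not a replacement for it: the helper note() and the array @trace are hypothetical names introduced here, the last step of the second alternative is labelled t3 so that it can be told apart from the first t2, and m{...}x is used instead of m@...@x only as a matter of taste.

```perl
use strict;
use warnings;

our @trace;

# Hypothetical helper: record one step of the match attempt.
sub note { push @trace, shift }

my $matched = "that this" =~ m{(?{ note("start:" . pos()) })
                               t(?{ note("t1") })
                               h(?{ note("h1") })
                               i(?{ note("i1") })
                               s(?{ note("s1") })
                                   |
                               t(?{ note("t2") })
                               h(?{ note("h2") })
                               a(?{ note("a2") })
                               t(?{ note("t3") })
                               (?{ note("done:" . pos()) })
                              }x;

print join(" ", @trace), "\n";
# prints: start:0 t1 h1 t2 h2 a2 t3 done:4
```

The recorded steps show the first alternative being abandoned after h1, when 'i' fails to match 'a', and the second alternative then running to completion.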

 

 

 


© 2002-2004 Active-Venture.com Web Site Hosting Service

 


 

 
 
 

Disclaimer: This documentation is provided only for the benefits of our web hosting customers.
For authoritative source of the documentation, please refer to http://www.perldoc.com