Website hosting service by Active-Venture.com
  

 Back to Index

Looking ahead and looking behind

This section concerns the lookahead and lookbehind assertions. First, a little background.

In Perl regular expressions, most regexp elements 'eat up' a certain amount of string when they match. For instance, the regexp element [abc}] eats up one character of the string when it matches, in the sense that perl moves to the next character position in the string after the match. There are some elements, however, that don't eat up characters (advance the character position) if they match. The examples we have seen so far are the anchors. The anchor ^ matches the beginning of the line, but doesn't eat any characters. Similarly, the word boundary anchor \b matches, e.g., if the character to the left is a word character and the character to the right is a non-word character, but it doesn't eat up any characters itself. Anchors are examples of 'zero-width assertions'. Zero-width, because they consume no characters, and assertions, because they test some property of the string. In the context of our walk in the woods analogy to regexp matching, most regexp elements move us along a trail, but anchors have us stop a moment and check our surroundings. If the local environment checks out, we can proceed forward. But if the local environment doesn't satisfy us, we must backtrack.

Checking the environment entails either looking ahead on the trail, looking behind, or both. ^ looks behind, to see that there are no characters before. $ looks ahead, to see that there are no characters after. \b looks both ahead and behind, to see if the characters on either side differ in their 'word'-ness.

The lookahead and lookbehind assertions are generalizations of the anchor concept. Lookahead and lookbehind are zero-width assertions that let us specify which characters we want to test for. The lookahead assertion is denoted by (?=regexp) and the lookbehind assertion is denoted by (?<=fixed-regexp). Some examples are

 
    $x = "I catch the housecat 'Tom-cat' with catnip";
    $x =~ /cat(?=\s+)/;  # matches 'cat' in 'housecat'
    @catwords = ($x =~ /(?<=\s)cat\w+/g);  # matches,
                                           # $catwords[0] = 'catch'
                                           # $catwords[1] = 'catnip'
    $x =~ /\bcat\b/;  # matches 'cat' in 'Tom-cat'
    $x =~ /(?<=\s)cat(?=\s)/; # doesn't match; no isolated 'cat' in
                              # middle of $x  

Note that the parentheses in (?=regexp) and (?<=regexp) are non-capturing, since these are zero-width assertions. Thus in the second regexp, the substrings captured are those of the whole regexp itself. Lookahead (?=regexp) can match arbitrary regexps, but lookbehind (?<=fixed-regexp) only works for regexps of fixed width, i.e., a fixed number of characters long. Thus (?<=(ab|bc)) is fine, but (?<=(ab)*) is not. The negated versions of the lookahead and lookbehind assertions are denoted by (?!regexp) and (?<!fixed-regexp) respectively. They evaluate true if the regexps do not match:

 
    $x = "foobar";
    $x =~ /foo(?!bar)/;  # doesn't match, 'bar' follows 'foo'
    $x =~ /foo(?!baz)/;  # matches, 'baz' doesn't follow 'foo'
    $x =~ /(?<!\s)foo/;  # matches, there is no \s before 'foo'  

 

 

 

Domain name registration service & domain search - 
Register cheap domain name from $7.95 and enjoy free domain services 
 

Cheap domain name search service -
Domain name services at just
$8.95/year only
 

Register domain name -
Buy domain name registration and cheap domain transfer at low, affordable price.

© 2002-2004 Active-Venture.com Web Site Hosting Service

 

[ The disadvantage of working over networks is that you can't so easily go into someone else's office and rip their bloody heart out   ]

 

 
 
 

Disclaimer: This documentation is provided only for the benefits of our web hosting customers.
For authoritative source of the documentation, please refer to http://www.perldoc.com