Pod is embedded in files, typically Perl source files -- although you can write a file that's
nothing but Pod.
A line in a file consists of zero or more non-newline characters, terminated by either
a newline or the end of the file.
A newline sequence is usually a platform-dependent concept, but Pod parsers should
understand it to mean any of CR (ASCII 13), LF (ASCII 10), or a CRLF (ASCII 13 followed
immediately by ASCII 10), in addition to any other system-specific meaning. The first CR/CRLF/LF
sequence in the file may be used as the basis for identifying the newline sequence for parsing
the rest of the file.
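As a rough illustration of the newline rule above, the following Python sketch splits text into lines while accepting CR, LF, and CRLF alike. The name split_pod_lines is hypothetical. The sketch also accepts mixed newline styles within one file instead of settling on the first sequence it sees, which is a simplifying assumption rather than something this document requires.

```python
import re

# CRLF is listed first so that a CR immediately followed by LF is one newline.
_NEWLINE = re.compile(r"\r\n|\r|\n")

def split_pod_lines(text):
    """Split text into lines, treating CR, LF, and CRLF each as one newline."""
    lines = _NEWLINE.split(text)
    # A trailing newline terminates the last line; it does not start a new one.
    if lines and lines[-1] == "":
        lines.pop()
    return lines
```

A parser that prefers to fix the newline sequence from the first occurrence could instead search for the first match of the same pattern and split on that exact string.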
A blank line is a line consisting entirely of zero or more spaces (ASCII 32) or tabs
(ASCII 9), and terminated by a newline or end-of-file. A non-blank line is a line
containing one or more characters other than space or tab (and terminated by a newline or
end-of-file).
(Note: Many older Pod parsers did not accept a line consisting of spaces/tabs and then
a newline as a blank line -- the only lines they considered blank were lines consisting of no
characters at all, terminated by a newline.)
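The difference between the two notions of "blank" can be sketched in Python as follows. Both function names are hypothetical, and each assumes the line has already had its newline removed.

```python
def is_blank(line):
    """Blank as defined here: zero or more spaces (ASCII 32) or tabs (ASCII 9)."""
    return all(ch in " \t" for ch in line)

def is_blank_strict(line):
    """Blank as many older parsers saw it: no characters at all."""
    return line == ""
```

Under the first definition a line holding only a form feed or a non-breaking space is still non-blank, since only ASCII 32 and ASCII 9 are allowed.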
Whitespace is used in this document as a blanket term for spaces, tabs, and newline
sequences. (By itself, this term usually refers to literal whitespace. That is, sequences of
whitespace characters in Pod source, as opposed to "E<32>", which is a
formatting code that denotes a whitespace character.)
A Pod parser is a module meant for parsing Pod (regardless of whether this involves
calling callbacks or building a parse tree or directly formatting it). A Pod formatter
(or Pod translator) is a module or program that converts Pod to some other format (HTML,
plaintext, TeX, PostScript, RTF). A Pod processor might be a formatter or translator, or
might be a program that does something else with the Pod (like wordcounting it, scanning for
index points, etc.).
Pod content is contained in Pod blocks. A Pod block starts with a line that matches
m/\A=[a-zA-Z]/, and continues up to the next line that matches m/\A=cut/ --
or up to the end of the file, if there is no m/\A=cut/ line.
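A minimal Python sketch of that block rule is given here, assuming the input is already a list of lines without newlines. The name pod_blocks is hypothetical. The sketch follows the two patterns literally and does not treat a block that opens with "=cut" as a special case.

```python
import re

POD_START = re.compile(r"\A=[a-zA-Z]")
POD_CUT = re.compile(r"\A=cut")

def pod_blocks(lines):
    """Return a list of Pod blocks, each a list of lines.

    The "=cut" line, when present, is kept as the last line of its block.
    """
    blocks = []
    current = None  # None means we are outside any Pod block
    for line in lines:
        if current is None:
            if POD_START.match(line):
                current = [line]
        else:
            current.append(line)
            if POD_CUT.match(line):
                blocks.append(current)
                current = None
    if current is not None:
        # No "=cut" line was found: the block runs to the end of the file.
        blocks.append(current)
    return blocks
```

Lines outside every block (typically Perl code) are simply skipped by this sketch.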
Within a Pod block, there are Pod paragraphs. A Pod paragraph consists of non-blank
lines of text, separated by one or more blank lines.
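Splitting a block into paragraphs can be sketched in Python as below. The name pod_paragraphs is hypothetical, and the input is assumed to be the lines of one Pod block with newlines already removed.

```python
def pod_paragraphs(lines):
    """Group consecutive non-blank lines into paragraphs (lists of lines)."""
    paragraphs = []
    current = []
    for line in lines:
        if all(ch in " \t" for ch in line):
            # Blank line: close the paragraph in progress, if there is one.
            if current:
                paragraphs.append(current)
                current = []
        else:
            current.append(line)
    if current:
        paragraphs.append(current)
    return paragraphs
```

Runs of several blank lines collapse into a single separator, so no empty paragraphs are produced.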
For purposes of Pod processing, there are four types of paragraphs in a Pod block:
- A command paragraph (also called a "directive"). The first line of this
paragraph must match m/\A=[a-zA-Z]/. Command paragraphs are typically one line,
as in:

  =head1 NOTES

  =item *

But they may span several (non-blank) lines:

  =for comment
  Hm, I wonder what it would look like if
  you tried to write a BNF for Pod from this.

  =head3 Dr. Strangelove, or: How I Learned to
  Stop Worrying and Love the Bomb

Some command paragraphs allow formatting codes in their content (i.e., after the
part that matches m/\A=[a-zA-Z]\S*\s*/), as in:

  =head1 Did You Remember to C<use strict;>?

In other words, the Pod processing handler for "head1" will apply the same
processing to "Did You Remember to C<use strict;>?" that it would to an
ordinary paragraph -- i.e., formatting codes (like "C<...>") are parsed and
presumably formatted appropriately, and whitespace in the form of literal spaces and/or tabs
is not significant.
- A verbatim paragraph. The first line of this paragraph must begin with a literal space or
tab, and this paragraph must not be inside a "=begin identifier", ...
"=end identifier" sequence unless "identifier" begins with
a colon (":"). That is, if a paragraph starts with a literal space or tab, but is
inside a "=begin identifier", ... "=end identifier"
region, then it's a data paragraph, unless "identifier" begins with a
colon.
Whitespace is significant in verbatim paragraphs (although, in processing, tabs
are probably expanded).
- An ordinary paragraph. A paragraph is an ordinary paragraph if its first line
matches neither m/\A=[a-zA-Z]/ nor m/\A[ \t]/, and if it's not inside a
"=begin identifier", ... "=end identifier" sequence unless "identifier" begins
with a colon (":").
- A data paragraph. This is a paragraph that is inside a "=begin identifier"
... "=end identifier" sequence where "identifier" does not
begin with a literal colon (":"). In some sense, a data paragraph is not part of
Pod at all (i.e., effectively it's "out-of-band"), since it's not subject to most
kinds of Pod parsing; but it is specified here, since Pod parsers need to be able to call an
event for it, or store it in some form in a parse tree, or at least just parse around
it.
For example: consider the following paragraphs:

  # <- that's the 0th column
  =head1 Foo

  Stuff

    $foo->bar

  =cut

Here, "=head1 Foo" and "=cut" are command paragraphs because the first
line of each matches m/\A=[a-zA-Z]/. "Stuff" is an ordinary paragraph, because
its first line matches neither m/\A=[a-zA-Z]/ nor m/\A[ \t]/. "[space][space]$foo->bar"
is a verbatim paragraph, because its first line starts with a literal whitespace character (and
there's no "=begin"..."=end" region around).
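The four-way classification described above can be sketched in Python as follows. The name classify is hypothetical, and each paragraph is assumed to be a non-empty list of non-blank lines. Two further assumptions are labeled in the code: the identifier is taken to be the first whitespace-delimited word after "=begin", and when regions are nested the innermost open region decides.

```python
import re

COMMAND = re.compile(r"\A=[a-zA-Z]")
BEGIN = re.compile(r"\A=begin\s+(\S+)")
END = re.compile(r"\A=end\b")

def classify(paragraphs):
    """Label each paragraph as 'command', 'verbatim', 'ordinary', or 'data'."""
    labels = []
    regions = []  # identifiers of currently open =begin regions (assumed stack)
    for para in paragraphs:
        first = para[0]
        if COMMAND.match(first):
            labels.append("command")
            m = BEGIN.match(first)
            if m:
                regions.append(m.group(1))
            elif END.match(first) and regions:
                regions.pop()
        elif regions and not regions[-1].startswith(":"):
            # Assumption: the innermost open region decides.
            labels.append("data")
        elif first[0] in " \t":
            labels.append("verbatim")
        else:
            labels.append("ordinary")
    return labels
```

Applied to the example above, classify([["=head1 Foo"], ["Stuff"], ["  $foo->bar"], ["=cut"]]) labels the paragraphs command, ordinary, verbatim, and command, in that order.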
The "=begin identifier" ... "=end identifier" commands stop
paragraphs that they surround from being parsed as ordinary or verbatim paragraphs, if identifier
doesn't begin with a colon. This is discussed in detail in the section "About Data Paragraphs and
"=begin/=end" Regions".