The Perl interpreter can be regarded as a closed box: it has an API for feeding it code or
otherwise making it do things, but it also has functions for its own use. This smells a lot
like an object, and there are ways for you to build Perl so that you can have multiple
interpreters, with one interpreter represented either as a C structure, or inside a
thread-specific structure. These structures contain all the context, the state of that
interpreter.
Two macros control the major Perl build flavors: MULTIPLICITY and USE_5005THREADS. The
MULTIPLICITY build has a C structure that packages all the interpreter state, and there is a
similar thread-specific data structure under USE_5005THREADS. In both cases,
PERL_IMPLICIT_CONTEXT is also normally defined, and enables the support for passing in a
"hidden" first argument that represents whichever of these data structures is in use.
All this obviously requires a way for the Perl internal functions to be either subroutines
taking some kind of structure as the first argument, or subroutines taking nothing as the
first argument. To enable these two very different ways of building the interpreter, the Perl
source (as it does in so many other situations) makes heavy use of macros and subroutine
naming conventions.
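The idea of packaging all interpreter state into a structure that every internal function receives can be sketched in a few lines. This is only an illustration: toy_interp, toy_run_statement and toy_run_many are hypothetical names and have nothing to do with Perl's real structure layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an interpreter's state; not Perl's real layout. */
typedef struct toy_interp {
    long statement_count;   /* per-interpreter counter */
    const char *name;       /* label used only for illustration */
} toy_interp;

/* Every "internal" function takes the context as its first argument. */
static void toy_run_statement(toy_interp *ctx)
{
    assert(ctx != NULL);
    ctx->statement_count++;
}

/* Run n statements on the given interpreter and report its counter. */
static long toy_run_many(toy_interp *ctx, int n)
{
    for (int i = 0; i < n; i++)
        toy_run_statement(ctx);
    return ctx->statement_count;
}
```

Because no state lives in globals, two toy_interp values can be driven independently in the same process, which is the property a MULTIPLICITY-style build is after.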
First problem: deciding which functions will be public API functions and which will be
private. All functions whose names begin with S_ are private (think "S" for
"secret" or "static"). All other functions begin with "Perl_",
but just because a function begins with "Perl_" does not mean it is part of the API.
(See "Internal Functions".) The easiest way to be sure a function is part of the API is to
find its entry in perlapi. If it exists in perlapi, it's part of the API. If it doesn't, and
you think it should be (i.e., you need it for your extension), send mail via perlbug
explaining why you think it should be.
Second problem: there must be a syntax so that the same subroutine declarations and calls
can pass a structure as their first argument, or pass nothing. To solve this, the subroutines
are named and declared in a particular way. Here's a typical start of a static function used
within the Perl guts:
    STATIC void
    S_incline(pTHX_ char *s)
STATIC becomes "static" in C, and may be #define'd to nothing in some
configurations in the future.
A public function (i.e. part of the internal API, but not necessarily sanctioned for use in
extensions) begins like this:
    void
    Perl_sv_setsv(pTHX_ SV* dsv, SV* ssv)
pTHX_ is one of a number of macros (in perl.h) that hide the details of the
interpreter's context. THX stands for "thread", "this", or
"thingy", as the case may be. (And no, George Lucas is not involved. :-) The first
character could be 'p' for a prototype, 'a' for argument, or 'd' for declaration,
so we have pTHX, aTHX and dTHX, and their variants.
When Perl is built without options that set PERL_IMPLICIT_CONTEXT, there is no first
argument containing the interpreter's context. The trailing underscore in the pTHX_ macro
indicates that the macro expansion needs a comma after the context argument because other
arguments follow it. If PERL_IMPLICIT_CONTEXT is not defined, pTHX_ will be ignored, and the
subroutine is not prototyped to take the extra argument. The form of the macro without the
trailing underscore is used when there are no additional explicit arguments.
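The role of the trailing underscore is easiest to see with a toy set of macros. The sketch below does not reproduce the definitions in perl.h; the pTOY/aTOY family, TOY_IMPLICIT_CONTEXT, TOY_DEPTH and toy_interp are hypothetical names chosen for illustration.

```c
#include <assert.h>

/* Set this to 0 to build the flavor with no context argument. */
#ifndef TOY_IMPLICIT_CONTEXT
#  define TOY_IMPLICIT_CONTEXT 1
#endif

typedef struct toy_interp { int depth; } toy_interp;

#if TOY_IMPLICIT_CONTEXT
#  define pTOY   toy_interp *my_toy   /* prototype, no other arguments */
#  define pTOY_  pTOY,                /* prototype, more arguments follow */
#  define aTOY   my_toy               /* argument, no other arguments */
#  define aTOY_  aTOY,                /* argument, more arguments follow */
#  define TOY_DEPTH (my_toy->depth)
#else
static toy_interp toy_global;         /* the single global interpreter */
#  define pTOY   void
#  define pTOY_
#  define aTOY
#  define aTOY_
#  define TOY_DEPTH (toy_global.depth)
#endif

/* No explicit arguments: use the form without the underscore. */
static int toy_depth(pTOY)
{
    return TOY_DEPTH;
}

/* Explicit arguments follow: use the form with the underscore. */
static int toy_enter(pTOY_ int levels)
{
    assert(levels >= 0);
    TOY_DEPTH += levels;
    return toy_depth(aTOY);
}
```

The same source compiles in both flavors: with the flag on, toy_enter takes (toy_interp *, int); with it off, the macros vanish and it takes (int) alone.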
When a core function calls another, it must pass the context. This is normally hidden via
macros. Consider sv_setsv. It expands into something like this:
    #ifdef PERL_IMPLICIT_CONTEXT
    #  define sv_setsv(a,b)   Perl_sv_setsv(aTHX_ a, b)
    /* can't do this for vararg functions, see below */
    #else
    #  define sv_setsv        Perl_sv_setsv
    #endif
This works well, and means that XS authors can gleefully write:
    sv_setsv(foo, bar);
and still have it work under all the modes Perl could have been compiled with.
This doesn't work so cleanly for varargs functions, though, as macros imply that the number
of arguments is known in advance. Instead we either need to spell them out fully, passing aTHX_
as the first argument (the Perl core tends to do this with functions like Perl_warner), or use
a context-free version.
The context-free version of Perl_warner is called Perl_warner_nocontext, and does not take
the extra argument. Instead it does dTHX; to get the context from thread-local storage. We
#define warner as Perl_warner_nocontext so that extensions get source compatibility at the
expense of performance. (Passing an arg is cheaper than grabbing it from thread-local storage.)
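The pairing of a context-taking varargs function with a context-free sibling can be sketched as follows. This is a toy model under stated assumptions: toy_interp, toy_current, toy_vwarn, toy_warner and toy_warner_nocontext are hypothetical names, and a C11 _Thread_local variable stands in for whatever thread-local storage mechanism a real build uses.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

typedef struct toy_interp {
    int  warnings;    /* number of warnings issued */
    char last[64];    /* text of the most recent warning */
} toy_interp;

/* Hypothetical per-thread "current interpreter" slot. */
static _Thread_local toy_interp *toy_current;

/* Shared worker taking a va_list, so both entry points can reuse it. */
static void toy_vwarn(toy_interp *my_toy, const char *fmt, va_list ap)
{
    assert(my_toy != NULL);
    vsnprintf(my_toy->last, sizeof my_toy->last, fmt, ap);
    my_toy->warnings++;
}

/* Context-taking variant: callers spell out the context themselves. */
static void toy_warner(toy_interp *my_toy, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    toy_vwarn(my_toy, fmt, ap);
    va_end(ap);
}

/* Context-free variant: fetches the context first (the role dTHX plays). */
static void toy_warner_nocontext(const char *fmt, ...)
{
    toy_interp *my_toy = toy_current;
    va_list ap;
    va_start(ap, fmt);
    toy_vwarn(my_toy, fmt, ap);
    va_end(ap);
}
```

A macro cannot insert the context into a variable-length argument list without variadic macro support, which is why the sketch needs two real functions rather than one function and a wrapper macro.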
You can ignore [pad]THXx when browsing the Perl headers/sources. Those are strictly for use
within the core. Extensions and embedders need only be aware of [pad]THX.
dTHR was introduced in perl 5.005 to support the older thread model. The older
thread model now uses the THX mechanism to pass context pointers around, so dTHR
is not useful any more. Perl 5.6.0 and later still have it for backward source compatibility,
but it is defined to be a no-op.
When Perl is built with PERL_IMPLICIT_CONTEXT, extensions that call any functions in the
Perl API will need to pass the initial context argument somehow. The kicker is that you will
need to write it in such a way that the extension still compiles when Perl hasn't been built
with PERL_IMPLICIT_CONTEXT enabled.
There are three ways to do this. First, the easy but inefficient way, which is also the
default, in order to maintain source compatibility with extensions: whenever XSUB.h is
#included, it redefines the aTHX and aTHX_ macros to call a function that will return the
context. Thus, something like:
in your extension will translate to this when PERL_IMPLICIT_CONTEXT is in effect:
Perl_sv_setsv(Perl_get_context(), asv, bsv);
|
|
or to this otherwise:
You have to do nothing new in your extension to get this; since the Perl library provides
Perl_get_context(), it will all just work.
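Why this default is easy but inefficient shows up in a toy model where the argument macro expands to a function call. Everything here is hypothetical (toy_interp, toy_get_context, Toy_set_value, aTOY_); it mirrors the shape of the mechanism, not the contents of XSUB.h.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_interp { int value; } toy_interp;

static toy_interp toy_main_interp;
static int toy_lookups;              /* counts context fetches */

/* Hypothetical analogue of a "give me the current context" function. */
static toy_interp *toy_get_context(void)
{
    toy_lookups++;
    return &toy_main_interp;
}

/* The real worker always takes the context explicitly. */
static void Toy_set_value(toy_interp *my_toy, int v)
{
    assert(my_toy != NULL);
    my_toy->value = v;
}

/* Easy mode: the argument macro fetches the context on every call. */
#define aTOY_ toy_get_context(),
#define toy_set_value(v) Toy_set_value(aTOY_ (v))

/* Two API calls cost two context lookups. */
static int toy_demo(void)
{
    toy_set_value(1);
    toy_set_value(2);
    return toy_lookups;
}
```

The calling code never mentions a context, so it stays source compatible, but each API call pays for a lookup that a passed-in argument would avoid.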
The second, more efficient way is to use the following template for your Foo.xs:
    #define PERL_NO_GET_CONTEXT     /* we want efficiency */
    #include "EXTERN.h"
    #include "perl.h"
    #include "XSUB.h"

    static SV *my_private_function(int arg1, int arg2);

    static SV *
    my_private_function(int arg1, int arg2)
    {
        dTHX;       /* fetch context */
        ... call many Perl API functions ...
    }

    [... etc ...]

    MODULE = Foo            PACKAGE = Foo

    /* typical XSUB */

    void
    my_xsub(arg)
            int arg
        CODE:
            my_private_function(arg, 10);
Note that the only two changes from the normal way of writing an extension are the addition
of a #define PERL_NO_GET_CONTEXT before including the Perl headers, and a
dTHX; declaration at the start of every function that will call the Perl API.
(You'll know which functions need this, because the C compiler will complain that there's an
undeclared identifier in those functions.) No changes are needed for the XSUBs themselves,
because the XS() macro is correctly defined to pass in the implicit context if needed.
The third, even more efficient way is to ape how it is done within the Perl guts:
    #define PERL_NO_GET_CONTEXT     /* we want efficiency */
    #include "EXTERN.h"
    #include "perl.h"
    #include "XSUB.h"

    /* pTHX_ only needed for functions that call Perl API */
    static SV *my_private_function(pTHX_ int arg1, int arg2);

    static SV *
    my_private_function(pTHX_ int arg1, int arg2)
    {
        /* dTHX; not needed here, because THX is an argument */
        ... call Perl API functions ...
    }

    [... etc ...]

    MODULE = Foo            PACKAGE = Foo

    /* typical XSUB */

    void
    my_xsub(arg)
            int arg
        CODE:
            my_private_function(aTHX_ arg, 10);
This implementation never has to fetch the context using a function call, since it is
always passed as an extra argument. Depending on your needs for simplicity or efficiency, you
may mix the previous two approaches freely.
Never add a comma after pTHX yourself; always use the form of the macro with
the underscore for functions that take explicit arguments, or the form without the
underscore for functions with no explicit arguments.
If you create interpreters in one thread and then proceed to call them in another, you need
to make sure perl's own Thread Local Storage (TLS) slot is initialized correctly in each of
those threads.
The perl_alloc and perl_clone API functions will automatically
set the TLS slot to the interpreter they created, so that there is no need to do anything
special if the interpreter is always accessed in the same thread that created it, and that
thread did not create or call any other interpreters afterwards. If that is not the case, you
have to set the TLS slot of the thread before calling any functions in the Perl API on that
particular interpreter. This is done by calling the PERL_SET_CONTEXT macro in
that thread as the first thing you do:
    /* do this before doing anything else with some_perl */
    PERL_SET_CONTEXT(some_perl);

    ... other Perl API calls on some_perl go here ...
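The rule about when the slot must be set by hand can be modelled with a toy thread-local pointer. This is a single-threaded sketch under stated assumptions: toy_interp, toy_tls_slot, TOY_SET_CONTEXT, toy_alloc and toy_api_call are hypothetical names, and a C11 _Thread_local variable stands in for perl's TLS slot.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_interp { int id; int calls; } toy_interp;

/* Hypothetical per-thread slot naming the interpreter in use. */
static _Thread_local toy_interp *toy_tls_slot;

#define TOY_SET_CONTEXT(p) (toy_tls_slot = (p))

/* An API function that relies on the slot rather than on an argument. */
static int toy_api_call(void)
{
    toy_interp *my_toy = toy_tls_slot;
    assert(my_toy != NULL);
    my_toy->calls++;
    return my_toy->id;
}

/* Creating an interpreter also points the slot at it. */
static toy_interp *toy_alloc(toy_interp *storage, int id)
{
    assert(storage != NULL);
    storage->id = id;
    storage->calls = 0;
    TOY_SET_CONTEXT(storage);
    return storage;
}
```

After a thread creates a second toy interpreter, the slot points at the newer one, so calls meant for the first one need a TOY_SET_CONTEXT beforehand. The same reasoning applies to a thread that did not create the interpreter at all, since its slot starts out empty.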
Just as PERL_IMPLICIT_CONTEXT provides a way to bundle up everything that the interpreter
knows about itself and pass it around, so too are there plans to allow the interpreter to
bundle up everything it knows about the environment it's running on. This is enabled with the
PERL_IMPLICIT_SYS macro. Currently it only works with USE_ITHREADS and USE_5005THREADS on
Windows (see inside iperlsys.h).
This makes it possible to provide an extra pointer (called the "host"
environment) for all the system calls. Each piece of system functionality can then
maintain its own state, broken down into seven C structures. For the default perl
executable these are thin wrappers around the usual system calls (see win32/perllib.c),
but for a more ambitious host (like the one that would do fork() emulation) all the extra
work needed to pretend that different interpreters are actually different
"processes" would be done here.
The Perl engine/interpreter and the host are orthogonal entities. There could be one or
more interpreters in a process, and one or more "hosts", with free association
between them.
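The separation between interpreter and host can be pictured as a table of function pointers that the interpreter consults instead of calling the operating system directly. The sketch below is purely illustrative: toy_host, toy_interp, toy_host_getpid and toy_getpid are hypothetical names, and it models a single "system call" rather than the structures declared in iperlsys.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host: one "system call" plus the host's private state. */
typedef struct toy_host {
    int (*getpid_fn)(struct toy_host *host);
    int fake_pid;          /* what this host pretends its process id is */
} toy_host;

/* An interpreter is associated with a host, not with the OS directly. */
typedef struct toy_interp { toy_host *host; } toy_interp;

/* A host implementation that answers from its own state. */
static int toy_host_getpid(toy_host *host)
{
    return host->fake_pid;
}

/* The interpreter routes the request through whichever host it has. */
static int toy_getpid(toy_interp *my_toy)
{
    assert(my_toy != NULL && my_toy->host != NULL);
    return my_toy->host->getpid_fn(my_toy->host);
}
```

Two interpreters bound to different hosts see different "process" identities, while two bound to the same host agree, which is the free association between interpreters and hosts in miniature.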