perlio provides this, but the interface could be a lot more straightforward.
When the lexer sees, for instance, bytes::length, it should automatically load
the bytes pragma.
Danger, Will Robinson! Discussing the semantics of "\x{F00}", "\xF00"
and "\U{F00}" on P5P will lead to a long and boring flamewar.
For displaying PVs that contain control characters, embedded nulls, and Unicode. This would be
useful for printing warnings, for data and regex dumping, for not_a_number(), and so on.
Requirements: it should handle both byte and UTF-8 strings. isPRINT() characters are printed as-is,
characters below 256 as \xHH, and Unicode characters as \x{HHH}. Don't assume an ASCII-like platform,
either; get somebody on EBCDIC to test the output.
Possible options, controlled by the flags:
- whitespace other than ' ' (which isPRINT() already covers) printed as-is
- use isPRINT_LC() instead of isPRINT()
- print control characters like this: "\cA"
- print control characters like this: "^A"
- non-PRINTables printed as '.' instead of \xHH
- use \OOO instead of \xHH
- use the C/Perl metacharacters like \n, \t
- have a maximum length for the produced string (read it from *lenp)
- append a "..." to the produced string if the maximum length is exceeded
- really fancy: print Unicode characters as \N{...}
NOTE: pv_display(), pv_uni_display() and sv_uni_display() already do something like the
above.
This may or may not be possible with the current regular expression engine. The idea is
that, for instance, \b needs to be computed algorithmically if you're dealing
with Thai text. Hence, the \b assertion should be overloadable by a function.
- Allow for the long form of the General Category Properties, e.g. \p{IsOpenPunctuation},
not just the abbreviated form, e.g. \p{IsPs}.
- Allow for the metaproperties: XID Start, XID Continue, NF*_NO,
NF*_MAYBE (these require the DerivedCoreProperties and
DerivedNormalizationProperties files).
There are also multiple-value properties still unimplemented: Numeric Type,
East Asian Width.
- Case Mappings? http://www.unicode.org/unicode/reports/tr21/
Mostly implemented (all of 1:1, 1:N, N:1); only the "final sigma" and
locale-specific rules of SpecCase are not implemented.
- UTF-8 identifier names should probably be canonicalized: NFC?
- UTF-8 in package names and sub names? The first is problematic because of the mapping
to pathnames, and so is the second if one does autosplitting, for example. Some of
this already works in 5.8.0, but essentially it is unsupported. Constructs to consider, at
the very least:
use utf8;
package UnicodePackage;
sub new { bless {}, shift };
sub UnicodeMethod1 { ... $_[0]->UnicodeMethod2(...) ... }
sub UnicodeMethod2 { ... } # in here caller(0) should contain Unicode
...
package main;
my $x = UnicodePackage->new;
print ref $x, "\n"; # should be Unicode
$x->UnicodeMethod1(...);
my $y = UnicodeMethod3 UnicodePackage ...;
In the above, all UnicodeXxx names contain (identifier-worthy) characters beyond
code point 255, for example 256. Every place where package/class or subroutine names can be
returned needs to be checked for Unicodeness.
See "UNICODE REGULAR EXPRESSION SUPPORT LEVEL" in perlunicode for what's there and what's
missing. Almost all of Levels 2 and 3 are missing, and as of 5.8.0 not even all of Level 1 is
there. The levels also describe some tricks Perl doesn't yet implement, such as character
class subtraction.
http://www.unicode.org/unicode/reports/tr18/
One suggestion is to default to "(the thread exiting first will) wait for the other
threads for up to 60 seconds". Other
possibilities:
Do not wait.
use threads wait_for => 10;
Wait up to 10 seconds.
use threads wait_for => -1;
Wait forever.
http://archive.develooper.com/perl5-porters@perl.org/msg79618.html
To better support nonpreemptive threading systems, perhaps some of the blocking functions
inside Perl should do a yield() before a blocking call. (Currently, certain threads tests ({basic,list,thread}.t)
simply do a yield() before they sleep() to give nonpreemptive thread implementations a
chance.)
In some cases, such as GNU pth, which has nonblocking replacement functions (pth_select
instead of select), Perl should perhaps use those replacements when built for threading.
Add PERL_ASYNC_CHECK to opcodes which loop; replace sigsetjmp
with sigjmp; check wait for signal safety.
This was done for 5.6.0, but needs reworking for 5.7.x.
POSIX 1003.1 1996 Edition support, i.e. the realtime features: POSIX semaphores, message queues, shared
memory, realtime clocks, timers, and signals (the metaconfig units mostly already exist for these).
Reader-writer locks, realtime/asynchronous IO
There are non-core modules, such as Socket6, but these will need integrating
when IPv6 really starts to happen. See RFC 2292 and RFC 2553.
Floating point formatting is still causing some weird test failures.
Locales and Unicode interact with each other in unpleasant ways. One possible solution
would be to adopt/support ICU:
http://oss.software.ibm.com/developerworks/opensource/icu/project/
[1234567890] aren't the only numerals any more.
([=a=] for equivalence classes, [.ch.] for collation.) These are
dependent on Unicode normalization and collation.
Currently, the user has to optimize foo|far and foo|goo into f(?:oo|ar)
and [fg]oo by hand; this could be done automatically.
All the code we ship with Perl needs to be sensible about temporary file handling, locking,
input validation, and so on.
Currently there are several problems with the setting of uids ($< and $> for the real
and effective uids). Firstly, exactly which setuid() call gets invoked on which platform is
simply a big mess that needs to be untangled. Secondly, the effects are apparently not
standard across platforms: if you first set $< and then $>, or vice versa, while uid ==
euid == zero, or just euid == zero, or as a normal user, what are the results? Since the test
suite is not (usually) run as root, these things do not get much testing.
Thirdly, there's quite often a third uid called the saved uid, and Perl has no knowledge of that
feature in any way. (If one has a saved uid of zero, one can get back any real and effective
uids.) As an example, to change the saved uid as well, one needs to set the real and effective
uids twice, at least on most systems; on HP-UX that doesn't seem to work.
Have a way to introduce user-defined opcodes without the subroutine call overhead of an
XSUB; the user should be able to create PP code. Simon Cozens has some ideas on this.
Windows needs a way to know what version of an XS or libperl DLL it's loading.
$( may return "foo bar baz". Unfortunately, since groups can
theoretically have spaces in their names, this could be one, two or three groups.
NaN and inf support is particularly troublesome. The relevant interfaces include fp_classify(),
fp_class(), fp_class_d(), class(), isinf(), isfinite(), finite(), isnormal(), unordered(),
<ieeefp.h> and <fp_class.h> (there are metaconfig units for all these, I think), and
fp_setmask(), fp_getmask(), fp_setround() and fp_getround() (no metaconfig units yet for these).
Don't forget finitel(), fp_classl(), fp_class_l() (yes, both do, unfortunately, exist) and
unorderedl().
As of Perl 5.6.1, there is a Perl macro, Perl_isnan().
Nicholas Clark has done a lot of work on this, but work is continuing. +, -
and * work, but guards need to be in place for %, /, &,
oct, hex and pack.
The CPAN module Marek::Pod::Html may be a more suitable basis for a pod2html
converter; the current one duplicates the functionality abstracted in Pod::Parser,
which makes updating the POD language difficult.
When a new Perl is being beta tested, porters have to grab their favourite CPAN
modules and test them manually; this should be done automatically.
We have all the other BSD socket functions but these. There are metaconfig units for these
functions which can be added. To avoid adding new opcodes for them, a solution similar to the way sockatmark
was added would be preferable. (Autoload the IO::whatever module.)
The new-style patterns need full documentation, and the whole document needs to be a lot
clearer.
Simon Cozens has done some work on this but it needs a rethink.