September 2012 Archives

Mentor-to-Hire for Perl Programmers

In the past couple of weeks I've heard from two startups that use Perl pervasively (not counting my new powered-by-Perl projects). It's great to see more companies getting things done and being willing to talk about the technologies they use and why.

One subject came up in both conversations, and if you've been to a YAPC in the past couple of years or followed online discussions, you've heard it too:

It's difficult to hire great Perl developers.

That's not news. It was news to me that both companies are solving this problem through a specific strategy of hire-and-mentor, based on the Modern Perl book! ("You should be flattered," a family member said. "They should hire you." I responded that they have my consulting information.)

Have you seen this trend too?

While I've long advocated activities such as pair programming and mentoring to help integrate new developers into the specific code, problem domains, and cultures of organizations, I've never quite thought about producing targeted mentoring material to help recruit great developers and help them scale the learning curve to become great Perl programmers.

The cynical businessman in me wonders if there's a market to produce this training material and help you customize it for your workplace. (Sure, the book is free, but putting it together in a way that makes sense for you has real business value, because it helps your developers get more productive more quickly. You pay for training, right?)

What do you think? How do you recruit and mentor new developers? Is it a huge cost for you that some targeted expert advice could do more cheaply, or is it part of building great teams that work together well, and, as such, the kind of core business activity that you have to do on your own?

If you look at it long enough, JavaScript will remind you of Perl.

I don't mean the language's syntax per se, though everyone who attends Algol family reunions looks vaguely similar. JavaScript is similar to Perl in that it's a language that's been around a while, that's accreted new features, that's showing some limitations of its original design decisions, and that has tremendous backwards compatibility concerns that make addressing the previous points difficult. Like Perl, it's a language that's seen a lot of programming novices pick it up to get things done, but it's also a language that rewards this whipupitude and lack of ceremony with the potential to make great messes.

(Don't worry, Perl fans: even if you don't ever use JavaScript, looking at Perl through the lens of another language will give you a better perspective on our favorite little powerhouse of a language.)

JavaScript reminds me of Perl 4 with a slightly better syntax and a couple of additional features (basic lexical scoping, higher order functions, and method dispatch, not to mention competing implementations with a lot of effort spent on optimizing away inefficiencies inherent in some of the poorest misfeatures of the language's design).

JS has blatantly stolen some of Perl's better ideas, such as strict mode, as well as some of Perl's worse ideas (for the sake of backwards compatibility, you enable strict mode in JavaScript by embedding a magic string literal in your code).

Even with the revival of server-side JavaScript in projects like Node.js as well as client-side, single-page applications, the language still has a ways to go before it can realize its true potential.

In one way, you can trace the rise of modern JavaScript to Douglas Crockford's book JavaScript: The Good Parts as well as libraries like jQuery, which take so much of the pain out of writing cross-platform JavaScript that you might say that people write more jQuery code than they do JavaScript, at least on the browser side.

(Fun fact: I modeled Modern Perl: the book after JavaScript: The Good Parts.)

If the idea of modern Perl has brought new energy into Perl (and I believe it has) and new perspectives on improving Perl 5 (and I believe it has), perhaps some of the lessons of the Perl Renaissance are valuable to other languages, such as JavaScript.

Embrace Modularity

Perl 5 is alive and well thanks to the CPAN. Without the CPAN, it's an interesting language with some great features and some flaws. With the CPAN, it's a toolchest full of precision instruments that good programmers can wield to make great things.

JavaScript has no real module system. Using various libraries and files in a single JavaScript project requires clever gyrations or at least rigorous pattern adherence, which fortunately the language supports. (I'm aware of npm. It's a start.)

For all of the flaws of Perl 5's import() mechanism, the compilation-time use and import() led directly to the creation and growth of the CPAN. Libraries will happen without explicit language support, but explicit language support in Perl made something amazing and transformative happen.
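That compile-time behavior is easy to demonstrate: use Module LIST is roughly shorthand for a require plus an import() call inside a BEGIN block. A small sketch with the core List::Util module:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# "use List::Util qw( first );" is (roughly) equivalent to this BEGIN
# block; because it runs at compilation time, the imported first()
# is available to the rest of the file as a named function.
BEGIN {
    require List::Util;
    List::Util->import( qw( first ) );
}

my $found = first { $_ > 2 } 1 .. 5;
print "$found\n";    # prints 3
```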

Embrace Quality

I use Perl because it's so very well tested and verified. In fact, I believe that Perl's testing culture has no parallel in other languages. In a very real way, the entirety of the CPAN is a smoke testing system for the core language development.

The closest thing I've seen to a rigorous approach to writing high-quality JavaScript code with serious testing is my friend and colleague Jim Shore's Let's Code: Test-Driven JavaScript. (Disclaimer: I supported his project on KickStarter.)

Unlike big-A Agile (a morass of training and certification and books with too little focus on delivering successful projects) or little-a agile (which is like flossing: something everyone says they do but you know only happens a couple of times a week, at most), Jim's test-driven development gets done and actually works.

If the JavaScript world started to embrace a similar testing revolution, it'd be in a much better place. Couple that with a better library ecosystem—one encouraged by the language itself, and....

Reduce Technical Debt

(alternate heading: Reduce Crazy)

People who complain that "You can't change JavaScript! You'll break countless existing runtimes!" have a point. Waiting for whatever the oldest version of Internet Explorer you have to support to go away before you can take advantage of something designed and standardized in recent memory is a real hassle. (Thanks to Dave's hard work and not mine, ClubCompy supports old versions of IE, but it's a huge amount of work for little payoff.)

Sometimes you have to keep around the cruft, but you can't limit yourself to what was valid in old code if it will prevent you from producing new code.

What I wrote in Safety in (Version) Numbers is still true in Perl, and it's probably true in JavaScript. Only by declaring the version of the parser you want to use on a particular piece of code can you document your expectations, and only with that explicit declaration does the parser have any hope of figuring out what you mean.

Even if the semantics of features change, the surface syntax of the language can upgrade slowly and deliberately if and only if programs written in the language make explicit their requirements.

Put another way, HTTP makes you say HTTP/1.0 or HTTP/1.1 for a reason.
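Perl already has this mechanism; a minimal sketch:

```perl
#!/usr/bin/perl
use 5.010;    # declare the minimum parser version this code expects;
              # this also enables 5.10 features such as say()

my $message = 'explicit version declarations document expectations';
say $message;
```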

JavaScript could fix some of its syntactic weirdnesses (lexical hoisting, automatic semicolon insertion) and add missing features (trailing commas in list literals, library use and dependency management, namespacing) with a strategy like this. For as much as I dislike "use strict" as a magic string constant, at least it works.

Documentation Guidelines

For as much as people complain that even decent Perl code can be inscrutable, Perl has copious internal documentation (see perldoc.perl.org for an online version of the docs which you most likely already have installed on your machine right now). The CPAN also has strong cultural guidelines for how to document modules and distributions. Even though these are community guidelines, projects adhere to them.

(I have lost track of how many patches I've sent to rephrase an unclear piece of documentation or add something missing or even to fix a typo; thank goodness for Git and Github, I say.)

Building a culture of communication and documentation is not easy, but the results are wonderful. I expect that any good library I use in Perl has decent tests, has a working installer, and has at least basic API documentation. I'm rarely disappointed.

Conclusion

I could write more about community management and expectations, but that's a non-technical subject with fluffier answers you can guess at anyhow.

JavaScript the language has a tremendous opportunity to learn from what Perl's done well and poorly. Stealing our better ideas will help it avoid some of the problems other languages get caught up in, and refining our best ideas will give us even better ideas to borrow back from.

Who knows? Maybe together we'll make programming easier, more pleasant, more reliable, and even more secure.

Announcing My New Powered-by-Perl Projects

In The Lost Secret of Mug-Driven Evolution, I asked "Is it reasonable to write new code in Perl right now?"

I also said "Build credible new things. Brag about them. Repeat."

It's time to do that!

Trendshare.org

While Onyx Neon still occupies a lot of my attention, I've been working with a local company called Big Blue Marble to develop small web-based businesses. This has taken me through a crash course in things like search engine optimization and statistics that I hadn't figured I'd ever need to know.

(Half the fun of small business is realizing that there's something you should have started doing months ago, that no one available has any experience with it, and that one of you has a week to get to a basic level of competence with it before you move on to the next crisis. The other half is realizing that the next time you tackle a problem like this, you'll be that much better at it.)

I've alluded to my work with financial analysis and basic financial literacy a few times now. That's part of my work at Big Blue Marble on a site called Trendshare. The big idea was to build something that our fathers would use to help them take back control of their investment portfolios. (They're smart men, but they've been for too long at the mercy of financial managers who don't have their best interests in mind, and their retirement accounts have suffered.)

I know most of my readers aren't newly retired men who've taken distributions and want to figure out where to put their money (short answer: find one or two good low-cost index funds, then take maybe 10% and pick a couple of great stocks). I also know most of my readers are perfectly capable of building spreadsheets and pulling financial data from various APIs to measure important ratios like debt-to-equity and forward and historical P/E. Still, we built a system that follows the Benjamin Graham value investing model, and we're pretty proud of it so far.
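For the curious, the two ratios mentioned are simple arithmetic; the balance-sheet numbers below are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two of the ratios mentioned, computed from hypothetical numbers;
# real data would come from an external API or a filing.
my %stock = (
    total_debt         => 50_000_000,
    shareholder_equity => 125_000_000,
    price_per_share    => 30,
    earnings_per_share => 2.50,
);

my $debt_to_equity = $stock{total_debt} / $stock{shareholder_equity};
my $pe_ratio       = $stock{price_per_share} / $stock{earnings_per_share};

printf "debt-to-equity: %.2f\n", $debt_to_equity;    # 0.40
printf "P/E ratio:      %.2f\n", $pe_ratio;          # 12.00
```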

It's definitely closer to the minimum viable product, but what active software project isn't? Besides that, who knows if the analysis engine will prove more popular than the plain-English guide to investing that's on the site as well. Maybe people would rather read about how to invest than use tools to help them—and that's why we're announcing it now rather than later.

Almost all of the code in the project is Perl 5. I've started releasing some of the components to the CPAN. I'll continue to do so. I'd love it if you could do us a favor and tell friends and family who might have an interest in such a thing that it exists, because we'd love to get more traffic, and especially hyperlinks from lots of places. Basic user accounts are free, and we take great care to keep personal information private.

Modern Perl Whitepapers

Remember when mithaldu set up Perl Tutorials, after he noticed that too many of the top-ranking Perl pages in search engines were old, out of date, and even incorrect?

Onyx Neon recently received the domain Modern Perl Whitepapers as a generous gift from Ravi Kotecha. I decided that the world needed yet another Perl hub and have (slowly) started to add resources to help point searchers to the relevant places.

The entire site is one big SEO exercise, based on keyword research as to what people are looking for and what they're not finding well enough. (How cynical does that sound?) I do think it'll be a good resource because Onyx Neon has the flexibility to update it to match what people search for.

The code behind the site is a very silly, very simple static site generator with lots of conventions, but it seems to work very well for basic sites. I will eventually clean that up and release it too.

Book Info

I'm working on an outline for an updated Perl Testing book. My plan is still to create a Kickstarter project to build up interest for it (and to give me time to focus on it full time, as well as to finish releasing the book publishing tools to the CPAN).

There will be a 2012-2013 edition of Modern Perl in the next couple of months to replace the coverage of Perl 5.10 with the coverage of Perl 5.16.

I continue to intend to finish the Little Plack Book but recommend without reservation the Plack Handbook.

When I wrote Plack::Middleware::BotDetector, I planned to use it only for filtering out non-humans from our cohort analysis system. (You can read the entire rationale and explanation in Detecting Bots and Spiders with Plack Middleware.)

Since I wrote that article, I extracted that middleware from our project and released it on its own as Plack::Middleware::BotDetector. As is often the case, solving one problem suggests the possibility of solving multiple problems.

When I build systems that analyze data, I try to make it possible that the analysis can improve over time. Anomalous cases should be obvious and easy to correct and, when corrected, should no longer be obvious (because they're no longer anomalous). When detecting non-human user agents, we analyze our access logs for likely candidates to add to the list used to construct the regex passed to Plack::Middleware::BotDetector.
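The post doesn't show the actual list, but a regex built from a maintained candidate list might look like this (the bot names below are illustrative, not the real list):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build one alternation regex from a maintained list of bot substrings;
# updating the list updates the regex passed to the middleware.
my @bot_fragments = ( 'Googlebot', 'bingbot', 'Baiduspider', 'YandexBot' );

my $bot_regex = do {
    my $alternation = join '|', map { quotemeta } @bot_fragments;
    qr/$alternation/;
};

my $ua = 'Mozilla/5.0 (compatible; Baiduspider/2.0)';
print "bot\n" if $ua =~ $bot_regex;    # prints "bot"
```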

I had written a small Perl program to analyze our logs and give a histogram of user agents, but I still ended up eyeballing that list to see if any new bot user agents had appeared. (You can tell your SEO strategy is working when you get more bot traffic.)
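A minimal version of such a histogram program, assuming combined-format log lines where the user agent is the final quoted field:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Tally user agents from combined-format access log lines; the user
# agent is the last double-quoted field on each line.
my $log = <<'END_LOG';
1.2.3.4 - - [01/Sep/2012] "GET / HTTP/1.1" 200 123 "-" "Mozilla/5.0"
5.6.7.8 - - [01/Sep/2012] "GET / HTTP/1.1" 200 456 "-" "Baiduspider/2.0"
9.9.9.9 - - [01/Sep/2012] "GET / HTTP/1.1" 200 789 "-" "Baiduspider/2.0"
END_LOG

my %count;
for my $line (split /\n/, $log) {
    $count{$1}++ if $line =~ /"([^"]+)"\s*$/;
}

printf "%6d  %s\n", $count{$_}, $_
    for sort { $count{$b} <=> $count{$a} } keys %count;
```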

Anytime you find yourself reviewing data by hand, ask yourself if a computer can do it.

We use Plack, obviously. Plack::Runner enables default middleware, including Plack::Middleware::AccessLog. That's responsible for writing an access log (you can configure the location), and we used that because it was easy and available.

"Wait," I asked myself. "Why am I reviewing this log information when I have to remember to exclude most of it because I already know it's bot traffic?" More important, our system already knows it's bot traffic, because the BotDetector middleware is already excluding those requests our from cohort analysis event logging.

What if I used the BotDetector to decide whether to log a request's information? (We don't do anything with these access logs which requires us to keep data about bot traffic.) That way, every update to the BotDetector regex would exclude more and more bot traffic, and the only things we'd see in our daily reports would be real users and bots we needed to exclude.

I wrote a custom piece of middleware in about two minutes:

package MyApp::Plack::Middleware::AccessLogNoBots;
# ABSTRACT: Plack middleware which only logs non-bot requests

use Modern::Perl;
use parent 'Plack::Middleware::AccessLog';

sub call
{
    my $self = shift;
    my $env  = $_[0];

    return $env->{'BotDetector.looks-like-bot'}
         ? $self->app->( $env )
         : $self->SUPER::call( @_ );
}

1;

This class extends the AccessLog middleware class to override the call() method. If the request looks like it came from a spider, it passes through the request to the next middleware. Otherwise, it lets the parent class log the request.

Installing this in our .psgi file was more difficult than writing the class, which says more about how easy it was to write this class than anything else. The only complicating factor is that Plack::Runner takes the responsibility for setting up its AccessLog component. I ended up with something like:

use MyApp;
use MyApp::BotDetector;

use Plack::Builder;
use Plack::App::File;

my $app = builder
{
    enable 'Plack::Middleware::BotDetector',
        bot_regex => MyApp::BotDetector::bot_regex();
    enable 'Plack::Middleware::ConditionalGET';
    enable 'Plack::Middleware::ETag', file_etag => [qw/inode mtime size/];
    enable 'Plack::Middleware::ContentLength';

    if ($ENV{MA_ACCESS_LOG})
    {
        open my $logfh, '>>', $ENV{MA_ACCESS_LOG}
            or die "Cannot open access log '$ENV{MA_ACCESS_LOG}': $!";
        $logfh->autoflush( 1 );

        enable '+MyApp::Plack::Middleware::AccessLogNoBots',
            logger => sub { $logfh->print( @_ ) };
    }

    MyApp->apply_default_middlewares(MyApp->psgi_app);
};

... where the presence of the environment variable governs the location of the access log file. I also changed the scripts we use to launch this .psgi file to pass the --no-default-middleware flag to Plack::Runner.

The results have been wonderful (except that our site looked a lot busier before, when the logs showed Baidu spidering the whole thing at least twice a day). The decorator pattern of Plack continues to demonstrate its value, and the cleanliness of extension and ease of writing this code argues yet again for putting conditionals (log or don't log) where they belong.

All I could ask for is a little more customizability for Plack::Runner to make some of the code in my .psgi file go away, but I'm probably at the point where it makes sense to avoid plackup and write my own program which calls Plack::Runner directly.

Update: Miyagawa pointed out that Plack::Middleware::Conditional offers an alternate way to accomplish the same thing without writing custom middleware:

my $app = builder
{
    enable 'Plack::Middleware::BotDetector',
        bot_regex => MyApp::BotDetector::bot_regex();
    enable 'Plack::Middleware::ConditionalGET';
    enable 'Plack::Middleware::ETag', file_etag => [qw/inode mtime size/];
    enable 'Plack::Middleware::ContentLength';
    enable_if { ! $_[0]->{'BotDetector.looks-like-bot'} } 'AccessLog';

    MyApp->apply_default_middlewares(MyApp->psgi_app);
};

We didn't use this technique because of the way we wanted to handle the log file, but that's what the Conditional middleware is for.

Structured Exceptions for Perl 5


In Features Perl 5 Needs in 2012, I left off one feature that's reasonably easy to add (in comparison) but would provide a huge benefit to Perl 5 now and in the future. Fortunately Mark Fowler reminded me that I forgot it: Perl 5 needs structured core exceptions.

I've written before about the difficulties of parsing unstructured data in a sensible way. That goes for the strings passed to two-argument open, for DBI connection strings, for the text of exception messages, and even the contents of subroutine attributes.

If an exception is only a string, everyone who has to get any relevant information from that string—if that information is even present—has to parse that string. If that information isn't present, adding it is difficult because you might break existing code that already parses it.
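The fragility is easy to demonstrate. The pattern below matches the current wording of the "Can't locate" message; if a future release rewords it, the match silently fails:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# With string exceptions, every caller must parse prose to extract
# relevant details such as the missing filename.
eval { require No::Such::Module::Exists };

my $error = $@;
my $missing;

if ( $error =~ /^Can't locate (\S+)/ ) {
    $missing = $1;
    print "missing module file: $missing\n";
}
```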

I've been experimenting with a way to add a core exception type to Perl 5 in a mostly backwards-compatible way. It's not yet ready to show off, but I can show the language-level interface I'm contemplating.

At a minimum, exception objects can and should include:

  • The string text of the exception we're all used to, say "Attempt to bless into a reference".
  • The name of the file which generated the error, if tracked (and it's almost always tracked)
  • The number of the line which generated the error, if available (and it's almost always available)
  • The type of the exception

The last item is a little more controversial; it requires a lot of people to agree on some sort of classification system for exceptions, like "data access error" or "IO error". It probably requires a system of roles, rather than a singly-rooted hierarchy, because some types of errors overlap. It ought to be visible to user code so users can define their own error types.

(The warnings categorization isn't perfect, but it works in most cases, so making a similar system is indeed possible.)

I've considered also adding a severity, but I don't know that that's broadly useful. It may be a possibility for future extension if necessary.

Modifying the core to produce these exceptions will touch a lot of code, but it's not exceedingly difficult. Nor is making that system available to XS modules difficult. The difficult part is dealing with Perl 5 code which does:

{
    local $@;

    eval { some_code() };

    if    (ref $@) { ... }
    elsif ($@)     { ... }
    else           { ... }
}

In particular, the open question is whether to make ref() lie, whether to promote exceptions to objects in a lexical scope with a feature enabled, or whether to perform some weird magic and add a feature to extract an exception object from an exception which looks like a string.

(I hope I'm kidding about the last one.)

An exception object would be a normal object and you'd use methods to read its attributes.
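To be clear, nothing like this exists in core yet; here's a mock of the contemplated interface written as a plain Perl class, with every method name speculative:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A mock of the contemplated interface; nothing here is a core
# feature, and the attribute names are guesses at a possible design.
package Speculative::Exception;

sub new {
    my ($class, %args) = @_;
    return bless { %args }, $class;
}

sub message { $_[0]{message} }
sub file    { $_[0]{file}    }
sub line    { $_[0]{line}    }
sub type    { $_[0]{type}    }

package main;

my $err = Speculative::Exception->new(
    message => 'Attempt to bless into a reference',
    file    => 'lib/MyApp.pm',
    line    => 42,
    type    => 'data access error',
);

printf "%s at %s line %d (%s)\n",
    $err->message, $err->file, $err->line, $err->type;
```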

I haven't thought at all about how to produce an exception object at the Perl 5 language level. Perhaps a new keyword such as throw (protected by feature) is in order. Overloading die with magic to detect a hash reference won't work, nor will changing its behavior when it receives a list of arguments.

Even with all of these open questions, making this feature work is reasonably straightforward and mostly self contained. It's easy enough to refine over time, but it does offer a measurable amount of improvement at the language level.

I'll keep exploring.

In Features Perl 5 Needs in 2012, I wrote that Perl 5 needs "compact, native-type data structures".

While I generally like the polymorphism by which I can read external data into a data structure and then treat it like the kind of data I want it to be (string, integer, numeric), that flexibility is never free. Sometimes its cost is significant.

I have mixed feelings about the Computer Language Benchmarks Game when used to compare languages, but its microbenchmarks are good targets for profiling data. (I used that strategy to good effect when I worked on Parrot. Certainly Parrot could benefit from other performance improvements, but removing low-hanging bottlenecks to maximum performance helped real programs in measurable ways.)

Earlier this year, I profiled the benchmarks game Perl meteor-contest entry. I didn't expect to find any obvious candidates for optimizing Perl 5, nor did I. I did find results that confirmed some of my expectations.

The program spends a lot of its time manipulating SVs. In fact, about 20% of the runtime of the program goes to this. This is almost exclusively upgrading SVs. (An SV is the basic data type in Perl 5. It can store an integer value, a string value, a numeric value, and more.)

Here's the problem. The profile shows that this program performs almost 65,000 array assignment operations (that's separate from a push or a splice). This ends up in the guts of the Perl 5 function Perl_pp_aassign in pp_hot.c, which eventually gets to this code:


    while (relem <= lastrelem) {    /* gobble up all the rest */
        SV **didstore;
        assert(*relem);
        sv = newSV(0);
        sv_setsv(sv, *relem);
        *(relem++) = sv;
        didstore = av_store(ary, i++, sv);
        if (magic) {
            if (SvSMAGICAL(sv))
                mg_set(sv);
            if (!didstore)
                sv_2mortal(sv);
        }
        TAINT_NOT;
    }

Essentially, for every element to assign to the array, create an entirely new SV, copy in the contents of the old SV (remember, copy semantics!), and move on. That copy operation ends up in Perl_sv_setsv_flags() in sv.c which goes through all sorts of gyrations to figure out what type of SV the destination should be.

Hold on, didn't that SV already get created? Yes it did, but it's not the right type. It might not be big enough to hold the right data from the source. After some 400 lines of heavily macroed code, it does eventually get the right data.
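You can see those copy semantics from pure Perl; each element of the source lands in a brand-new SV in the destination:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Array assignment copies every element into a freshly created SV;
# the per-element newSV()/sv_setsv() pair in the C loop is what
# implements this behavior.
my @source = (1 .. 5);
my @dest   = @source;

$dest[0] = 99;

print "@source\n";    # prints "1 2 3 4 5" -- the source is untouched
print "@dest\n";      # prints "99 2 3 4 5"
```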

This program profiles at 1.1 million SV copy operations. In almost 450,000 of those, the source is undef, so nothing has to happen. In about 580,000 of those, the source is a plain integer value.

An integer is just data. It doesn't even need an SV if it's only ever treated as an integer. It doesn't need copying, because it can happily remain an integer value.

(Yes, if you're concerned that an integer could ever overflow the 64 bits you're probably using to store integers, you have to figure out a way to upgrade to some sort of arbitrarily sized big integer, but that's very doable.)
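The core Math::BigInt module already provides that arbitrary-precision fallback; the open question is upgrading to it transparently on overflow:

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Math::BigInt;

# Math::BigInt handles values past the native integer limit; a typed
# integer scheme would need a transparent upgrade to something like it.
my $big = Math::BigInt->new('9223372036854775807');    # largest signed 64-bit
$big->binc;                                            # increment past the limit

print $big->bstr, "\n";    # prints 9223372036854775808
```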

It turns out that this program spends a lot of time bookkeeping polymorphic internal data structures to store raw integers. That's not counting all of the time spent manipulating the reference counts of raw integers or checking to see whether you need to do anything special to get or store the value of a raw integer.

If it were possible to add type annotations to Perl 5 such that an integer could only ever be an integer and never silently upgraded, or an array could only contain integers, it would be possible to update this program to gain a significant amount of speed. A 10% improvement is within the realm of possibility, if not more. (I find the program a little too dense to skim and explain while also digging into the Perl 5 source code, so I'm waving my hands a little bit.)

Sure, PDL is great and I've used it effectively, and I'm fully capable of translating this code into XS, where I can perform these manipulations in C and specify their types. But crossing the barrier between Perl and PDL can be costly unless you're very clever, and I don't want to write any more C than I have to when Perl could make my job much more convenient.
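Short of full typed structures, core Perl can already store homogeneous integers compactly with pack(), at the cost of unpacking before manipulating them; a workaround, not a substitute for language-level support:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# pack() holds native integers in one contiguous buffer with no
# per-element SV overhead; 'l' is a signed 32-bit value.
my $packed = pack 'l*', 1 .. 1000;

print length($packed), " bytes for 1000 integers\n";    # 4000 bytes

my @values = unpack 'l*', $packed;
print "$values[499]\n";    # prints 500
```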

Unfortunately, adding typed data structures to Perl 5 is a huge amount of work. It would probably make the already complicated Perl_sv_setsv_flags() even more difficult to follow. It would require rewriting a lot of the internals. It probably depends on attaching things like special behaviors to SV data structures instead of the bodies of Perl 5 operations, but that's also a good thing.

On the other hand, using less memory and going faster is a good thing—and optional typing could open the door to further improvements such as type-directed optimizations and JITting.

I know I said "Perl 5 needs this feature in 2012", but given the amount of work necessary to implement it, I have to back off a little bit to say that it would be nice to have. I hope you can see its value, and I hope it's possible to get it in a version of the Perl language sometime in the next couple of years.

Why Perl 5 Needs an AST


When I wrote Features Perl 5 Needs in 2012, I promised to explain my thinking. I've explained Why Perl 5 Needs a Metaobject Protocol and previously Why Perl 5 Needs a Compiler-Free Extension Mechanism.

(If you don't follow the latter link, ask yourself this: would you use Perl 5 on another VM if you didn't have access to the CPAN? As long as any CPAN stack you need includes an XS dependency, you're stuck with the C implementation of Perl 5. Break that dependency and... you see where this is going.)

If you read a good book on compilers such as SICP or the Dragon book or even a good book on Lisp, which is half of the same thing, you'll eventually run into the idea that you can represent any program in a tree-like structure.

That is to say, any type of computation you wish to perform can be represented with the right data structure, and that data structure happens to be a tree. Even the venerable "Hello, world!" program from K&R is a tree:


STATEMENTS
    / \
print  exit
  |      |
"Hello,  0
 world!"

(You can represent this tree in a lot of ways, but you get the idea.) As SICP explains, the execution model governs how you traverse this tree. Obviously in this tree, you start at the topmost node, then evaluate leftmost, depth-first, until you end.
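That tree and its traversal fit in a few lines of Perl; a toy evaluator, depth-first and leftmost-first:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The "Hello, world!" tree as nested Perl data, plus a depth-first,
# leftmost-first evaluator -- a toy model of the traversal described.
sub evaluate {
    my $node = shift;
    my ($op, @children) = @$node;

    if ($op eq 'STATEMENTS') {
        my $result;
        $result = evaluate($_) for @children;
        return $result;
    }
    elsif ($op eq 'print') { print $children[0], "\n"; return 1 }
    elsif ($op eq 'exit')  { return $children[0] }
    else                   { die "unknown node type: $op" }
}

my $tree = [ STATEMENTS =>
    [ print => 'Hello, world!' ],
    [ exit  => 0               ],
];

my $status = evaluate($tree);    # prints "Hello, world!"; $status is 0
```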

Even though your processor likely doesn't execute programs by traversing trees and your language's favorite runtime doesn't either, tree structures are still very useful in compilers because they're simple data structures that are easy to traverse and produce and manipulate.

This is one reason that fans of Lisp think that Lisp is the ultimate programming language: you write it by writing this tree directly as source code and you can manipulate that tree with source code as if it were a tree, because it is.

Many of the rest of us don't want to spend the rest of our lives writing trees by hand, but sometimes we do want to do things with source code without having to write our own compilers, or at least our own parsers.

Do you know why people say "Parsing Perl is hard"? That's because it is. It's not impossible, but the Perl 5 parser is a complicated program that mixes up the traditional roles of lexing and parsing such that replicating that parse completely is difficult, at best. This is why writing a syntax highlighter for Perl 5 is more difficult than writing one for Lisp.

Do you know how something like Devel::Declare works? I do, and I don't want to explain it to you. It's not particularly yucky magic, but it's not particularly lovely magic either.

Do you know how Perl 5's optimizer works? It's very limited.

Do you know how Perl 5 source filters work? Not that well! That's why the Switch module was deprecated in the commit after it became a core module.

Do you know how PPI works? Most of the time, very well, but with lots of magic and a few very well understood edge cases and not a lot of speed.

All of these problems have the same root cause: it's exceedingly difficult to manipulate a Perl 5 program as anything other than text, because the one thing that unambiguously understands that program won't share its understanding.

(That's not entirely true; a few years ago, Larry added a compile-time option to Perl 5 to produce an annotated tree from the parser/lexer/compiler, but no one's done much with it.)

A traditional compiler parses a document into a tree structure, then manipulates that tree into at least one and probably more trees and finally emits code in another language. It's very patriotic (from tree to shining tree). This pattern is no accident; it's the basis of many programs (everything is a compiler).

This process is so well understood that patterns exist for treating this process as a pipeline of tree transformations. As long as you know what kind of tree you're going to get and what kind of tree you need to produce, you can add your own transformation step. (If that sounds like Plack middleware, there's no coincidence there either.)

A formalized AST in Perl 5, representing an official separation between the lexing/parsing and execution phases of Perl 5, made available to any and all programs would let people write better syntax highlighters, sure, but also better IDEs (finding function declarations and associating them with source code would be easier) or debuggers (see again "finding things") or optimizers (a pipeline of tree transformations) or transliterations to other VMs or languages (still not entirely easy but much more possible) or serialization mechanisms (avoid parsing; dump an AST to the execution engine) or even little languages atop Perl with their own parsers which compile to the Perl 5 AST and run as if they had been Perl all along.

Think about that last point for a moment.

(Think about that last point in the context of syntax weirding mechanisms like MooseX::Declare, or anything which uses string eval.)

Are you sold yet?

Here's the bad news: this is a lot of work. It needs expertise and it needs research and planning. It needs at least one champion, and it probably needs funding. The first approach will probably fail. It will take longer than anyone wants. The only way it will happen is if someone stubborn appears and says "I'll do that!" or "I'll fund that!" and gets just enough support to keep going past the difficult parts.

Imagine how amazing it will be, though.

Why Perl 5 Needs a Metaobject Protocol

| 1 Comment

In Features Perl 5 Needs in 2012, I suggested five mostly-not-syntax features that Perl 5 could use right now.

The caveat to all of these features is that 1) someone must code them and 2) someone must maintain them. While p5p has traditionally done a lot with a little, remember that any suggestions or discussions will only make it into a Perl 5 release if enough effort goes into implementation to make them happen. If you want any of these features to happen, contact TPF about donations or volunteer your time and skills to help (even if all you feel comfortable doing is helping to refine the design of a feature, that's worthwhile).

I promised to explain these features in more detail. First up is a metaobject protocol for Perl 5. Stevan Little and Jesse Luehrs are working on a proof of concept MOP for Perl 5, so this feature is well underway.

If you're not familiar with a metaobject protocol, the book The Art of the Metaobject Protocol explains everything in extensive detail. (It's also a pretty good book about Common Lisp, so if you're interested in a semi-practical theoretical exploration of why computer sciencey things like the lambda calculus and well-designed extension mechanisms matter, this is a great book about the design and implementation of a language.)

If you don't want to read ~350 pages right now, the short definition of a MOP is an API for interacting with your object system. Classes (if your MOP supports them) are objects. You can define a new class by making method calls. You can create objects by making method calls. You can define an entirely new style of class by creating your own new metaclass with different behaviors. This is the reason Class::MOP predated Moose.
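Class::MOP makes that concrete. Here's a sketch, assuming Class::MOP is installed; the class name, attributes, and method are invented for illustration:

```perl
use strict;
use warnings;
use Class::MOP;

# Build a class entirely through method calls on the metaclass --
# no package declaration, no bless in sight.
my $meta = Class::MOP::Class->create(
    'Point',
    attributes => [
        Class::MOP::Attribute->new( '$x', accessor => 'x', init_arg => 'x' ),
        Class::MOP::Attribute->new( '$y', accessor => 'y', init_arg => 'y' ),
    ],
    methods => {
        to_string => sub {
            my $self = shift;
            return sprintf '(%d, %d)', $self->x, $self->y;
        },
    },
);

# creating an instance is also a method call, on the metaclass
my $point = $meta->new_object( x => 1, y => 2 );
print $point->to_string, "\n";    # prints "(1, 2)"
```

Everything here is the object system talking about itself: the class is an object, and defining it means calling methods on that object.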

A good MOP provides a great theoretical foundation for a wonderful object system. Where many of us have defended Perl 5's object system as "minimal but effective and flexible", a good MOP in the core would let us get rid of bless in our code... and have more good features.

Performance Improvements

Moose can be slow to start if you have lots of little classes, each on disk in its own module. Not only are you doing lots of IO to find and load those classes, but you have to build up all of the classes you've defined in memory through Moose's import mechanism (has() and extends() and with() are all function calls), which creates and populates a metaclass behind the scenes.

This happens every time you start your program.

Every MooseX extension you load changes things.

Though some of this happens in XS, where memory use can be smaller (less Perl data structure overhead) and execution can be faster (the same algorithm in C instead of something written atop C), much of this happens at the Perl 5 level.
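Concretely, every has in a Moose class is an ordinary function call that runs at load time. A sketch, assuming Moose is installed (the class and attribute names are invented):

```perl
package Order;
use Moose;

# has() is a plain function call; it runs when this line executes at
# load time, building up metaclass state behind the scenes.
has 'total', is => 'ro', default => 0;

# Roughly the same effect, spelled as an explicit metaclass method call:
__PACKAGE__->meta->add_attribute( 'subtotal', is => 'ro', default => 0 );

package main;

my $order = Order->new;
print $order->total, ' ', $order->subtotal, "\n";    # prints "0 0"
```

Multiply that work by every attribute, role, and MooseX extension in a large application and you pay the cost on every program start.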

A great MOP could have its own custom data structures and code in the core to represent classes and methods and attributes, rather than Perl 5 hashes and arrays. A great MOP could have its own ops to handle things like introspection and dispatch. A great MOP could have parser support to avoid the overhead of calling Perl 5 functions (or crossing the XS boundary) to perform basic setup.

Simplicity of Alternate Object Systems

Perl 5's default object system is very flexible for strange people like me who've programmed in far too many languages (and implemented a few) who like various features of multiple languages and think "Hey, I know, I'll implement this in Perl 5." You bless a reference. You invoke a method. That's it. Everything else you can build yourself.

The downside is that you have to figure out how to build it yourself. Sometimes that means even diving into the source code of Perl 5 itself to see how things happen.

A good MOP provides a foundation on which you can provide your own custom behavior. As long as you hew to the protocol it defines, you can do anything you want: make a custom metaclass that allows simple and protected shared memory between workers, provide automatic auditing around specific methods, proxy transparently to remote resources, anything.

Better yet...

Interoperability Between Different Object Systems

One of the projects I maintain uses Moose heavily. One of its dependencies pulls in Moo. The authors of both object systems have gone to great lengths to make sure that both work together.

With a good MOP in place in the core, you have to go to great lengths not to interoperate well with other object systems. The default is interoperability.

(Try that in Perl 5 now, where some of the web programming or IO classes bless typeglobs. Oh, you expected a blessed hash? Too bad!)
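A contrived sketch of that mismatch, using the same blessed-typeglob trick the core IO classes use (the class and key names here are invented):

```perl
use strict;
use warnings;
use Symbol 'gensym';

package GlobBased;

# Some IO classes bless typeglobs so the object doubles as a
# filehandle; instance data hides in the hash behind the glob.
sub new {
    my $class = shift;
    my $self  = Symbol::gensym();
    ${ *$self }{name} = 'example';
    return bless $self, $class;
}

package main;

my $obj = GlobBased->new;

# Code that assumes every object is a blessed hash breaks at runtime:
my $name = eval { $obj->{name} };
print defined $name ? "$name\n" : "not a hash: $@";
```

With a core MOP, code would ask the metaclass for attribute values instead of reaching into whatever data structure happened to get blessed.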

Syntactic Improvements

As I alluded to earlier, a good MOP opens the door for declarative syntaxes for defining classes, methods, and attributes. We might finally get a class keyword and a method keyword. This is very nice because it'll be faster, but also...

Tool Improvements

... language keywords the language itself knows make their way into syntax highlighters and static analysis tools and reformatters and refactorers. It's easy to get a list of all of the methods in all of the classes in a file when picking out the class names is unambiguous, because you know each one has the class keyword in front of it.

Compare that to today, where anything that looks like a coderef in a namespace is potentially a method or potentially a function and the only way to know is to invoke it, and why bother even trying?
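The ambiguity fits in a few lines. Nothing about the definition says which call style is intended:

```perl
use strict;
use warnings;

package Stack;

# Is this a method or a plain function? The definition can't say.
sub count { return scalar @{ $_[0] } }

package main;

my $stack = bless [ 1, 2, 3 ], 'Stack';

print $stack->count, "\n";           # method call: prints "3"
print Stack::count($stack), "\n";    # function call: also prints "3"
```

A static tool sees one sub and two equally plausible interpretations; only running the code resolves them.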

Current Status

It's difficult to argue against putting a MOP in the core. Stevan and Jesse are working hard on it, but they could use more eyeballs on the tests and design and certainly more testing on various platforms. If you've been looking for an excuse to contribute to Perl 5.18 or Perl 5.20, this is a great place to start.

Features Perl 5 Needs in 2012

| 11 Comments

Perl 5 is alive and well.

While I was happy with Perl 5.10, the gradual but useful improvements since then have been of great help to me and my work. (I'm perhaps most thankful for the thankless work that's gone into Unicode 6.2 compliance and core support for Unicode throughout.) The yearly release cycle has made Perl 5 even more reliable, and that seems likely to continue.

A couple of years ago, someone asked me for my top five list of improvements Perl 5 needed. I was right about a couple, wrong about a couple of others, and missed one or two important ones. For those playing along at home, I missed:

  • A yearly release cycle
  • The feature pragma

For all its faults, feature broke the logjam by which it was impossible to add new syntax or backwards-incompatible features to Perl 5. The mechanism isn't perfect (especially its implementation), but it allows progress, and for that I'm thankful.
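For example, feature is how say entered the language without breaking old code; the keyword exists only where you ask for it:

```perl
use strict;
use warnings;

# say() is not part of the grammar until you enable it, so code
# compiled without the pragma (even code defining its own say())
# keeps working unchanged.
use feature 'say';

my $line = 'new keyword, opt-in only';
say $line;

{
    # features are lexically scoped, so they can be disabled per block
    no feature 'say';
    print "old behavior restored\n";
}
```

The same opt-in mechanism later carried other new syntax into the language without a global flag day.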

I can't predict what will be ready for Perl 5.18 next April or May, but I can update my list to include features I'd like to see in an upcoming release. I'll go into more detail of each of these features in future installments.

(For the CL fans in the audience, I like syntax, and I have a lot of trouble giving up CPAN's breadth and cross-platform goodness.)

What's on your list?

I haven't written an infinite loop bug in a while. I suppose I was due. (The Greek gods always punish hubris.)

I have a database table full of users. Let's call that table user. A user has one or more roles (from the role table). A user can be an unverified user, a free user, a subscriber, or an administrator. (Administrators obviously are also subscribers, but you get the point.)

I use DBIx::Class for the database access layer. My User result class has a couple of methods which let other model actions query whether the user can perform specific operations. They're synthetic attributes:

has [qw( is_admin is_subscriber )], is => 'ro', lazy_build => 1;

sub _build_is_admin      { shift->find_role_by_name( 'admin'      ) }
sub _build_is_subscriber { shift->find_role_by_name( 'subscriber' ) }

Obviously the implementation of find_role_by_name() is supremely important. Here's how I originally wrote the code (no sense blaming anyone else for this, because I wrote all of the access control code myself.) See if you can catch the bug:

sub find_role_by_name
{
    my ($self, $name) = @_;

    while (my $role = $self->user_roles->next)
    {
        return 1 if $role->role->name eq $name;
    }

    return 0;
}

As a hint, the user_roles method represents the relationship between the user table and the role table. That method returns a DBIC resultset which represents the entries in the table with a foreign key to the user table.

See the bug yet? It took me a while. The problem is that the loop condition calls $self->user_roles afresh on every iteration, and every call returns a brand new resultset. Calling next on a brand new resultset always fetches its first result, so the loop never advances past it.

My tests (yeah, I wrote the tests too, before I wrote this code in fact, just like you're supposed to do) didn't catch that because the rest of the code only ever checked for one user role, the administrator role, anywhere else. Only when I added code for the subscriber role did the subscription tests trigger this infinite loop. (Now how abhorred in my imagination is that most excellent jest!)

I wish I could say that I diagnosed the cause immediately. I didn't. I stared at the code for far too long, until I had the presence of mind to put a debugging statement in the loop body to print the name of the role. (Stat officium pristina nomine.)

Here's the corrected code, with a single expression hoisted into a scalar variable where it may endure properly:

sub find_role_by_name
{
    my ($self, $name) = @_;
    my $user_roles_rs = $self->user_roles;

    while (my $role = $user_roles_rs->next)
    {
        return 1 if $role->role->name eq $name;
    }

    return 0;
}

I will say this: expecting your test suite to pass in under 30 seconds makes this sort of thing much easier to catch. If I'd had to wait for a continuous integration server to get around to checking this commit, I wouldn't have fixed it as quickly as I did.

Then again, it's easier to debug bugs you never write in the first place.

Oh, and if you catch the literary allusion in the title, here's your prize: congratulations! You might be overeducated. (If you caught the other three allusions, have you ever noticed how much Pynchon's V influenced Stephenson's Cryptonomicon?)

In Speeding Up My Test Suite by 25%, I lamented that writing modern Perl code often means using code which mangles the language at use time. Consider typical Moose code:

package MyApp::App::Role::HasMailer;
# ABSTRACT: role which provides other app roles with a mailer

use Modern::Perl;
use Moose::Role;

use MyApp::Mailer;

has 'mailer', is => 'ro', lazy_build => 1;

sub _build_mailer
{
    my $self        = shift;
    my $config      = do 'myapp_local.pl';
    my $mail_client = $config->{'Model::UserMail'}{mail_client};
    return MyApp::Mailer->new( $mail_client );
}

1;

... where functions such as has and with and extends look like declarations (such as my and sub), but are actually code to run.

The difference is important. A declaration that's part of the language's grammar need only be parsed for its effects to take place, while a statement or expression must actually run.

This is, of course, why variable declarations make lexical scoping trivial to understand for experienced Perl programmers, why binding closures to their lexical environments is easy in the simple cases, and why binding closures to their lexical environments when you use the STRING form of eval is so difficult.

As Moose hackers will tell you, Moose isn't a simple preprocessor you run over your code once to generate longer, uglier code without syntactic goodness. Instead, classes are built, not declared.

This is endemic to Moose, but it's not a characteristic specific to Modern Perl. It's inherent in Perl 5 itself, even the 1994 version. Perl 5 lets you run arbitrary code while parsing happens.

While you can gain tremendous flexibility with this approach (building up closures with partially applied arguments to export to caller namespaces), you can't serialize the resulting code as easily because you may have to run arbitrary code to restore your program in memory.

Put another way, Exporter is a library when it should be core language behavior.

In the simple case where a library wants to export a couple of symbols into the caller's namespace, say min and max from List::Util, why is there no simple syntax for marking those symbols as exportable in List::Util without having to run code from (or, worse yet, inherit from) Exporter?
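For contrast, here's what the status quo requires. Even the modern form, which imports Exporter's import() rather than inheriting it, still runs code at use time (My::Util and its functions are invented for illustration):

```perl
package My::Util;

use strict;
use warnings;

# The modern form: import Exporter's import() function rather than
# inheriting from Exporter -- but it's still code that runs, not a
# declaration a tool could read statically.
use Exporter 'import';

our @EXPORT_OK = qw( min max );

sub min { my $min = shift; $min = $_ < $min ? $_ : $min for @_; $min }
sub max { my $max = shift; $max = $_ > $max ? $_ : $max for @_; $max }

package main;

# normally spelled "use My::Util qw( min max );", which calls this
# same import() method at compile time
My::Util->import(qw( min max ));

print min( 3, 1, 2 ), ' ', max( 3, 1, 2 ), "\n";    # prints "1 3"
```

Nothing in that file marks min and max as exportable in a way a static tool can see; the exports only exist once import() has run.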

If a declarative syntax existed—with language support or at least the broad consistency to allow for tool support—such that it were statically possible to determine the symbols exported from one package and imported into another package, precompilation would be easier. A module could have a static list of exports provided in a manifest of sorts. (A good optimizer could even decline to import unused code!)

I realize it isn't always possible to reduce all cases of exporting to a list of static symbols (nor is it desirable to remove that power), but at least 80% of the code I write would benefit from this. We'd also be able to have better tool support for the language; it would be easier to discover which symbols come from where, and we might even close the "You can't always tell which of two interpretations of a parse tree is valid in Perl 5 code thanks to import and BEGIN" gap a little further.

(... but then someone will ask why strict is a pragma and not core language behavior, especially after looking at its implementation, and then we'll all go to the paint store to pick out our favorite colors. See also "I didn't provide an example of this declarative syntax, so you can't argue about it in the comments.")

About this Archive

This page is an archive of entries from September 2012 listed from newest to oldest.

August 2012 is the previous archive.

October 2012 is the next archive.
