May 2011 Archives

Summer 2011 Conference Talks

I'm speaking at two conferences this summer.

The Open Source Bridge conference takes place in downtown Portland, Oregon from 21 June through 24 June. This is a great mixture of real free software and open source projects with a particular focus on the community and small business aspects of organization and collaboration. My talk, Modern Perl Made Painless, shows off some of the most important libraries and extensions in the Perl 5 world of 2011 to help you accomplish more with less work and greater confidence.

YAPC::NA 2011 runs from 27 June through 01 July in Asheville, North Carolina (with hackathons and training classes the final two days). YAPCs are all homegrown, Perl-specific conferences. If I could attend only one conference, it would be a YAPC. This is the best place to meet and work with (some of) the world's best Perl programmers, documentors, implementors, and enthusiasts. My talk, Modern Advocacy for Modern Perl, reflects on two years of people talking about Perl's renaissance, the enlightenment of Perl programmers, and how a program written to take advantage of the modern releases of Perl 5 should behave. With all of the wonderful things in the Perl world right now, how can we convince our friends, colleagues, and clients that Perl is a great choice?

I'm working on a couple of contracts right now and hope to announce a new crop of books by then. (I may also bring a box of Modern Perl: the book at a special conference price if there's interest.) Even if you don't care about any of that, if Portland or Asheville is within easy travel distance of you, I encourage you to consider either or both conferences as a great training and social experience.

Closures Cure Global Pollution


A point illustrated in two digressive examples.

Plack is Silly Simple

I gave a presentation to the Portland Perl Mongers earlier this month, in which I explained just how simple Plack is:

  • A web application is a function reference.
  • That function reference takes a single parameter, a Plack::Request object.
  • That function reference returns a single result, a Plack::Response object.

It's so simple, it's silly. At least it seems silly. What's the advantage over something like mod_perl or CGI or FastCGI?

Composability and middleware.
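To show just how silly simple that is, here's a complete application in its raw PSGI form. (Plack::Request and Plack::Response are convenience wrappers around the environment hashref and the three-element response arrayref shown here.)

```perl
# a complete web application: one function reference, assuming nothing
# beyond the PSGI specification itself
my $app = sub {
    my $env = shift;    # the PSGI environment hashref

    return [
        200,                                     # HTTP status
        [ 'Content-Type' => 'text/plain' ],      # headers
        [ 'Hello from a function reference' ],   # body
    ];
};
```

Save that as app.psgi and `plackup` will serve it with any PSGI-compatible backend, and every piece of Plack middleware can wrap it, because it's only a function.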


I spend part of my time consulting with a couple of friends on a project named ClubCompy. They've built a virtual 8-bit computer in JavaScript and HTML 5—you don't have to install any software besides a modern web browser to get a simple programming environment. (Obviously building more exciting programming environments is possible, but the approach is didactic, favoring simplicity and text-based programming over something like Scratch.)

ClubCompy's code takes advantage of JavaScript's advantages and tries to avoid JavaScript's disadvantages. All of the classes in CC attach to a single god object which is a global variable named ClubCompy. If you want to instantiate another object, you access its name through a property on the ClubCompy object. This is all well and good—it reduces contamination of the global namespace.

Except that there's still contamination of the global namespace.

CC has a rudimentary inclusion system modeled somewhat after the C programming language's #include system. When you include a file, you must #define a symbol which you check as a flag to make sure you don't include a file again. In JavaScript this looks something like:

if (!ClubCompy.script_bufferedChar)
    ClubCompy.script_bufferedChar = true;


Every included script (because you don't want tens of thousands of lines in a single file in any programming language) has a similar guard.

One obvious improvement is:

ClubCompy.scripts = {};

if (!ClubCompy.scripts.bufferedChar)
    ClubCompy.scripts.bufferedChar = true;


... which has its obvious parallels to Perl 5's %INC and could generalize further into something more like:

ClubCompy.require( 'bufferedChar' );

... to manage this information.
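A minimal sketch of that idea in Perl terms, with a hypothetical require_script() standing in for whatever ClubCompy.require would become:

```perl
# track loaded scripts by name, just as %INC tracks loaded modules by path
my %loaded_scripts;

# a hypothetical loader: runs each script's code exactly once
sub require_script {
    my ($name, $load) = @_;
    return if $loaded_scripts{$name}++;
    $load->();
}
```

Every subsequent call to require_script() with the same name becomes a cheap hash lookup, and no included file needs its own hand-written guard.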

The inclusion mechanism in ClubCompy wraps the contents of each included file in an anonymous function so as to protect the global namespace. That is to say, each individual file can declare its own lexical variables without worrying that those lexical variables leak out elsewhere. This is well and good.

Then Dave said "It's too bad we can't have multiple ClubCompy instances running on a page."

Remember how I said Plack is silly simple?

On Naming Anonymous Things

ClubCompy currently only lets you run one instance on a page because the ClubCompy object is global. You get one. That's it. There are no more. This object is global so that everyone can refer to it by name without worrying about how to access it.

That's why people make things Singletons, right?

If the ClubCompy object weren't global, there'd be no problem running multiple instances of ClubCompy on any given page, because they wouldn't be in conflict—but how do you find a global object if it's not global?

I belabor the point (many of you already know the answer) to explain one in terms of the other as well as a design principle in terms of both. In JavaScript, can you tell if ClubCompy is a global variable in this expression:

ClubCompy.render_frame( frameObj );

It's easier in this snippet:

var render = function (frameObj) {
    ClubCompy.render_frame( frameObj );
};

In truth, the JavaScript compiler doesn't really care. You could just as well:

var render = function (cc, frameObj) {
    cc.render_frame( frameObj );
};

Or in Perl 5:

my $render = sub {
    ClubCompy->render_frame( shift );
};

... versus:

my $render = sub {
    my ($cc, $frame) = @_;
    $cc->render_frame( $frame );
};

Perl doesn't care whether the invocant is a hard-coded string or a global variable or a lexical scoped outside of the block or a lexical parameter passed to the function. Likewise, JavaScript will look up the symbol according to its regular rules.

In other words, allowing ClubCompy to run multiple instances per page requires only changing the binding of the ClubCompy symbol within the libraries which use it. If the top-level code which creates a new ClubCompy instance does so without polluting the global namespace, top-level code can create multiple instances which do not conflict at the global namespace level.

The loader which wraps all of these included JavaScript files in an anonymous function turns from:

var includedFile = function () {
    // ... included text here
};

... to:

var includedFile = function (cc) {
    var ClubCompy = cc;

    // ... included text here
};

includedFile( nonGlobalClubCompy );

... and everything else just works.

Behold the magic of lexical variable encapsulation and the magic of closures—because a ClubCompy instance on a page is just a function reference.
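The same trick in Perl terms (a sketch; make_instance() is hypothetical): every call to the maker function closes over fresh lexicals, so instances cannot collide.

```perl
# each call returns a new closure over its own private counter
sub make_instance {
    my $frames_rendered = 0;
    return sub { return ++$frames_rendered };
}

# two independent "instances" on the same page, no globals required
my $first  = make_instance();
my $second = make_instance();
```

$first and $second each carry their own $frames_rendered; neither can see or clobber the other's state.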

One sign of a good interface is the degree to which it encourages users to do the right thing and discourages users from doing the wrong thing. While a malicious or distracted or inexperienced user may still do things incorrectly, a good interface reduces that possibility.

It's possible—even advisable—to analyze buggy code to see if the bugs point to misfeatures in the interfaces. Plenty of bugs come from misunderstandings of requirements, but plenty of bugs lurk in interfaces which are unclear or difficult to use correctly.

Consider this snippet of code:

use Moose;

has 'summary', is => 'rw';

around summary => sub {
    my ($orig, $self) = splice @_, 0, 2;
    return $self->$orig() unless @_;
    my $summary       = shift;

    return $self->$orig() unless length $summary;

    my $content = $self->remove_markup( $summary );
    return '' unless $content;

    $self->$orig( $content );
};

This snippet of Moose-based code declares an attribute named summary with an autogenerated accessor and mutator of $object->summary. It also wraps this autogenerated method with code to sanitize any data set through the mutator by removing markup.

Everything seems straightforward, but consider all of the parts of the interface which might go wrong. (This is not a criticism of Moose, which does the best it can; it's an exploration of language and API design which just happened to come up as I debugged some very buggy code I wrote.)

my ($orig, $self) = splice @_, 0, 2;

Perl 5's minimal argument handling nevertheless has some common idioms. One of those is that $self is the first argument to all instance methods. Is this an instance method? Sort of. It's also a wrapper around an instance method which may or may not need to redelegate to the wrapped method in various ways. Thus this wrapper needs access to the wrapped method somehow.

The Moose developers chose to provide the wrapped method as a code reference passed as the first argument to the wrapper, mostly because this is the least worst approach[1]. You can't really add syntax to Perl 5 within the scope of this wrapper even with Devel::Declare without rewriting the op tree of the method itself to get at the wrapped method somehow, because Perl 5's internals know nothing about this technique. Besides that, D::D is a heavy hammer to pull out.

(Again, sometimes adding syntax to Perl itself makes otherwise intractable problems simple.)

Passing this parameter at the end of @_ would lead to the even more awkward idiom of:

around 'summary' => sub {
    my $orig = pop;
    my $self = shift;
    ...
};

The splice approach is the least worst way I've found to deal with the situation where I have variadic argument counts to differentiate between attribute access and attribute mutation.

One problem remains: this wrapper only matters for the mutation case. I'm starting to believe that variadic redispatch within user code is a code smell of the rotting vegetation variety.

If Perl 5 supported multiple dispatch even only on the number of arguments provided to a method, this code would not be a problem. The wrapper could apply only to the case where the method took one or more arguments and would leave the accessor case of zero arguments unmodified.

The most right solution in my case is to change the name of the mutator to set_summary()... except that that's only the obvious change in the local case. For the sake of consistency with the rest of the system, I ought to change every mutator's name to set_foo: a large change. It's worth doing, but I should have done it from the start, if I'd predicted that a strict separation between setting and getting were necessary to reduce the possibility of bugs in wrapped methods.
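Moose makes that rename easy to sketch. With a separate reader and writer, the wrapper applies to mutation alone and the variadic guessing disappears. (The Article package and its remove_markup() are stand-ins here for the real class from the earlier example.)

```perl
{
    package Article;
    use Moose;

    # explicit reader and writer instead of one dual-purpose accessor
    has 'summary', reader => 'summary', writer => 'set_summary';

    # a stand-in for the real markup scrubber
    sub remove_markup {
        my ($self, $text) = @_;
        $text =~ s/<[^>]+>//g;
        return $text;
    }

    # the wrapper now sees only mutation; the reader stays untouched
    around set_summary => sub {
        my ($orig, $self, $summary) = @_;

        return unless length $summary;

        my $content = $self->remove_markup( $summary );
        return '' unless length $content;

        $self->$orig( $content );
    };
}
```

No splice, no argument counting: the wrapper always receives exactly one new value.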

Consider also what happens if you don't take the splice approach but do follow the common redispatching idiom:

around summary => sub {
    my ($orig, $self, @new_values) = @_;
    return $self->$orig() if @_ == 2;

    $self->$orig( @_ );
};

I made that mistake too. It was easy enough to spot after DBIx::Class gave me odd errors about not knowing how to store a coderef in a column that should only contain text.
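The fix is small but easy to get wrong: redispatch with the captured values, never with @_, which still holds $orig and $self. A sketch (Document is a stand-in package):

```perl
{
    package Document;
    use Moose;

    has 'summary', is => 'rw';

    around summary => sub {
        my ($orig, $self, @new_values) = @_;
        return $self->$orig() unless @new_values;

        # pass along only the new values; @_ still contains $orig and $self,
        # which is how a coderef ends up in a text column
        return $self->$orig( @new_values );
    };
}
```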

Again, this isn't Moose's fault. It does the best it can with the features Perl 5 provides. Yet imagine if Perl 5 knew what an invocant was and understood something about wrapping. (You can even unify redispatching to an overridden parent method in this—and while you're at it, fix the SUPER bug.) The syntax might look like:

around 'summary' => method {
    return $self->*around::original() unless @_;
    my $summary       = shift;

    return unless length $summary;

    my $content = $self->remove_markup( $summary );
    return '' unless $content;

    $self->&around::original( $content );
};
Perhaps the star selector syntax isn't the right syntax to identify a different dispatch mechanism than simple name-based lookup, but you get the idea.

Until then, I'll be looking for an easy way to rename all of my mutators.

[1] Least worst? Consider the alternatives. If Moose provided an internal mechanism to set attributes directly (and not through direct hash access, which is prima facie the worst possible access mechanism), it would likely need access control to restrict direct mutation to declaring classes and their immediate descendants or applied roles. Figure out that system. Now strengthen it against everyone who suddenly discovers that a quick hack here and there will speed up synthetic benchmarks by a couple of percent and break the carefully constructed encapsulation, and ... well, welcome to Perl 5. (back)

Show It Off

Remember Ruby on Rails? Six years ago it was a bit of code a couple of people in the nascent Ruby community had heard about. Five and a half years ago, it was a screencast. Then the really good Java programmers realized that maybe programmers don't have to write XML all day to get things done.

Don't discount the possibility of bragging a little.

I've built a few of my own projects with Perl. Some of them you've seen. Some of them you haven't.

Why do people get excited when they hear that JT Smith and crew built The Lacuna Expanse with Perl, or that Gabriel Weinberg runs Duck Duck Go with Perl? I think we like seeing technologies we enjoy using and building and recommending used productively and effectively.

(Perl touches almost everything in my business, Onyx Neon Publishing, from managing financial matters to editing and formatting to running our website.)

Why not brag a little bit more? If you've built something—especially if that something brings in money or prestige or whatever reward—show it off.

DBIx::Class is the first ORM I've found which provides more benefits than headaches. It's not perfect—it has a learning curve—but its power and flexibility have simplified several of my projects.

One of my few ambivalences toward DBIC is that it commingles database operations (create, retrieve, update, and destroy) with model-specific operations ("level up this character", "sell the magic sword", "deactivate the smoke trap"). This is useful in that my objects have smart persistence and updating and relationship tracking provided by their DBICness, but it's difficult in that sometimes I want a stricter separation of database concerns from data model concerns.

Suppose I want to test that a document processing model can successfully filter out many hundreds of thousands of algorithmically generated permutations of unwanted data. Suppose, as is the case, that these documents are DBIC objects normally retrieved from a database. A test database might seem like just the thing for data-driven testing, but in this case it seems more work to generate a large transient database full of hundreds of thousands of documents to satisfy the constraints of DBIC.

In other words, I want a way to generate a new model object programmatically without having to store it in and retrieve it from a database. (Alex Hartmaier and Matt S. Trout both reminded me of the $rs->new_result() method which creates an object given the appropriate resultset but does not store it in the database.)

I do have tests which verify that the storage concerns behave as appropriate. They represent a small investment in mock data which sufficiently exercises all of the cases my code needs to handle. My other test concerns have little to do with the database itself. They care about what the model objects do with their data, not how they get that data.

Thank goodness DBIC is compatible with Moose.

I extracted all of the non-database model methods into a role. That role requires the accessors the database model provides for persistent data. I created a very simple dummy class which has those necessary attributes and performs that role. Then I wrote a very, very simple loader function which generates the necessary data algorithmically, instantiates instances of the dummy class, and tests the role's methods.
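A sketch of that structure (the package and method names here are invented for illustration):

```perl
{
    package Document::Behavior;
    use Moose::Role;

    # the persistence layer -- real DBIC object or fake -- must provide this
    requires 'content';

    # a model method that cares only that the data is there
    sub word_count {
        my $self = shift;
        my @words = split ' ', $self->content;
        return scalar @words;
    }
}

{
    package Test::MockDocument;
    use Moose;

    # the dummy class: just the required attributes, plus the role
    has 'content', is => 'ro';
    with 'Document::Behavior';
}
```

The tests instantiate Test::MockDocument objects from algorithmically generated data and exercise the role's methods without ever touching a database.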

(I plan to write a longer article showing example code.)

The result is a model class which consists solely of the code generated from DBIx::Class::Schema::Loader and a with statement to apply a couple of roles. The tests are in two parts: one tests the model-specific code. The other tests the persistence-specific code, itself a combination of the generated code and another role which collects the remaining persistence behavior.

Even though both roles have an obvious coupling in terms of providing necessary behavior to the model (and to each other), decoupling them in terms of storage provides much improved testability—and, I suspect, more opportunities for reuse and genericity throughout the system.

While explanations of the value of Perl roles often focus on reusability and the separation of similar concerns from otherwise unrelated classes, roles can also provide a separation between dissimilar behaviors and concerns. In this case, it doesn't matter to the role methods where the data comes from (a live database, a testing database, an algorithm, hard-coded in a test case, the command line, wherever). It only matters that that data is there.

This technique doesn't always work. This technique isn't always appropriate. This technique does not replace verification that your behavior roles interoperate with your persistence models appropriately. Even so, it has simplified a lot of my code and improved my tests greatly.

(or, You Could Have Invented Dependency Injection)

I've been refactoring an existing system from a prototype to a well-designed and maintainable system. Part of that process is finding a better design. Part of that process is improving test coverage to find and fix bugs.

Something fascinating happens during that process if you pay attention.

Because the design of the project is fluid at this point, I have free rein to redesign to make testing easier. Because test coverage is spotty in places, redesigning to make testing easier is almost a necessity. Fortunately, the project is small enough and communication with the stakeholders easy enough that determining what the code should do where it's not clear is easy.

Plenty of people criticize test-driven development as if it were an iron-clad rule of law, where every line of code must have at least one associated test case, and if you do that, you get a badge and some grape flavored juice to hold you over until the spaceship comes. If that were the only benefit, that would indeed be a fair criticism—but how do you test this code?

sub save_to {
    my ($self, $dirpath) = @_;

    my $filepath = File::Spec->catfile( $dirpath, $self->cached_file );

    my $i = Imager->new( file => $self->full_file );
    return $self->invalidate() unless $i;

    my $width  = $i->getwidth;
    my $height = $i->getheight;

    # verify size
    return $self->invalidate()
           if MyCoolApp::Exclude->image_size( $width, $height );

    my @resize = $self->should_resize( $width, $height );

    try {
        $i = $i->scale( @resize ) if @resize;
        $i->write( file => $filepath );
        unlink $self->full_file;
        $self->state( 'FETCHED' );
        $self->full_file( $filepath );
    }
    catch { warn $_; $self->invalidate };

    return 1;
}
This code will not be trivial to test. Perl 5 has a few idioms for testing the existence of files in the filesystem. With some careful use of File::Temp, it's possible to verify that a file which did not exist before the call being tested exists after the call, but the rest of this code has other difficulties.

The hardest part of this code to test comes from the hard-coded call to the Imager constructor. I could hijack the loading of Imager with Test::MockObject::Extends and test all of the paths through this method by changing the mock Imager's behavior, but that would couple the details of the test too tightly to the details of the code being tested.

This method's biggest problem is that it does too much. It creates an Imager object. It tests to see if the image represented should be excluded from saving. It resizes the image. Then, finally, it saves the image. All of this behavior made sense when the only operation performed against this API was "save this image, if applicable" but these operations are very obviously distinct operations confused in one spot when trying to test this API.

This is where the people who dismiss TDD as only a brain-numbing checklist of rote behavior are wrong. I'd never let myself write a method with this many separate concerns if I had written it with TDD, but I can admit that I'd write code with this many separate concerns if I'd written "prototype, just get it working, see if it's possible" code. (I did write the exclusion code.)

Years ago I would have forged ahead to dummy up an entire Imager, just to get full test coverage. Fortunately, this code uses Moose.


The design patterns people have a term called Inversion of Control and a specific grouping of techniques in a family called Dependency Injection. You can get lost in the architecture astronaut nature of the discussion, but the basic principle is sound: avoid hard-coding dependencies in your code.

In other words, the most suspect part of this method is the line:

    my $i = Imager->new( file => $self->full_file );

... because of the very tight coupling between this module and Imager.

You may think now "Wait, but you need the behavior of Imager for the code to work correctly!" and you're correct to do so. That's absolutely right. This code needs the behavior of the Imager object, and this code is tied to the API of the Imager object, but this code itself does not have to be responsible for instantiating the Imager object.

If you already knew that, give yourself a gold star. If you're now thinking "Yeah, but why does that matter?" or "What?", read the previous paragraph again. It's subtle, but it's important.

Moose supports something I'll call synthetic attributes. They're attributes of your object built from the values of other attributes. You're likely to encounter them when they're built lazily. Consider if I added an imager attribute to this class:

has 'imager', is => 'ro', lazy_build => 1;

sub _build_imager {
    my $self = shift;
    return Imager->new( file => $self->full_file );
}

That provides one very nice benefit: it decouples the creation of the Imager object from the method I want to test. If that's the only value of the new synthetic attribute, it's still useful. Yet that attribute has greater serendipities:

sub test_save_to {
    my $module = shift;
    my $image  = $module->new(
        filename => 'my_file.gif',
        state    => 'FETCH',
        imager   => undef,
        validity => 0,
    );
    my $result = $image->save_to( 'some_dir' );

    ok ! $result, 'save_to() should return false without valid imager';
    ok ! $image->is_valid, '... invalidating image';
}

Extracting this hard-coded call into a synthetic attribute with Moose allows code which creates these image objects to provide their own values for the imager synthetic attribute. The test is now in control over what's provided. It can use a real Imager object or a mocked Imager object. It can allow the lazy builder to create a real object.

One small act of extraction (and, as usual, intelligent default behavior in Moose) has turned a method which is difficult to test into something much more tractable. Testing even the hard-coded call itself was possible in Perl 5, especially with the utilities of CPAN modules which let you scrounge around in namespaces, but a little bit of care and a little bit of abstraction and decoupling and encapsulation make the code and the tests more robust.

(For full disclosure: I had a fleeting thought at first "With imager() as an accessor, I can subclass the class I want to test locally, override that method, and return what I want. That's not strict dependency injection by any means, but I can explain it in terms of DI as a metaphor." A moment later I realized how easily Moose supports DI if you structure your code in this way and was glad I don't have to overdesign the tests.)

This is the sort of writing I didn't have time to get into in detail in Modern Perl: the book, but is well within the scope of what the potential authors of the Moose book want to cover and very much in the spirit of what a new Perl Testing book will discuss. Hint, hint.

Hurry! The free Modern Perl ebook giveaway only lasts until the Singularity!

Given the imminence of Perl 5.14, would you like to see an updated version of Modern Perl for the new stable release? The electronic version has a few bug and typo fixes not in the printed version, and this seems like as good a time as any to do a second printing.

(And, yes, the title gently mocks the breathless advertising of certain "traditional" publishers who seem to need to brag that they too once in a while give books away out of magnanimity. That portion of the publishing industry will continue to die as authors realize exactly how bad their royalty arrangements are. I personally earn more money from a purchase of Modern Perl made through an affiliate link than I would have earned in royalties from any of my other publishers. You'd think finding a way to compensate authors more equitably would be a priority to anyone who loves the printed word.)

2018 is the Year of Perl 5.10


The Perl 5 porters officially ended support for Perl 5.8 on November 5, 2008. Fortunately, Enterprise Support exists to help your legacy Perl 5 installations cope. Distributions such as Red Hat Enterprise Linux and its offshoot CentOS will continue supporting old versions of Perl 5 for up to ten years since their release (the release of the distribution, not the release of the version of Perl 5 they distribute).

For example, the most recent CentOS release, CentOS 5.6, includes Perl 5.8.8. (CentOS 5.6 came out just over a month ago. Perl 5.8.8 is seven stable releases out of date.)

In seven years, when RHEL 5 reaches the end of its supported life span, Perl 5.8.8 will have been unsupported by the Perl 5 Porters for nine and a half years, and will be almost thirty-five stable Perl 5 releases out of date.

But it's supported and it's enterprise, so the CPAN authors of 2018 must be sensitive to the needs of their users, and so the earliest you can rely on Perl 5.10 features (say, smart match, defined-or, the non-recursive regular expression engine that resists easily crafted denial-of-service attacks, or the plugged memory leaks in closures and certain eval constructs) is April 2018.

If Perl 5.16 gets a meta-object protocol and support for a stripped-down version of Moose as its default object system, CPAN authors can start using it in 2019.

If that doesn't make you want to fire up your favorite text editor and start making plans for all of the wonderful things you will eventually be able to do in the far future, consider also this. If the oldest supported version of Perl 5 is 5.8.8 (even if the people who wrote it and maintain it and support it have disclaimed any interest or desire or intent to support it), then every subsequent version of Perl 5 is a supported version of Perl 5. With Perl 5 on a well-tuned yearly release schedule, you can expect a new major release once a year and three or four point releases in that major release family every year. That is to say, Perl 5.12.0, 5.12.1, 5.12.2, and 5.12.3 are supported. Perl 5.16.0 through Perl 5.16.3 will be supported.

Because you as a CPAN author can't leave all of the people paying good money to CentOS to support Perl 5.8.8 for the next seven years out in the cold, you have a lot of free time not using new features or removing workarounds for Perl 5 bugs, and you can use that time to test your code on an ever-increasing number of supported releases of Perl 5 in the intervening years.

Fortunately, we have great tools such as App::perlbrew which can install multiple parallel versions of Perl 5 without conflicting with the system Perl 5 installation, so that you can install Perl 5.8.9, Perl 5.10.0, Perl 5.10.1, Perl 5.12.0, Perl 5.12.1, Perl 5.12.2, Perl 5.12.3, and (soon) Perl 5.14.0 on your CentOS 5.6 or RHEL 5.x system to test that your code will continue to run on every supported version of Perl 5.

(I know, I fib just a little bit. There's no supported RPM of perlbrew available for CentOS or RHEL. What can I say? I'm an idealist. At least it's only seven short years before the Perl 5 world can rely on Enterprise Distribution users being able to use tools such as perlbrew.)

This is still a big job, so the Perl 5 world needs even more people to donate their time and skills and effort to making sure that code written and donated by other volunteers continues to meet the needs of people paying third parties for the privilege of installing RPMs that are guaranteed never to change for seven to ten years.

Fortunately, this is an easy process to automate with Perl 5. I have some proof of concept code which will test a distribution against all supported versions of Perl 5. Sure, it takes quite a while for each test run, but computers will be faster in seven to ten years when I can release it. (I'd release it now, but I was lazy and used named captures, a Perl 5.10 feature. Won't it be nice when we can use those? Sorry my code is unusable by everyone else.)

I get shivers just thinking about how wonderful 2018 will be, the year of Perl 5.10.

The Little Book of Plack

I don't use CGI.pm any more. (Though I did recently write a CGI script. I used Plack for it. I ♥ Plack.)

Aristotle Pagaltzis

As usual, Aristotle is right. Plack is an obvious improvement to deploying Perl web applications and a very powerful way to manage middleware concerns. I use it with multiple projects.

I keep toying with the idea of producing a little book on Plack. It can be a short introduction to what Plack is (and why Plack is), a guide to some of the deployment options (such as when to use the various backends), a set of examples of the best middleware which exists today, and a guide to writing your own middleware or server backend.

It should be short. It can be succinct. It doesn't have to be a three hundred page, $35 printed book. It could be a hundred pages long, electronic format, and available for less than $10.

Does this interest anyone besides me?

(Other books are coming too, oh yes, probably including an update to Modern Perl for Perl 5.14.)

Pains of the Past, Begone!


Still not convinced that Moose cuts the Gordian knot that tied the Perl 5 world up in spaghetti knots for so many years? Review some of the discussions of Perl 5 OO in the pre-Moose world, especially after Apocalypse 12 came out and shone a light on the path out of darkness.

I amused myself far too much by reading A Class:: module I don't want to write and Can I please have *simple* modules?, for example.

My rule is simple: Moose has made me forget many, many little problems that I used to have in the same way that Perl has made me forget many, many little problems that I used to have. (I'd even forgotten I'd written an article called "Seven Sins of Perl OO" five years ago. With Moose, only one of those sins is still a problem.)

The quality and cost of your language's abstractions can affect the applicability of those abstractions. Consider Parrot and its intrinsic data structure, the PMC. The design of PMCs resembles Perl 5 SVs in a philosophical sense, if not a direct working relationship. The guiding design principle is "A PMC should be able to support a basic set of core operations common to all core PMCs".

Parrot's core PMCs have dozens of these operations, such as getting and setting integer, numeric, and string values. They support a clone operation, an invocation operation, an instantiation operation, and more. While some of these operations make little sense (invoking an Integer PMC throws an exception for obvious reasons), every PMC has to support them sufficiently that attempting to perform that operation should not cause the VM to crash.

Parrot's core PMCs are big wads of C code. All of these operations in Parrot's core PMCs are C functions gathered in a large table of function pointers. Every PMC type has one of these tables. Adding a new core operation to Parrot means adding a new function to every PMC table. Every new function added to the PMC table thus adds to the memory and startup costs of Parrot overall. Similarly, adding a new PMC type—even a specialization of an existing type—increases the memory and startup costs of Parrot overall because a new type needs an entirely new and distinct function pointer table.

These are implementation details, quite solvable with some clever programming, but the most interesting point to my mind is how they have influenced the development of Parrot and Parrot languages. In particular, there's a subtle (and likely correct) belief within the project that adding a new PMC is expensive in terms of Parrot's footprint and the management of its source code.

That's true to some extent, which is a shame, because Parrot's suitability as a VM for interoperable languages comes in no small part from the flexibility of the PMC system.

The price of believing that new PMCs are expensive is that the core of Parrot takes advantage of far less polymorphism than it should. Consider the difference between a named function, an anonymous function, a method, a closure, a continuation, an exception handler, a binding to a shared library function, and a lazy thunk (perhaps a generator). You can invoke all of those items. You can pass in zero or more arguments. They can each return zero or more values. Yet each performs control flow through a very different mechanism.

An ideal Parrot design would have a separate PMC type for each logical type. A boring old named function which takes no parameters could avoid parameter passing code altogether, while the binding to a shared library function likely needs to marshal data to and from the appropriate ABI. A generator should be fast and lightweight, while an exception handler may have to manipulate control flow in interesting ways to resume after an exception, or at least to resume at the appropriate point, based on which exceptions it can handle and which it must decline.

Yet Parrot doesn't have multiple specialized PMC types representing all (or even most) of the necessary variants of invokable PMCs. It has only a handful. Those precious few would benefit from the judicious and pervasive application of the pattern Replace Conditional With Polymorphism.

Unfortunately, PMCs are far more expensive than they should be, and so it's cheaper to make the existing PMCs far more complex (and subsequently far more expensive on the individual level, not only the type level) than they should be. If PMC types were cheap, individual PMCs could be smaller and smarter and the entire system itself could have much stronger pressure toward simplicity.

Then again, writing an object system in C using native C data structures, dispatch mechanisms, and language patterns governs what you can and can't accomplish cheaply and easily. One of the goals of the Parrot Lorito project is to reduce the tight coupling of Parrot's internals with C, so as to make necessary changes to Parrot inexpensive enough to be practical.

(Any metaobject system in, for example, Perl 5 would do well to consider these lessons.)

What is the purpose of object orientation?

If you, like me, first encountered the term in the popular working programmer literature of the '90s (or if you've read Microserfs), you might have heard that the purpose of OO is to enable the deliberate reuse of unique and well-defined software components.

If you learned structured programming, whether earlier in the history of programming language development or because objects were a fad until Sun's marketing budget displaced IBM's marketing budget, you might have heard that the purpose of functions is to enable the deliberate reuse of unique and well-defined software components.

I write this knowing full well that in a few minutes I need to modify a piece of code which uses multiple CPAN modules, many of them defined, tested, deployed, and loved very well. With that in mind, I still believe that reuse as a primary design principle is often a fool's goal.

Last week's example of abstraction versus mock objects elicited thoughtful comments from Zbigniew Lukasiak and Andreas Mock relating to the single responsibility principle, design patterns, and dependency injection. These concerns are important in the sense that contrast and shading and hue are important concerns in creating great artwork.

Yet somehow the false notion that design is primarily an exercise in identifying, codifying, and extracting reusable components is pervasive, especially in literature intended for novices. (Perhaps novices sigh with relief when they finally get a single function or class working correctly, thinking to themselves "Thank goodness I shall never do this again!")

In 1999 I might not have created a helper class to solve my testing problem from last week. (Why multiply entities unnecessarily? I'm not writing ravioli code here!) I might have struggled through mocking DBIC row objects and prided myself on the thoroughness of my testing of all of the picky little details.

In 2011 I congratulate myself more on the code I don't have to write—not because I'm reusing implementation details but because I've minimized the details any particular piece of code has to understand.

That's why the Modern Perl book explains functions primarily as a way of naming and encapsulating discrete units of behavior and objects as discrete identities with single, well-defined responsibilities.

Reuse is well and good when it happens, but reuse happens best because of sensible design. It's not the primary goal of software design. Nor should it be. It's a serendipity born of the fundamental design concerns of naming and encapsulation.

Let us teach these useful truths to novices instead of the airy lies they will waste their time pursuing.

