Reduce Complexity, Prevent Bugs

I spend a lot of time thinking about how to prevent bugs in Parrot. My first contribution to the project was a patch in late 2001 to make an essential Perl 5 program used in the build compatible with Perl 5.004. (My, how times change.) I've spent countless hours in the intervening seven and a half years helping the project become correct, complete, viable, and competitive.

Many of my opinions about the maintainability and sustainability of software projects come from experiences with Parrot (sometimes to the chagrin of people who don't know the other projects I can't talk about which have similar characteristics).

Fiddly Bits of Parrot Not Always Easy to Write Correctly

Parrot makes pervasive use of a data structure called a PMC, a PolyMorphic Container (or Parrot Magic Cookie). A PMC represents anything that's not a primitive value -- anything more complex than an integer, a floating point value, or a string. In Perl 5 terms, a PMC resembles an SV. Don't take that line of thinking too far; PMCs take the good parts of SVs and avoid the scary, complex parts of SVs.

Because Parrot hasn't quite managed to get rid of C entirely yet (see the Lorito plan for more about that), we have several dozen core PMCs written in C.

A PMC has several well-defined behaviors which form the vtable interface. These are common operations that any PMC should be able to perform: get a scalar value, set an integer value, access a nested PMC, invoke the PMC as a callable function. Not every PMC performs every defined vtable function, but unimplemented functions produce Parrot exceptions rather than interpreter crashes.
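To make the vtable idea concrete, here is a minimal sketch in plain C. Every name in it (SketchPMC, SketchVTable, SKETCH_UNIMPLEMENTED, and so on) is hypothetical and far simpler than Parrot's real definitions, and a status code stands in for a Parrot exception. It only illustrates the shape of the design: a table of function pointers, with default entries that report an error rather than leaving a hole to crash on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified types; these are not Parrot's real definitions. */
typedef struct SketchPMC SketchPMC;

typedef enum { SKETCH_OK = 0, SKETCH_UNIMPLEMENTED = 1 } SketchStatus;

typedef struct SketchVTable {
    SketchStatus (*get_integer)(SketchPMC *self, long *out);
    SketchStatus (*set_integer)(SketchPMC *self, long value);
    SketchStatus (*invoke)(SketchPMC *self);
} SketchVTable;

struct SketchPMC {
    const SketchVTable *vtable;
    long                value;
};

/* Default entries: report "unimplemented" (standing in for an exception)
 * so that no vtable slot is ever a NULL pointer. */
SketchStatus default_get_integer(SketchPMC *self, long *out)
{
    (void)self; (void)out;
    return SKETCH_UNIMPLEMENTED;
}

SketchStatus default_set_integer(SketchPMC *self, long value)
{
    (void)self; (void)value;
    return SKETCH_UNIMPLEMENTED;
}

SketchStatus default_invoke(SketchPMC *self)
{
    (void)self;
    return SKETCH_UNIMPLEMENTED;
}

/* An integer-like PMC implements get and set, but is not callable. */
SketchStatus integer_get(SketchPMC *self, long *out)
{
    assert(self != NULL && out != NULL);
    *out = self->value;
    return SKETCH_OK;
}

SketchStatus integer_set(SketchPMC *self, long value)
{
    assert(self != NULL);
    self->value = value;
    return SKETCH_OK;
}

const SketchVTable integer_vtable = { integer_get, integer_set, default_invoke };

/* A PMC type that implements nothing still has a complete, safe vtable. */
const SketchVTable empty_vtable = {
    default_get_integer, default_set_integer, default_invoke
};
```

Calling invoke through integer_vtable returns SKETCH_UNIMPLEMENTED instead of jumping through a null pointer, which is the property the paragraph above describes.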

Additionally, most PMCs have attributes. Think of a PMC as a class, with instances of that PMC as objects and PMC attributes as instance attributes and vtable functions as instance methods, and you have a conceptual understanding which works at a high level.

Because of our current use of C as the PMC declaration language, PMCs need to understand their memory management characteristics. In other words, if your PMC has two INTVAL attributes and one PMC attribute, the PMC initializer (like a constructor, in OO terms) needs to allocate enough memory to store these three attributes. Similarly, the PMC's garbage collection mark vtable function needs to be able to mark any PMC stored as an attribute as live. The PMC's destroy vtable function (a destructor, of sorts) needs to release the memory allocated for attribute storage back to the system.
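The three responsibilities above can be sketched in plain C. This is an illustration only: ManualPMC, Pair_attributes, and the pair_* functions are hypothetical names, the INTVAL typedef is a stand-in, and plain calloc/free replace Parrot's pooled allocator. The point is how much each PMC author had to get right by hand.

```c
#include <assert.h>
#include <stdlib.h>

typedef long INTVAL;            /* stand-in for Parrot's integer type */

/* Hypothetical, stripped-down PMC header. */
typedef struct ManualPMC {
    int   live;                 /* set during the GC mark phase */
    void *data;                 /* attribute storage owned by this PMC */
} ManualPMC;

/* Two INTVAL attributes and one PMC attribute, as in the text. */
typedef struct Pair_attributes {
    INTVAL     first;
    INTVAL     second;
    ManualPMC *child;
} Pair_attributes;

/* init: allocate room for all three attributes.
 * Returns 0 on success, -1 if allocation fails. */
int pair_init(ManualPMC *self)
{
    Pair_attributes *attrs;

    assert(self != NULL);
    attrs = calloc(1, sizeof *attrs);
    if (attrs == NULL)
        return -1;

    self->live = 0;
    self->data = attrs;
    return 0;
}

/* mark: flag any PMC held as an attribute as live.
 * Forget this and the child can be collected while still in use. */
void pair_mark(ManualPMC *self)
{
    Pair_attributes *attrs;

    assert(self != NULL);
    attrs = self->data;
    if (attrs != NULL && attrs->child != NULL)
        attrs->child->live = 1;
}

/* destroy: release the attribute storage.
 * Forget this and every Pair leaks its attributes. */
void pair_destroy(ManualPMC *self)
{
    assert(self != NULL);
    free(self->data);
    self->data = NULL;
}
```

Every PMC with attributes needed its own version of all three functions, and leaving out or miswriting any one of them produced a leak, a crash, or silent corruption.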

("Don't you have a garbage collector?" you may ask. That's a good question. We could let the garbage collector manage the lifecycle of all of these pieces of memory, but they're already attached to GCable elements, so we don't need to mark or sweep or trace them. The malloc/free memory model works here well enough, even though we use memory pools to avoid the costs of malloc/free.)

Why Fiddly Bits are a Problem

Thus to write a PMC without any garbage collection errors, without any memory leaks, and without any random corruption waiting to happen, you had to remember several steps. In practice, people writing their own custom PMCs copied and pasted behavior from an existing PMC, then refactored it until it did what they wanted.

I spent a couple of weeks reading every line of every core PMC in Parrot. I fixed a lot of bugs. I can spot GC and memory bugs in patches. The problem is that I don't scale and you can't get the experience I have without going through all of the bugs I've gone through -- and if I never read your patch, you may still have that bug.

Properly Encapsulated Complexity

Julian Albo and Andrew Whitworth (and several other Parrot developers) recently made an improvement in this area.

PMCs with attributes need to declare them. We use a mini-language built around C to define PMCs. For example, the PMC which represents an object in Parrot (the Object PMC) has two attributes, a PMC which represents the class of the object and a PMC which contains the instance variables of the object. The code looks like:

pmclass Object need_ext {
    ATTR PMC *_class;
    ATTR PMC *attrib_store;

    /* vtable entries go here */

    /* PMC methods go here */
}

The PMC to C conversion step creates a C struct to hold this PMC attribute data:

/* Object PMC's underlying struct. */
typedef struct Parrot_Object_attributes {
    PMC * _class;
    PMC * attrib_store;
} Parrot_Object_attributes;

Thus at Parrot's compilation time -- when we compile the Parrot virtual machine -- we know how much memory we need to store the attributes of each PMC. We know which PMCs have attributes (not all do). We know which PMCs need to mark their attributes specially (this one does, as its attributes are GCables and not primitive values).

Julian's idea was to store the size of the attribute structure in the PMC structure. When allocating a new PMC, the PMC initialization code also allocates memory to contain the PMC's attributes and attaches it. Thus all of the bookkeeping code in PMC init vtable functions can go away. When destroying an unused PMC, the PMC destruction code can free this memory. Thus all of the bookkeeping code in PMC destroy vtable functions can go away.
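A rough sketch of that idea in plain C follows. It is not Parrot's actual code: AutoPMC, AutoVTable, attr_size, and the auto_pmc_* functions are hypothetical names, the size is stored alongside the type's vtable purely for illustration, and calloc/free stand in for Parrot's memory pools. What matters is that the attribute size is recorded once per PMC type, so a single generic code path can allocate and free attribute storage for every PMC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-type record; attr_size is known when the VM is compiled. */
typedef struct AutoVTable {
    const char *name;
    size_t      attr_size;      /* 0 means this PMC type has no attributes */
} AutoVTable;

typedef struct AutoPMC {
    const AutoVTable *vtable;
    void             *data;     /* attribute storage, or NULL */
} AutoPMC;

/* Mirrors the generated Object attribute struct shown earlier. */
typedef struct AutoObject_attributes {
    AutoPMC *_class;
    AutoPMC *attrib_store;
} AutoObject_attributes;

const AutoVTable auto_object_vtable = { "Object", sizeof (AutoObject_attributes) };
const AutoVTable auto_undef_vtable  = { "Undef",  0 };

/* The one place that allocates attribute storage, for every PMC type. */
AutoPMC *auto_pmc_new(const AutoVTable *vtable)
{
    AutoPMC *pmc;

    assert(vtable != NULL);
    pmc = malloc(sizeof *pmc);
    if (pmc == NULL)
        return NULL;

    pmc->vtable = vtable;
    pmc->data   = NULL;

    if (vtable->attr_size > 0) {
        pmc->data = calloc(1, vtable->attr_size);
        if (pmc->data == NULL) {
            free(pmc);
            return NULL;
        }
    }
    return pmc;
}

/* The one place that frees attribute storage, for every PMC type. */
void auto_pmc_free(AutoPMC *pmc)
{
    if (pmc == NULL)
        return;
    free(pmc->data);            /* free(NULL) is a no-op */
    free(pmc);
}
```

With this shape, a PMC author declares attributes and writes no allocation or deallocation code at all; a bug in the bookkeeping gets fixed once, in auto_pmc_new or auto_pmc_free, rather than in every PMC.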

We can even get rid of a special PMC flag value which meant something to the garbage collector but was fiddly to get right, because people often forgot to enable it.

This new code is easy to prove correct. It either works or it doesn't. It's one codepath to examine and patch, not dozens of core PMCs and countless other PMCs existing now or in the future. This reduces the amount of code people need to write and reduces the amount of code existing in our system.

We've moved the internal bookkeeping mechanism out of the user-visible portions of Parrot. If you want to hack on the GC, feel free -- but most people shouldn't have to. They shouldn't even have to know how it does what it does. (Knowing won't hurt, but it shouldn't be a requirement.)

That's one principle of software development I always encourage. Encapsulate confusing or dangerous or difficult code behind a nice interface. Now you don't have to worry about doing the wrong thing because you don't know how to write code which does the wrong thing. If you don't write any code at all, Parrot will do the right thing for you.

Yes, we changed the way you define PMCs -- but tell me that this isn't an improvement for everyone. That's a principle of modern Perl I want to encourage.

1 Comment

"PMCs take the good parts of SVs and avoid the scary, complex parts of SVs."

Given you know enough about SVs to know they're scary and complex, maybe you could make some notes for the perl5 docs or on the corehackers wiki or ... somewhere, anywhere ... that explains the scaryness and complexity a bit better for those of us who don't yet? You could even take the opportunity to squeeze in a few notes about why you think PMCs are better, and maybe perl5's guts can learn a few things from parrot's research work just like the userland is learning a few things from perl6's ...

About this Entry

This page contains a single entry by chromatic published on August 3, 2009 4:54 PM.
