September 2009 Archives

Today I helped a lurker in #parrot find a productive task.

He's kept an eye on Parrot for several years, but today he asked if there was anything he could do to help. I walked him through getting the source code and suggested an introductory task of an hour or so. At least two other developers gave him other potential tasks.

Any community-driven project of reasonable size has several small tasks suitable for a novice. We're fortunate in Parrot that we have plenty of opportunities for people who know Perl 5 or C or Perl 6 or virtual machines or want to learn any of them -- as well as technologies like parsers, opcode dispatch, garbage collection, JIT, installers, and more. We add a new committer every couple of months. We take mentoring very seriously.

As nice as this is for the Parrot project, it's a small part of the entire Perl ecosystem. (Some committers have little interest in the Perl ecosystem itself -- they care more for Parrot as a free software VM unbeholden to corporate interests or a practical demonstration of new ideas in compiler technology or the promise of radical language interoperability or powerful tools for developing little and large languages or an excuse to use what they learned in computer science classes or....). If there are a million people who've written one or more lines of Perl code in the past ten years, maybe a thousand of them have sufficient interest in Parrot and related projects that they may one day contribute a patch or a bug report or a FAQ.

That leaves a million or so other people as potential contributors.

I can think of plenty of opportunities for them to help the Perl ecosystem with an afternoon of time:

  • Add a test to a favorite CPAN module
  • Triage bugs in a CPAN module or other Perl project
  • Test DarkPAN code against bleadperl
  • Add to the Perl 5 Wiki
  • Review the Perl 6 synopses
  • Join/start/lead/speak at a Perl Mongers group
  • Mentor a novice developer
  • Teach a Perl class or lead a study group at a local community college

There are many more possibilities. The question "What is there to do?" doesn't interest me, at least in comparison to a deeper question.

How do we connect those million developers with these possibilities?

Failure-Driven Design

I use actual, measured needs to help design software. I also use reported failures to help design software. I have one more design principle: I use misunderstandings to help me design software.

In particular, this is an API-level design principle. The question changes from "How can I make a particular feature possible?" to "How can I make a particular feature impossible to misuse?"

Up-front Failure Prevention

It's easy to demonstrate failures of this principle; consider string handling in the C programming language, or global-by-default variables in Perl 5, or the Python REPL's behavior when you type quit or exit. (That last example catches me every time I use Python's interactive mode. Thanks for the lovely warning. Please do what I want if you know what I want.)

This principle also falls somewhere in between necessity-driven design and bug-driven design. It requires asking "What could possibly go wrong?", enumerating the likely and unlikely possibilities, analyzing their risk, and determining the likelihood of failure.

For example, Parrot supports a data structure known as a constant string. These are immutable singleton structures which represent strings used pervasively throughout the core system. By making them immutable, we obviate the need to make copies to prevent unwanted modifications. By making them singletons, we can collapse multiple references to a single string into pointers to that singleton string and save lots of memory.

We use a macro called CONST_STRING in C code in the Parrot core to identify one of these strings.
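
Sketched in Perl rather than C -- the names here are invented for illustration, and Parrot's real implementation differs -- the idea looks something like this:

my %const_strings;

sub const_string {
    my $literal = shift;

    # return the shared, immutable copy if one exists; create it otherwise
    return $const_strings{$literal} ||= do {
        my $copy = $literal;
        Internals::SvREADONLY( $copy, 1 );    # modification now dies loudly
        \$copy;
    };
}

# both calls return references to the same read-only scalar
my $greeting      = const_string('hello');
my $same_greeting = const_string('hello');
print "shared\n" if $greeting == $same_greeting;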

While it would be nice if our documentation were always sufficient to describe how to write a Parrot extension without copying and pasting code from the core, I realize that almost everyone who will ever write a Parrot extension will start with skeleton code cribbed from elsewhere.

I wanted to make the constant string technique work reasonably well for extensions as well. It'll never be quite as fast or as efficient as the core version, but a quick cache does help a lot of benchmarks.

Our first attempt used a different macro, CONST_STRING_GEN, as the internal implementation of that macro had to be different. Rather than poking directly into interpreter memory, extensions have to go through a secondary lookup: they don't have access to internals in the same way that Parrot's core does.

Then I realized the problem.

I don't want to explain the mechanics of how constant strings work, at least not to people writing extensions. I want to say nothing more than "If you know you'll never need to modify this string, mark it as a constant string." I know I don't want to explain the differences between the caching models, especially because extensions shouldn't need to know anything about Parrot's unencapsulated internals.

Yet I knew that people would copy code from the core into their extensions and then wonder why their versions Just Did Not Work.

I changed the extension processor to emit a different version of the CONST_STRING macro local to each extension which uses the appropriate public API to manipulate constant singleton strings. Even though the mechanics of how this works differs between core code and extension code, it still reads the same way. People can copy and paste code between core data structures and extensions without knowing the difference, at least in this respect.

Even though copying and pasting is generally bad, it's so pervasive (especially in this context) that our interfaces have to allow for it -- and should not allow people to make subtle errors.

For further reading, I suggest Joshua Bloch's How to design a good API and why it matters talk from OOPSLA 2006.

Reacting to Failure

Of course, it's not always possible to predict what people will do wrong.

Sometimes the best you can do is look at a bug report, ask yourself "Wait, why in the world would you ever write code this way?", and then work backwards. How did that failure occur? Do you not provide the right APIs? Do your abstractions leak? Are people working around a broken feature? Are people working around the lack of a feature?

Sometimes you need to pave the cowpaths. Sometimes you need to change the way you explain code. Sometimes you need to change the name of an API call or a parameter to suggest the right behavior.

Mostly you need to understand where and why expectations went wrong. Only then can you change the vocabulary of the system to change expectations for the better.

That's why I welcome rapid feedback. If something doesn't work, I want to know that as soon as possible -- before it gets too established to improve, before it confuses too many people, and before people accept it as "just another quirk."

There can be a little pain to start, especially for early adopters, but there's no substitute for solving real problems in the real world to help you understand exactly what you should have designed in the first place. Maybe next time you'll get more right.

(This, I believe, is one of those practices which separates real, actual agile development from Big-A-Because-It's-Hip-Agile development: pervasive, ubiquitous feedback gathered and reflected upon to produce small, verifiable changes to development practices designed to improve the process itself.)

Genericity versus Optimization

I spend a lot of time profiling and optimizing Parrot and Rakudo. Even though we have added several features to Parrot since the 1.0 release in March and the 1.4 release in July, we've improved speed and memory use by measurable amounts. We haven't yet reached the point of powerful, orders-of-magnitude optimizations, but we're preparing for them.

I've written before that I believe the single optimization principle for writing a compiler and runtime for a dynamic language is to replace as many aspects of genericity and flexibility with straight-line code as possible. One optimization from yesterday demonstrates this in a dramatic fashion.

The oofib.pir benchmark exercises method calls and argument passing. It also creates garbage collectable objects. It's a little bit heavy on math operations for a general purpose benchmark, but I like that it performs frequent invocations. Optimizing those helps almost every real program, as does improving the garbage collector.

I normally perform profiling work with Callgrind; I've found no better tool for counting the raw instructions a processor executes. Yesterday I tried Cachegrind instead. Both tools work with the KCachegrind visualizer, which makes them even more useful. Where Callgrind measures the instructions actually executed, Cachegrind measures the processor cache behavior of the program.

As modern processors are so much faster than memory (even fast processor caches), sometimes it's worth trading a few cycles for fewer cache misses. Similarly, as modern processors can have deep pipelines and perform speculative execution, rewriting code to avoid branch mispredictions can also improve performance.

Branch prediction is a processor feature which analyzes a branch condition -- an if condition, for example -- and predicts where execution will go. Speculative execution performs the operations of that expected branch even before the processor knows that its prediction is correct. If it guesses correctly, there's no penalty and a big improvement over having to wait. If it guesses incorrectly, it has to throw away work. It's a risk, but it usually guesses correctly.

Sometimes it needs help.

Parrot's default garbage collector is an optimized-but-clunky stop-the-world mark and sweep collector. We wanted to replace it with a concurrent collector with better throughput before Parrot 1.0, but the time and expertise necessary to refactor the internals to make it possible to change the GC without breaking the code for everyone else was a larger investment than we could make in time for the 1.0 release. (We're in much better shape now.)

Our garbage collector tracks two similar-but-different types of objects: STRINGs and PMCs. Think of a STRING as a chunk of binary data with an encoding and a length and a buffer. Think of a PMC as an object. A STRING is a single, self-contained unit. A PMC may contain attributes which refer to other PMCs. There are finer distinctions between them, but that's all you need to understand now.

To simplify our GC, both STRING and PMC have a shared header called PObj. This structure contains the garbage collector flags: what type of PObj is it (STRING or PMC), is it live or free, does it have a custom mark behavior, does it have custom destruction behavior. Note that both STRINGs and PMCs share the first two flags, while only PMCs have the latter two.

A stop-the-world mark and sweep collector starts from a root set of objects that the runtime knows are still alive. It traverses that root set, marking everything in it as live and recursing into any PObj reachable from it, and so on, until it's marked the entire graph of reachable objects as alive. Then it sweeps through the pools containing all allocated objects, freeing anything not marked as live and clearing all flags.
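
In miniature, as an illustrative Perl sketch -- the field names are invented, and the real collector works on C structures and pools:

# mark: walk the graph from the root set, flagging everything reachable
sub mark_live {
    my @work = @_;    # the root set

    while (my $pobj = pop @work) {
        next if $pobj->{live};    # already marked; don't walk it twice
        $pobj->{live} = 1;
        push @work, @{ $pobj->{refs} || [] };    # everything reachable from here
    }
}

# sweep: anything unmarked is garbage; clear flags for the next run
sub sweep {
    my $pool = shift;
    @$pool = grep { my $live = $_->{live}; $_->{live} = 0; $live } @$pool;
}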

There was a single "mark this PObj as live" function. Whether we wanted to mark a STRING or a PMC, we called the same function, casting the object into a PObj.

The astute can pause here to say "Yes, of course, you're throwing away information there!"

Parrot r41447 added separate functions to mark PMCs and STRINGs as live to let the C compiler tell us if we made any mistakes about what we marked as live. (We've had a couple of bugs from expecting the wrong thing.)

My Cachegrind profile showed a huge amount of branch prediction misses in the PObj marking function. Specifically, the processor could never predict whether it was marking a STRING or a PMC. As you might expect, marking a STRING as live means only setting a flag, while marking a PMC requires checking if it has custom mark behavior and potentially recursing. There's no way to predict which path through the live object graph the mark phase will take, and there's no way to predict whether the branch predictor will see a run of PMC, PMC, PMC and get on a good train of prediction or whether it'll flip back and forth between PMC, STRING, PMC and continually be wrong. (The mark phase is deterministic for any code without random allocations, but there's too much data to predict any pattern.)

As we now had separate functions to mark each, I pulled the guts of the single mark function into the two helper functions... and reduced the number of branch mispredictions by over 70% in that benchmark.
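
The shape of that change, sketched in Perl rather than Parrot's C (all names invented):

use constant FLAG_LIVE => 0x01;

sub pmc_mark_children { }    # stub; the real version recurses into attributes

# before: one entry point with a runtime type test -- the branch the
# processor kept mispredicting
sub pobj_mark {
    my $pobj = shift;

    if ( $pobj->{is_string} ) { $pobj->{flags} |= FLAG_LIVE }
    else                      { pmc_mark_children( $pobj )  }
}

# after: call sites which already know the type statically encode that
# knowledge in the API, and the unpredictable branch disappears
sub string_mark { $_[0]{flags} |= FLAG_LIVE }
sub pmc_mark    { pmc_mark_children( $_[0] ) }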

For a further optimization, I realized that there's no need even to call the marking function for STRINGs, at least for code in the core of Parrot which can flip the flag directly.

The end result is a little bit more code -- not much, maybe a dozen lines -- but a huge increase in clarity, an improvement in simplicity, and better optimization and performance characteristics. The compiler now helps give us warning messages where it matters most (correctness), and we get better performance at a level normal profiling can't even see.

I'm even tempted to call this pattern "Replace Conditional with API."

Bug Driven Design

I wrote the other day about delaying design decisions until the last responsible moment, per the belief that only at that point do I have sufficient knowledge to design the feature to meet current needs. I can anticipate future needs, in the sense that I do not work to prevent them, but sufficient unto each release are the troubles thereof.

This principle rests upon the assumption that I have sufficient knowledge at each last responsible moment. Perhaps I've talked to the people who want the feature and have a detailed list of behaviors and expectations I can turn into tests. Perhaps I have a specification or RFC or test suite I must pass. Perhaps I'm writing the feature for myself and know exactly what I want.

Still, bugs happen.

I'm not necessarily happy about this, but I do like the feedback. Not only are people using the software, but they care enough about it to tell me what they expect it should do. More important is that a good bug report gives me additional details about user expectations.

Would it be nice to have them before adding the feature? Of course! Would it be nice to have them before releasing the software? Undoubtedly. Would I like to avoid bugs in general? You know it!

Yet bugs happen.

Some bugs are bugs of understanding. I may write a lovely JIT for Parrot and receive effusive praise and a repurposed bowling trophy, but when someone says "That's 32-bit x86 only, and we'd really like it on our 64-bit processors," I know I have more work to do. That's good and that's valuable -- not only to give me a better understanding of what real users need, but to remind me to talk to people to discover what they really want and really need.

Some bugs are bugs of implementation. Perhaps I don't understand the pipeline and instruction scheduling system of a 64-bit processor and the resulting JIT code is unaligned and four times slower than it should be. As much as I'd like to avoid these problems, they happen. That's good and it's valuable, not just to improve the software for users, but also to help improve the test suite.

Furthermore, analyzing the causes of both types of bug can help me improve the process of creating software. Perhaps I'm not testing enough. Perhaps the characteristics of my sample workloads do not match real world uses. Perhaps the assumptions I've made about how people use the software need to change; perhaps their goals and values have changed.

I wish I could get this information reliably before I add a feature, but I'm a practical guy. Sometimes the best feedback you can expect is "Hey, it doesn't work!" That you can fix.

Necessity Driven Design

Matt Trout asked people to write about how they learned to design programs. Design is a skill largely untaught; I suspect other responses will suggest informal and inductive processes.

The Nascent Hacker and Design

My first encounter with programming was with personal computer BASIC in the early '80s. I wanted to play games, but the school system discouraged students from playing games during class time. Programming was fine. Thus, I taught myself to write games.

I returned to programming in the late '90s as a hobby. My main work offered little challenge and plenty of spare time, so I could explore anything technical I wanted if I could justify it for business purposes.

I read and I experimented and I tried to answer the questions of other novices and I learned how to program.

My first few public free software projects went unused and unlamented. I worked on them to satisfy my desire to explore interesting technical challenges. They didn't meet real needs for me or anyone else. The problem with this approach is that it pursues novelty for the sake of novelty. The appearance of a better or newer novelty can undercut the motivation for the previous project.

In the decade since then, I believe I've learned a few things about how to design software. The most important lesson is to build only what you need. Some people call this the YAGNI principle. That's effective for coding, but there's a different principle for design.

Necessity and Creativity

I believe that constraints improve creativity. I've participated in National Novel Writing Month, where the goal is to write 50,000 words of the first draft of a novel in 30 calendar days. The result is quantity, not necessarily quality, but the time constraint forces people to produce results.

It's taken almost two years to turn the results into a publishable novel, but the process worked for what it was.

Richard Gabriel's Worse is Better essay suggests similar results in software. A system that demonstrably meets the most important needs in an effective (if inelegant) way and works today is better than a system that theoretically meets all perceived needs in a breathtaking way but may not be available yet.

It's easy to gild the lily, to write software that does too much in an attempt to be all things to all people — but I believe that a constraint of delivery date or tight scope or available resources can help focus the project on its most important characteristics.

Experienced Hacker and Design

I've written about vision, especially in the context of Perl 5. I tend to let the vision for a project guide its design in the large. For example, my vision for Perl 5 is a language that scales from novices learning the language piecemeal to experts writing powerful, elegant programs with as few barriers and missteps and traps as possible. This suggests criteria to evaluate potential designs in terms of clarity, expressivity, and learnability — if not safety.

Another component of equal importance is necessity. Some of the earlier code in Parrot is difficult to maintain due to (I believe) a premature focus on optimization. We've borne the cost of maintaining (and testing and debugging and fixing and working around) several design decisions implemented because someone or other at some point thought that certain programs might run faster.

We've spent much time renovating, removing, and replacing some of these components.

We see a constant push and pull between high-level languages (the "customers" of Parrot, in one sense) and Parrot itself. Which features should Parrot support? What's the most elegant way to solve a problem in an HLL? Which solutions work better in Parrot? Which design decision will best suit multiple interoperable languages hosted in the same Parrot memory space?

I believe we can make better decisions now in part because we have actual need — HLL implementors with HLL implementations making their own design decisions based upon specifications and specification tests and input from language designers — driving the requests. I can profile a sample Rakudo application and find portions of Parrot that need optimization based on the codepaths real-world code executes. (That's why we can speed up Rakudo 2% here, 4.5% there, 6% there, and 31% there.)

Smart people in other fields sometimes call this the last responsible moment. That idea comes from manufacturing, but if anything it applies more to software, where everything but the laws of physics and monads is malleable, given time and will and design. Change happens. Allow it.

I believe designing to current, actual, perceivable, and measurable needs — if you understand those needs — produces better programs.

Some people complain that the motivation behind modern Perl is "to turn Perl into Java!" I don't understand that; any reasonably sized and well-written Perl program takes advantage of Perl idioms that the Java language cannot (and will likely never be able to) express.

They're right in one respect though: I'm happy to admit that like all mature poets, I steal good ideas wherever I can find them. Today's idea comes from Smalltalk.

A Digression on Context

The idea of context -- not just automatic data coercion between stringy and numeric and boolean and reference and (if you've disabled strict 'refs') executable values -- sets Perl apart from almost every other programming language. I explained that there are static and dynamic views of contexts in Tiny, Void Context Core Optimizations. These form the other axis of context-awareness: what do you expect to do with the result of an evaluated expression?

Expression context confuses novices. You can see it in the documentation of the return builtin and all best-practices discussions that suggest a bare return; over return undef;.
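
A contrived sketch of the trap; find_user() and its data are hypothetical:

my %users = ( alice => { id => 1 } );

sub find_user {
    my $name = shift;
    return undef unless exists $users{$name};    # oops
    return $users{$name};
}

# in list context, return undef produces a one-element list -- which is true
my @found = find_user('no_such_user');
print "found someone!\n" if @found;              # prints, wrongly

A bare return; instead produces an empty list in list context and undef in scalar context, so both kinds of caller behave as expected.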

You also see it when people call functions in hash initializer lists:

my %big_bundle_of_data =
(
    name => $name,
    # oops!  don't do this
    id   => $q->param( 'search_id' ),
);

The problem is that the list used to initialize the hash imposes list context on the method call. If that method returns a list of items in list context, the hash may not contain what you expect.
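
The conventional fix is a judicious scalar (continuing the example above); it works no matter what param() returns:

my %big_bundle_of_data =
(
    name => $name,
    # force scalar context on the call
    id   => scalar $q->param( 'search_id' ),
);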

Reflection and Introspection

One of Smalltalk's greatest strengths is the use of images. (That's also one of Smalltalk's greatest vulnerabilities; owning the world is good when you can own the world, but if something else starts to own the world, you should try to play nicely too.) You write Smalltalk programs in the Smalltalk browser (not a web browser, but an IDE) written in Smalltalk itself. Some Lisps do this too. You can consider Emacs a Lisp implementation which happens to run an editor to write more Lisp code, for example.

When you write Smalltalk this way, you edit parts of your program as it runs. The IDE is part of your program (or vice versa).

This offers tremendous power. If you want to list all of the objects or classes or methods in your program, you can do that. They're all available. If you want to perform exploratory programming, you can do that: ask the debugger to halt when it reaches a method you haven't yet implemented. Write a test, write a bit of code, run the test, see if you need that stub method.

There's a reason test-driven development came from the Smalltalk world.

Static versus Dynamic

The noisy discussion earlier this year about parsing Perl 5 was all about whether you can determine exactly what a Perl 5 program will do, parse-wise, without executing any of its code. In the Smalltalk model, you've already executed the code so you already know how it parses. You can even change that. (You can even rewrite Smalltalk's garbage collector in Smalltalk, if you want to do that. See also Parrot Lorito.)

The B::Concise output in the discussion of my context patch demonstrated that the Perl 5 optree contains information about the context of expressions (in op form) before runtime begins. That information is available through the B modules while a program runs. You can find the context of any op by careful use of the B::* modules: pass a subref, get its ops, walk the COPs and look for the right file and line number, and follow the sibling pointers until you reach the right op. Then look at its flag.
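
A rough sketch of that tedium, assuming only documented B exports and methods; this is exploratory code, not a polished tool:

use B qw( svref_2object OPf_KIDS OPf_WANT
          OPf_WANT_VOID OPf_WANT_SCALAR OPf_WANT_LIST );

my %context = (
    OPf_WANT_VOID()   => 'void',
    OPf_WANT_SCALAR() => 'scalar',
    OPf_WANT_LIST()   => 'list',
);

# walk an optree depth-first, reporting the context flags of push ops
sub report_push_contexts {
    my $op = shift;
    return unless $$op;    # a null op pointer ends a branch

    printf "push op in %s context\n",
        $context{ $op->flags & OPf_WANT } || 'unknown'
        if $op->name eq 'push';

    if ( $op->flags & OPf_KIDS ) {
        for ( my $kid = $op->first; $$kid; $kid = $kid->sibling ) {
            report_push_contexts( $kid );
        }
    }
}

sub demo {
    push @_, 'void context';
    my $count = push @_, 'scalar context';
}

report_push_contexts( svref_2object( \&demo )->ROOT );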

This is all possible. It's tedious, but not difficult. Yet few people know it's possible -- and to my knowledge, no one has done it.

Yet imagine the power and the ease of learning and the potential for finding more error conditions if a powerful Perl IDE had the option of identifying the static context for any Perl expression. The code snippet I demonstrated earlier could pop up a warning window explaining exactly what might go wrong and suggesting the judicious use of scalar.

Perl::Critic can identify many of these cases. It's a great tool. Use it! Yet it can't identify all cases.

One of the important goals for Perl 6 is to make tooling like this possible. Separating the execution model from the tree model used to identify valid programs -- and making those trees available to tools -- will help. Gerard Goossen's TPF grant proposal for building ASTs for Perl 5 has similar possibilities. Hopefully it can get funding next time; it's a good project.

We may not get powerful Perl tools like this any time soon for Perl 5, but isn't it nice to imagine the possibilities? Maybe that'll make Perl 5 like Java, in the sense that Java has some powerful tools. Would that be such a bad thing?

Tiny, Void Context Core Optimizations

I'm writing about arrays in the Modern Perl book now. While writing about push and unshift yesterday, I looked in perlfunc to see if I'd missed anything subtle about push -- and I had:

Returns the number of elements in the array following the completed push.

I can't think of a time when I'd used this in the past decade. Every use of push I can think of is in void context:

push @some_array, qw( some list of items );
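
On the rare occasion the return value earns its keep, the code might look like this contrived sketch:

my @queue;
my $next_task = 'resize images';

# track the queue depth as a side effect of appending
my $depth = push @queue, $next_task;
warn "queue is backing up\n" if $depth > 100;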

Curiosity convinced me to look at the bleadperl source code. The push op is in a file called pp.c in a function called pp_push:

PP(pp_push)
{
    dVAR; dSP; dMARK; dORIGMARK; dTARGET;
    register AV * const ary = MUTABLE_AV(*++MARK);
    const MAGIC * const mg  = SvTIED_mg((const SV *)ary, PERL_MAGIC_tied);

    if (mg) {
        *MARK-- = SvTIED_obj(MUTABLE_SV(ary), mg);
        PUSHMARK(MARK);
        PUTBACK;
        ENTER;
        call_method("PUSH",G_SCALAR|G_DISCARD);
        LEAVE;
        SPAGAIN;
        SP = ORIGMARK;
        if (GIMME_V != G_VOID) {
            PUSHi( AvFILL(ary) + 1 );
        }
    }
    else {
        PL_delaymagic = DM_DELAY;
        for (++MARK; MARK <= SP; MARK++) {
            SV * const sv = newSV(0);
            if (*MARK)
                sv_setsv(sv, *MARK);
            av_store(ary, AvFILLp(ary)+1, sv);
        }
        if (PL_delaymagic & DM_ARRAY)
            mg_set(MUTABLE_SV(ary));

        PL_delaymagic = 0;
        SP = ORIGMARK;
        PUSHi( AvFILL(ary) + 1 );
    }
    RETURN;
}

I know this is a big chunk of macro-heavy code, but it's not too difficult to understand. The first if branch handles the case where the target array has magic -- if it's a tied array, for example. Ignore that. The second branch loops through the list items provided to the op and appends them to the array.

Note the final line of that branch. The PUSHi macro pushes an integer value (an IV, in core parlance) onto the stack. The AvFILL macro returns the index of the final element in the array. Adding one to that number gives the number of elements in the array.

Every execution of this branch retrieves that value and pushes it on the stack. Even if the opcode takes place in void context such that the compiler can determine that at compilation time, this push occurs.

I wrote a patch:


diff --git a/pp.c b/pp.c
index 9cedc3f..fbdc90c 100644
--- a/pp.c
+++ b/pp.c
@@ -4561,7 +4561,9 @@ PP(pp_push)
 
     PL_delaymagic = 0;
     SP = ORIGMARK;
-    PUSHi( AvFILLp(ary) + 1 );
+    if (GIMME_V != G_VOID) {
+        PUSHi( AvFILL(ary) + 1 );
+    }
     }
     RETURN;
 }

The important addition is the condition. The GIMME_V macro evaluates to the current context of the expression. Usually this context is statically determinable, but if this push is the final expression in a subroutine, the calling context matters. The G_VOID macro represents void context. In other words, don't push anything onto the stack to return a value from this expression unless something wants that return value.

Yitzchak Scott-Thoennes commented on my patch to say that GIMME_V may be more expensive than I intended. This is because looking up through calling scopes to find the runtime context is not always cheap. He suggested the simplification of:

OP_GIMME(PL_op, 0) != G_VOID

... to check only the compile-time context of the operator. You can see that this cheaper check is still correct in ambiguous cases:

$ perl -MO=Concise,check_push_context
sub check_push_context
{
    push @_, 'static void context';
    push @_, 'dynamic context';
}
^D
d  <1> leavesub[1 ref] K/REFC,1 ->(end)
-     <@> lineseq KP ->d
1        <;> nextstate(main 61 push_ctx.pl:6) v:%,*,&,$ ->2
6        <@> push[t3] vK/2 ->7
2           <0> pushmark s ->3
4           <1> rv2av[t2] lKRM/3 ->5
3              <#> gv[*_] s ->4
5           <$> const[PV "static void context"] s ->6
7        <;> nextstate(main 61 push_ctx.pl:7) v:%,*,&,$ ->8
c        <@> push[t6] sK/2 ->d
8           <0> pushmark s ->9
a           <1> rv2av[t5] lKRM/3 ->b
9              <#> gv[*_] s ->a
b           <$> const[PV "dynamic context"] s ->c

Look at the two lines representing the push opcodes (labeled 6 and c) and their context flags. The first opcode has a flag of v, which indicates that it occurs in void context. The second opcode has a flag of s, which indicates scalar context. Thus Yitzchak's suggestion will work for both cases without ruining any dynamic context call of this function.

As is the case with such optimizations, the question is whether the cost of checking for the optimization opportunity outweighs the cost of simply doing the work every time. Measured honestly, you're not getting huge speed improvements out of this code, but for a one-line patch to a very common op, it may be worthwhile.

Befriend a Novice

Around half of all attendees to a YAPC have never attended a YAPC before.

As of today, 7680 people have uploaded a distribution to the CPAN.

Several hundred people -- perhaps more than a thousand -- have credits in the source code to Perl 5.

In any given yearly period, a few dozen people have contributed more than a single patch to Perl 5, Rakudo, or Parrot.

PerlMonks has a couple of hundred very active users.

It takes months of effort to get a dozen potential students for each year's Google Summer of Code for Perl.

A new edition of the book Programming Perl may sell hundreds of thousands of copies. A new edition of Learning Perl may sell tens of thousands of copies.

A controversial or active Perl blog post may get a dozen comments.

There may be a million people in the world who've written more than one line of Perl code for some purpose. Go find a novice, encourage him or her to continue learning Perl, and -- most importantly -- introduce him or her to the community. You don't have to file a bug on rt.perl.org or rt.cpan.org or fix the XS documentation in the Perl 5 core or help bootstrap the Parrot JIT or organize a conference or volunteer to mentor a TPF grant. Yet you'll get much, much more out of Perl if you add your little bit to the wider Perl community.

If we are to improve our community to welcome more participants (and we should do that), we should also actively recruit new participants. Don't wait for them to come to us. Let's go find them and invite them and make them feel welcome that way, too.

Aspects of a Novice-Friendly Distribution

Scott Walters suggested a Perl optimized for the novice user experience. Adam Kennedy called it Perl::Tiny.

If someone were to build such a thing, what would it be? How would it look? What features would it have and why?

I'd start with Adam's Chocolate Perl, a full distribution of Perl 5 which includes the toolchain necessary to build and install CPAN modules as well as supplementary tools and external modules such as Task::Kensho. In short, it's a modern, enlightened Perl 5 distribution.

Next, I'd add Padre. It's not the only Perl IDE, but it's under active development with responsive maintainers, frequent releases, and a plugin mechanism that allows the bundling and use of several other interesting pieces of infrastructure.

I'd use the Perl::Critic plugin and make it active by default. It'd take some work to choose the right set of default rules, but I'm sure we could come up with something modern. This will help novices avoid common problems which the Perl 5 core doesn't warn about.
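
As a strawman, such a profile might start small; these policy choices are mine and purely illustrative:

# .perlcriticrc -- report only the most severe problems, so novices see
# signal instead of style quibbles
severity = 4

# insist on strictures and warnings regardless of their default severity
[TestingAndDebugging::RequireUseStrict]
severity = 5

[TestingAndDebugging::RequireUseWarnings]
severity = 5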

I'd set a default template which enabled Modern::Perl by default. There's no sense in giving novices good tools without asking the compiler to give them as much assistance as possible.
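
That template needs almost nothing beyond the pragma itself; a sketch:

#!/usr/bin/perl

use Modern::Perl;

# strict, warnings, and the 5.10 features are now enabled; code goes here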

I'd include Moose, of course. I'd include other CPAN modules such as Try::Tiny, though I wouldn't include all of perl5i. The goal here is to provide better alternatives to core syntax which is difficult to get right.

I'd provide a plugin which can explain Perl 5 magic variables by extracting the appropriate documentation from perldoc perlvar. I don't know if I'd recommend the English equivalents.

I'd also provide a plugin which can use Perl::Tidy and even B::Deparse to canonicalize code -- and explain syntactic constructs, keywords, and operators with the appropriate documentation from perlsyn, perlop, and perlfunc.

I'd love to see -- though this may be an additional project -- a graphical CPAN client similar to Ubuntu's Synaptic, which can browse and search the CPAN indexes.

It might be worthwhile to add a plugin to search PerlMonks and other sites for help and questions.

I can imagine that there's a small bit of income available from selling training material -- especially katas and interactive lessons -- but that's a different story altogether.

Would you have used such a system when you were a novice? Would you recommend it to others now? Am I missing things?

In the discussion of Perl Applications for Normal Users, the subject of installation came up again. User experience isn't always my primary concern, and I don't mind overgeneralizing that to other parts of the Perl 5 community.

For example, many Perl community websites are unattractive visually. There are exceptions. The London Perl Mongers site is very attractive. I also like the design of the Perl Moose site.

Yet a common refrain in many (more heat than light, sadly) discussions is that functional is more important than attractive. Yes, that's a false dichotomy.

I don't want to restart the debate over the attractiveness of Perl 5 websites. (I'd love to see variant templates that writers, developers, and projects could borrow and adapt for consistency of good design, but that's a different story.)

I want instead to talk about installation. This ties in to Who Benefits from the CPAN? Consider, for example, the amount of work required to configure a CPAN client in 2000 versus the amount of work required to configure the same client in 2009. This has improved.

Yet compare that to installing a random PHP application.

Preemptive ad hominem: if at any time you feel the need to comment here or elsewhere saying "You fool! You foolish fool! You want to turn Perl into Java, you drooling maniac!", you are a masochist and your opinion does not matter.

One commenter on the previous post praised Ruby Gems for not failing tests and not spewing diagnostic output and asking confusing questions. Of course the thought of not running tests on installation horrifies me (how do you have any idea if your software works?), but the point is a good one. What information do users need to know to install your software? What questions do you really need to ask them?

We've optimized the Perl ecosystem for Perl developers. That's good. That helps us. That doesn't necessarily mean that what's good for Perl developers is good for Perl users. That also doesn't mean that we can only optimize the experience for one group at the expense of the other.

We can do better.

I argue often for better defaults -- not just at the language level but at every level users might see. Yes, I've overloaded the word "user". Deal with it.

One important question for me as I ponder visions for Perl 5 and its ecosystem is "How can we improve the default behavior for novices and for most uses by experienced programmers while allowing experienced programmers to customize behavior for advanced uses?" That's a language design principle. If you analyze the design of Perl 5, you'll see Larry's answers everywhere. The same goes for Perl 6.

I don't have complete, systemic answers for that yet. We have a lot to ponder. As food for thought, consider the (oblique) connection between two comments. The first comes from a critique of Python:

Also, its libraries require compilation. What a nonsense.

The second comes from Scott Walters, specifically his Language choice motivated by greed journal:

Moose is good and Moose is great, but the real win would be to automatically install modules, automatically call the OS's package manager to install libraries and other deps, to have created PHP before php did (or after) and have a libperl linking executable that outputs web pages (period), and most of all, to have a sandbox mode enabled by default (tired idea, I know) where users can write code without getting yelled at by humans, only by the computer, even if just because the code is labeled (all is fair if you predeclare) as "babytalk". Perhaps this would be a mix of strict, diagnostics/warnings, and a Perl::Critic policy that tries to help them in only the most immediate sense likely to be useful to novices in the short term.

Applications for Normal Users

I spend most of my programming time writing tools for other programmers. My business (Onyx Neon Press) has a modest amount of code to produce books, but a fair amount of the code in our production process is Pod::Simple and Pod::PseudoPod -- tools for other programmers.

Most of my work on Parrot and Perl 6 is infrastructure for other programmers to use too.

I suspect -- but can't prove sufficiently -- that many of the most popular and most widely used projects on the CPAN are likewise tools for programmers. Moose is a tool for programmers. So are Catalyst, Scalar::Util, Method::Signatures, Devel::NYTProf, and perl5i.

That's not a bad thing. I'm not complaining. It's an observation.

Perhaps CPAN is a tool primarily for developers. That's fine too.

Even so, I wonder where are all of the wonderful applications Perl programmers can mention when people outside the Perl community ask "What's it good for? What can it do for me?"

I'm happy to talk about Padre and BioPerl or Movable Type and Melody. Frozen Bubble is a good story (especially with SDL Perl under fresh development again -- see Kartik Thakore's journal for more SDL Perl details). dotReader didn't get much attention, but it's an impressive project. BBC iPlayer is just on the edge of projects to mention.

I'm not sure bragging about websites implemented in Perl 5 is useful or interesting. For the most part, that's immaterial. A web site is a web site is a web site.

Maybe Perl 5 doesn't need this. Java does pretty well with its comfortable niche in enterprisey web development, Eclipse (a programmer tool for writing programmer tools, remember), and LimeWire. Python gets a boost from the original BitTorrent code (Mercurial is a developer tool, as is embedded Python for game developers and 3D modeling programs). Ruby... well, there's Rails.

Sometimes I read Planet Gnome and Planet KDE and marvel at all of the nice projects that non-developer end-users can use. I see some of these applications developed in Vala and JavaScript (yes, a developer tool -- but one used to write a web browser, a mail client, a music player, et cetera) and even sometimes Python.

I know I've used Perl 5 for system administration, web development, workflow automation, games, visualization, and more. Maybe I'm a very unrepresentative user, and that's fine. Maybe I don't have the mindset to create software for non-developer users and that's probably fine too.

Even so, sometimes I wonder if the Perl community spends so much time obviously and obsessively focused on the needs of other developers that we might neglect some easy and obvious problems where Perl can be part of a great solution.

... not that such users will use Perl directly, but that expanding our area of influence may be a very good idea.

When SUPER Isn't

Damian Conway's Object Oriented Perl improved my programming more than any other book, when I first returned to programming. By the time I joined the Perl Renaissance, I'd learned more about how Perl 5 worked and how to think about approaching problems from Damian's book and my own experiments based on his writings.

Perl 5's default object system is deliberately minimal. It combines two existing ideas (references and packages) with a method dispatch system. It's very clever in its minimalism; it enforces very little and almost never precludes people from using it to provide more powerful -- or merely different -- object systems.

It's not easy to make a minimal system that flexible.

Unfortunately, there are a couple of warts. It's common knowledge that the default Perl 5 object system is a little bit too minimal. Flexibility is good, but more people benefit from good defaults they only have to change when they're doing something special.

Most of the other problems I have with the basic Perl 5 object system come directly from its influence: Python. I laugh (yes, a sardonic laugh) every time a Python advocate says that Perl 5's object system is a bolted on hack, because Python has many of the same design problems. Yet I digress.

One design decision which Perl 5 stole wholeheartedly from Python is the idea that methods are just subroutines invoked as methods. Python has first-class functions and allows you to install them in namespaces with simple assignment:

class Foo:
    def bar(self):
        print "I'm in bar!"

    baz = bar

def main():
    foo = Foo()
    foo.bar()
    foo.baz()

main()

This is not the place to debate the relative merits of Python versus Perl 5 syntax for doing so, but Perl 5 allows something very similar. You can import() methods into classes. The Why of Perl Roles explains why this is important.

For the most part it works. Unfortunately, when it doesn't work, it really doesn't work.

Super Fragile Explodealicious

Consider a silly example:

package Foo;

use Modern::Perl;

sub new { bless {}, shift }
sub foo { say shift . '->foo()' }

Now subclass it:

package Bar;

use base 'Foo';

You can successfully instantiate a Bar object and call foo() on it:

Bar->new()->foo();

What happens if Bar needs to get a method from a role? Here's a Baz package which (manually) imports a foo() method into the calling class. This method emits an informative message, then redispatches to the parent method:

package Baz;

use Modern::Perl;

sub import
{
    my $caller = caller();
    no strict 'refs';
    *{ $caller . '::foo' } = \&foo;
}

sub foo
{
    my $self = shift;
    say "foo() in Baz role!";
    $self->SUPER::foo( @_ );
}

Unfortunately, now you can't call the foo() method on Bar objects anymore:

foo() in Baz role!
Can't locate object method "foo" via package "Baz" at ... 

Insufficient Dynamicity

This error message makes little sense. The foo() method within the Baz package generates this error.

The problem is how the SUPER:: method selector works in Perl 5.

When the Perl 5 compiler encounters a function, it stores compile-time information in the internal data structure which represents functions. This CV, or Code Value, contains a pointer to the package to which the function belongs.

At runtime, the SUPER:: method redispatch looks at the package into which Perl 5 compiled the current method, then looks in its list of parent classes to figure out which method to call next.

You can see the problem. This behavior is, I believe, largely an artifact of a particular implementation -- likely the intersection of several sensible design decisions which combined to produce an unfortunate corner case.

Unfortunately, this behavior is unlikely to change anytime soon in Perl 5 (no matter how broken a feature is, you can't prove that no unseen code depends on it). The correct behavior is to redispatch based on the current class of the invocant. This is what the SUPER module from the CPAN does instead.

Note that Moose solves this problem in a similar way.
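
Here's a sketch of invocant-based redispatch for the Baz example, using the core mro module. It's a simplification; the real SUPER module handles corner cases this ignores:

package Baz;

use Modern::Perl;
use mro;

sub foo
{
    my $self = shift;
    say "foo() in Baz role!";

    # start from the invocant's class, skip it, and dispatch to the first
    # parent which provides foo()
    my (undef, @parents) = @{ mro::get_linear_isa( ref $self ) };
    for my $class (@parents) {
        if ( my $method = $class->can( 'foo' ) ) {
            return $self->$method( @_ );
        }
    }
}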

To start: the title is a false dichotomy. I develop some code and I distribute other code. I happily let Debian and Ubuntu package software such as Vim and X.org for me, not to mention Kmail and glibc, but I also happily use CPAN.pm to manage Perl 5 modules.

A rather silly debate stirs in the Perl 5 community now and then. Someone claims "Your project has too many dependencies!" One response is "Don't reinvent the wheel."

If I could strike words and phrases from polite debate, "bloat" and "reinvent the wheel" would disappear shortly after "utilize" and the verb "to task".

You May or May Not Like Forest Creatures

This time, the debate is about Moose startup time -- in particular, whether the benefits of using Moose for Padre outweigh the disadvantages.

The advantages are:

  • More declarative class declarations (especially through MooseX::Declare)
  • Better object and class flexibility than Perl 5 provides by default
  • Access to a wide range of Moose design patterns and plugins
  • Generally less code to maintain to achieve the same feature set

The disadvantages are:

  • Moose and Class::MOP add a few (12?) extra dependencies to a modern version of Perl 5
  • Creating Moose objects increases startup time and memory usage by a measurable amount. (Note that I used the word measurable, but not large. You can measure this amount. Whether the amount is trivial or significant depends on your problem domain.)
  • Rewriting existing, working code may not prove beneficial at this point in the project.

Keep that in mind for a moment.

Truth in Advertising Distributed Software

LWN covered a debate on p5p and Fedora mailing lists about the Red Hat practice of distributing Perl 5. If you install the perl package, you don't get what most people reading this would consider "Perl". In particular, you can't use CPAN.pm because it's not installed.

If you want to use CPAN.pm, you have to install the perl-core package. Some might say that the perl installed from the perl package is broken. Certainly it doesn't do what I might expect.

One difficulty that distributors such as Red Hat, Sun (with Solaris), FreeBSD, and Apple (with Mac OS X) discover is that users' uses of Perl 5 vary. A couple of megabytes of perl and a few core libraries may suffice to run basic system administration programs necessary to the installation and ongoing maintenance of a system, but I want all of the Perl documentation, the Unicode tables, and even the shared library for Scalar::Util installed correctly before I consider that the preinstalled Perl 5 is usable and complete.

I can understand Red Hat's choice, and mostly I consider their choice of nomenclature buggy.

In some ways, that poor taste grates more than a mistaken technical decision does. I have no love for the Perl 5 distribution YAML::Tiny, written deliberately not to parse YAML, yet attached to the name like some alien parasite determined to suck the precious bodily fluids from its host.

Yet I also understand why the ::Tiny distributions exist: to do a job quickly, using as few resources (runtime and dependency-wise) as possible, solving 80% of the problems without fuss. That's good for developers and good for distributors, at least to a point.

The Debate is Not Even Wrong

Sometimes I hate long dependency chains, usually when I have to chase them down during a long installation.

Sometimes I'm happy to reuse code, like a garbage collector or Unicode library or binary-coded decimal or date and time handling I don't have to code or debug or even understand myself.

Sometimes I'm even happy to remove a dependency if it means that more people can use software to which I've contributed, or if it makes the software easier to maintain or easier to install or faster or simpler.

The difficulty is that we value different criteria differently at different times for different tasks. I don't mind if Padre takes two seconds to start, if I use it for two hours a day and it doubles my productivity. Contrarily, I'd like Callgrind to run faster, but it's valuable enough as it is (and some of the software I profile has its own flaws) that I don't mind the speed hit it takes for the crazy job it does.

The problem is that the "Your software is bloated!" and "You reinvent the wheel, badly, and you lie about its name!" debate is also a false dilemma. We have a wealth of other options to attempt to make people happier.

Imagine if you could get a single bundle of all dependencies for any CPAN distribution. Obviously there are complications: can you compile XS code, do you have alien library dependencies, are the licenses compatible? Yet improving the distribution of code -- especially with regard to dependency graph version compatibilities and test reports -- could help.

Imagine if Perl 5 borrowed just enough of Moose's declarative class/attribute syntax to make the easy things easy, remove 75% of the boilerplate, and leave Moose and Class::MOP for the other difficult things where it's obvious that you need that full power.

I wouldn't call it Moose::Tiny, but I suspect that a handful of features (declarative class, attribute, and method declarations, auto constructors and accessors) could banish blessed references in new code at almost no startup time cost.
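
For a sense of the boilerplate at stake, compare a minimal Moose declaration with a hand-rolled equivalent (a deliberately simple example):

# with Moose, the constructor and accessors come for free
package Point;

use Moose;

has 'x' => ( is => 'rw', default => 0 );
has 'y' => ( is => 'rw', default => 0 );

# by hand, every piece of that declaration becomes code to maintain
package Point::ByHand;

sub new
{
    my ($class, %args) = @_;
    return bless { x => $args{x} // 0, y => $args{y} // 0 }, $class;
}

sub x { my $self = shift; $self->{x} = shift if @_; $self->{x} }
sub y { my $self = shift; $self->{y} = shift if @_; $self->{y} }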

Then again, imagine if Perl 5 could see an order of magnitude improvement in performance. Could that render many of these discussions moot? Certainly these are design goals for Perl 6.
