September 2011 Archives

When Forking is Not an Act of Love

Distributed version control changed free software development in ways we're only now beginning to understand. I used SVK for years from the start, and it improved the way I work. (Yes, I committed code on airplanes in 2005. I've forked and patched projects I don't have commit access on. I've done that for six years now.)

These days Git and friends have taken over from SVK, and that's fine.

Two and a half years ago, while helping to brief an influential technocrat about what was actually going on in the world of free and open source software, I repeated a phrase I'd heard elsewhere. "Forking is an act of love."

Today, GitHub puts the truth to that. In writing a book (could be almost any book, but it's the Little Plack Book in this example), I read a lot of code. I read a lot of code and documentation for a lot of projects and modules. Sometimes I find bugs. Often I get confused. Sometimes I find gaps.

Distributed version control means I can fork those projects into my local repositories and work on them as if I were a committer. Git and GitHub help, but they're not necessary for other people to use. I could mail patches generated locally. The important thing is that the barrier for me to fork and branch locally and work work work and commit and then contribute upstream is sufficiently low that I just do it.

A nice feature of Git and GitHub (not solely a feature of either, but a nice feature) is that project maintainers can integrate my changes with relative ease.

I patched the Perl core before the Git migration, in the Perforce days. That was an act of pain before even navigating the internals of Perl. Now only half of that pain remains, and it's the expected pain.

A fork doesn't have to be a bad thing. If it's a small, focused fork and if it's always intended to merge, a fork can be an act of love. Certainly my adding documentation or revising documentation or fixing a bug or adding a feature because I'm there, I'm thinking about it, I have ten minutes, and I can do it and move on is a donation to the project and the greater Perl community and the world of free software, because I care about all of those things. (I come across sometimes as a cranky old curmudgeon because I'm passionate about writing great free software. Also, I'm overeducated and tend to amuse myself when I write prose.)

And then.

IBM forked Perl several years ago, keeps that old version alive as some sort of sad zombie scarecrow, and is unwilling to help the Perl porters keep modern versions of Perl running on their platforms.

Sometimes forking is an act of enterprisey customer- and upstream-hating violence, where short-sighted expedience several years ago has succumbed to the staggering sinkhole of deferred maintenance, and ... well, at some point, it seems counter-productive for IBM to continue to call their lumbering Frankenstein's dinosaur monster "Perl", especially if they've hacked in special magic to get it to work.

I suppose the good news is that IBM's customers have paid for that monstrosity, and of course you know money, and commitment, and time, and knowledge are just flowing from IBM to p5p....

Maybe I ascribe too much malice and not enough incompetence. After all, what are the odds that a company like IBM could find spare computing time on a spare machine to give an interested volunteer or two the opportunity even to see what it would take to get a modern version of Perl running on their platform?

Update: With that said, IBM has contributed platform patches to Perl in the past, so why not continue that?

How to Learn Perl

You've decided to learn Perl. You're excited. You have problems to solve, and you know that a little bit of learning and some hard work will help you convince your computer to do things for you. You've started an important process, one that will benefit you greatly if you stick with it.

Your good intentions are the first step. The fun begins now.

The Basics

If you already know a little bit about programming (if you understand the difference between values and variables, how if and else work, how to call functions and deal with their return values, and most important how to use a text editor and save a program and run it on your operating system of choice), you're way ahead of most people.

If you're not sure if you know all of that, or if the meaning of those words wasn't immediately obvious to you, you need to learn those first. Fortunately, learn.perl.org has some great and modern tutorials which cover those basics. Eric Wilhelm's Learn Perl tutorials are also useful. The Perl Begin site has copious resources, from advocacy to practical Perl programming advice.

That'll get you set up to write and to run Perl 5 programs. Now you need something to help you figure out the absolute basics of programming. Beginning Perl is a free online book which explains those basics. It's a bit old, but the basics haven't changed. When you're comfortable understanding what variables and values and functions are, you're ready to move on.

Learning Perl, sixth edition, costs money, but previous editions are cheaper. It also covers the basics of Perl 5 programming. If you've never programmed before, you're the target audience. If you've programmed a little bit in another language, it may be too basic for you.

Writing Good Perl

If you're comfortable with the basics of programming, Perl 5 or otherwise, my book Modern Perl is available for free download (though you can always buy Modern Perl online if you're so inclined). While the book does offer a review of basic programming concepts such as variables, functions, and control flow, I took pains to explain these ideas in terms of Perl 5's philosophy. By the end of the book, I plan for you to understand how Perl works so that you can take advantage of its strengths and avoid its weaknesses.

At this point, many people recommend reading Programming Perl, affectionately known as the Camel. I don't recommend it myself; the third edition is a decade old and Perl 5 has changed in some important ways since then. A new edition should come out in late 2011 and may be a better resource. (I've read both the second and third editions in their entirety, start to finish, and that helped me as a novice Perl programmer in 1999 and again in 2000, but the books serve better as reference material than tutorial.)

The best way to become a good programmer is to write code and get feedback on that code.

Testing

One way of getting immediate feedback on your code is to develop the discipline of rigorous testing. It's not nearly as difficult as it sounds. It's actually even fun, if you do it right.

Modern Perl uses Test::More throughout its examples and includes a section on writing tests. You can also read Test::Tutorial for a thorough introduction to the ideas of testing, from the simplest point of view to the bread-and-butter approach most Perl 5 developers use.

Testing is as much art as is designing software, and as such only experience will help you find what works best. Even so, I like to follow a simple process that lets me build up a lot of good tests in a repeatable way:

  • Before you add a feature to a piece of code, ask yourself "What should this do?" and "How will I know when it does it?"
  • Based on that, give this behavior a name. This is probably going to be a function or a method.
  • Figure out the one thing that named code should do and think about how you'll know if it does it.
  • Write a test for that one thing.
  • Ask yourself if anything can go wrong, and write tests for those things too.
  • Make sure the tests pass.
  • Clean up any duplication or confusing things.
  • Repeat.

That list doesn't dictate when you write the code to make the tests pass. I have a lot of success writing a test or two before I write code to make them pass. Other people swear by writing a line of code and then a test. Try both approaches.

The nice part about this process is that if you don't really know what to expect from a line of code or a Perl 5 construct, you can write a test about your expectations and see if you can make it pass. In other words, use this testing framework to teach yourself how Perl works.
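As a minimal sketch of that idea, here are three expectations about core Perl 5 behavior written down as tests, using only Test::More. The choice of constructs (sort and scalar context) is just an illustration. If one of the expectations were wrong, the failing test would say so right away.

```perl
use strict;
use warnings;
use Test::More;

# Expectation: sort compares values as strings unless told otherwise.
my @default = sort 10, 9, 2, 33;
is_deeply \@default, [ 10, 2, 33, 9 ],
    'default sort should order values as strings';

# Expectation: a numeric comparison block gives numeric order.
my @numeric = sort { $a <=> $b } 10, 9, 2, 33;
is_deeply \@numeric, [ 2, 9, 10, 33 ],
    '... and <=> should order them as numbers';

# Expectation: an array in scalar context yields its element count.
my $count = @numeric;
is $count, 4, 'array in scalar context should give the number of elements';

done_testing;
```

Save it as a .t file and run it with prove or plain perl. Change an expected value to see what a failure looks like.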

Learning from Others

You'll have questions and you'll need answers.

I've been a member of PerlMonks from its first day as a public site. (I was the second person to sign up.) I've learned as much about Perl programming from the site as from anywhere else. If you join and read questions and answers and take the time to do your research before asking questions, you'll get a lot from the site.

The Perl Beginners mailing list is a community dedicated to helping novices learn Perl in a friendly and non-judgmental way. Again it's worth lurking for a bit to figure out the community guidelines before you post, but asking a good question on here will get you good answers.

If you prefer real-time interaction, the #perl-help channel on irc.freenode.net has friendly people who are willing to help novices willing to ask good questions and take advice well. Lurk first to see how the community works.

Perl Mongers user groups meet regularly to discuss Perl, computing, and technology, and they offer opportunities to meet and socialize with other people learning how to write great code. If there's a PM group in your area, go! If there's no PM group in your area, perhaps you can start one.

Writing Great Perl

Writing great Perl 5 means taking advantage of the CPAN. Even if you don't contribute to the world's largest library of free software components, structuring your code in the CPAN style offers tremendous benefits. Again, Eric Wilhelm's How to Use Perl Modules and CPAN and Perl Configuration Howto are great resources.

Yet writing great Perl means solving real problems. Here you need a project and some way of measuring the progress toward your goal. This is for you to discover.

My first Perl program slurped lottery data from a public site, analyzed it, and plotted the frequency of appearance of each possible number of the results over time. (I don't play the lottery, but I found the statistics interesting.) That program still runs today (automated, twice a week), even though I may go years without looking at it.

Find something that interests you. Find a way to automate it. Keep a list of changes or improvements or new techniques you might apply. Write down what you think about when you're commuting or walking or falling asleep or bathing. When you can't get it out of your head, break it into small pieces, test and experiment, and see what happens.

Programming well requires knowledge, certainly, but like anything else it requires passion to keep you practicing in a disciplined way. The resources I've mentioned here can give you knowledge and will help you develop your discipline. (They're not the only resources, but I believe they're great resources.) What's left is up to you.

You've taken an important step in starting to program Perl. It's rewarding and enjoyable, and you'll be able to do amazing things. Keep your focus, but keep your eye on your goals. You have a lot to be proud of, and you're just getting started.

Always TDD, Except When You Shouldn't

One of the nice things about "best practices" in software development is that there are so many to choose from. One of the less nice repercussions is that choosing between them means exposing yourself to what could charitably be called a fashion-driven buzzword-slinging mudpit of logical fallacies and hastily generalized arguing. Welcome to the Internet.

Pete Sergeant posted a rant about TDD that's true in places, but goes too far in others. He asked me if I believe that "all code needs its tests written first, always".

That's a nope.

I'm working with a couple of friends on a personal finance site, and I wrote a tiny little program to repopulate the pool of invitation codes. It's a loop around a SQL query managed by DBIx::Class and some business objects already written and tested. It's glue code. I didn't write tests for it, because those tests would have no value. (Honestly, if DBIC's create method doesn't work, I have bigger problems than my nine-line program.)

Peter's question is bigger than "Hey, do you ever write trivial programs?" Perhaps a better phrasing of the question is "What value do you get out of TDD?", as that offers the chance for nuance and technique in which you can decide when and where it makes sense as a design technique.

Design? That's the primary value I get out of TDD. When I have a feature I want to add or a bug I want to fix, I can express that in a sentence or two, such as "When users mistype their passwords, link to the password reset page with their email address as a query param. Populate the password reset input field with that address." While this is primarily a UI/UX change, it implies a couple of changes in the controller as well.

The most obvious place to test this behavior is in the web interface tests. I can write a couple of lines of code which express the right behavior on the login failure side and a couple of lines of code which express the right behavior on the password reset side.

As I wrote the previous paragraph, I realized that that use case is actually two features, and they're somewhat decoupled. The shared interface between the two is the generated URL with the user's email address as a query parameter. If I can assume that both sides agree on using the URL in the same way, I can test and implement this feature in two steps.

That's good; that means I can work in smaller work units. It also means my tests can be smaller and more encapsulated.
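A rough sketch of how those two smaller units might look as tests, using only core Perl. The helper names reset_url_for and email_from_query, the /reset_password path, and the email parameter name are all assumptions made for illustration. A real application would build and parse the URL through its framework or a URI module.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical helper: build the password reset link that the login
# failure page shows. The path and parameter name are assumptions.
sub reset_url_for {
    my ($email) = @_;

    # Percent-encode everything outside the RFC 3986 unreserved set.
    # (Assumes ASCII input; a real application would use a URI module.)
    ( my $escaped = $email )
        =~ s/([^A-Za-z0-9\-._~])/sprintf '%%%02X', ord $1/ge;

    return "/reset_password?email=$escaped";
}

# Hypothetical helper for the other half of the feature: recover the
# address from the query string to populate the input field.
sub email_from_query {
    my ($query) = @_;
    my ($value) = $query =~ /(?:\A|[&;])email=([^&;]*)/ or return;
    $value =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $value;
}

is reset_url_for('user@example.com'),
    '/reset_password?email=user%40example.com',
    'login failure should link to reset page with escaped address';

is email_from_query('email=user%40example.com'), 'user@example.com',
    '... and the reset page should recover the same address';

done_testing;
```

Each test exercises one side of the shared interface, so either half can change without touching the other, as long as both still agree on the URL.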

When writing tests for the login failure portion, I might stop to think about edge cases. What if the user has never registered? What if the user has mistyped her email address? What if the user hasn't verified her account?

I might decide to ignore those possibilities for security purposes (why give an attacker any extra information?), but I find that the act of writing tests for very small pieces of behavior helps me to consider my invariants in a way that writing my code first and testing later doesn't.

With a simple test written (request login with an invalid email address), I expect a failure. It happens. It's easy enough to update the template with the appropriate information and run the test again. It should pass. If not, I have a tiny little change to debug. If things work, I can move on. (What if the password is wrong? That's a new test, but the same behavior should work. Repeat.)

When all of the relevant tests I can imagine pass, I spend a couple of moments looking over the new tests and code I've written. Did I get it right? Does it make sense? Is there duplication to remove? (I can't tell you how many times I've found two pieces of code that are sufficiently similar that I've extracted out duplication. This includes my tests.) I may decide to wait to refactor, but I have that opportunity.

Then I repeat the process for the other half of the feature. The whole process takes less time than explaining how the process works.

That's the ideal, anyhow.

Yet when I work in bigger steps or when I write tests after the fact, my perception is that I spend more time debugging and I write code that's more difficult to test. Certainly I deploy more bugs than when I write tests in conjunction with the code under test.

Notice something about what I wrote though; or notice something that I didn't mention in my writing. I said nothing about setting up an invasive dependency injection framework with everything abstracted out with lots of lovely mock objects and extreme isolation of components. That's because I don't believe in using invasive dependency injection frameworks to abstract everything out with ubiquitous mock objects and extreme isolation of components. That's a one-way ticket to Sillytown, where your code passes its tests but doesn't actually work.

I'm testing real code. Where it doesn't work (as was the case with a bug I fixed an hour ago), it's because of a difference between our testing environment and our deployment environment in terms of external configuration. (The test for that passed in our system because it's the only place we use a mock object, and it's only a single test.)

Should you build lots of test scaffolding and use lots of cleverly named frameworks so that you can reach inside methods and make sure they call the right methods with the right parameters in the right order on the right mock objects something else has injected from elsewhere? Again, no.

I'm not sure what that has to do with TDD though.

In my experience, TDD works best for me when I:

  • Use my code as realistically as possible
  • Work in small steps
  • Commit often
  • Break tasks into smaller tasks when it's obvious I can test and implement them separately
  • Take advantage of refactoring opportunities whenever possible, including in test code
  • Think through the edge cases and error possibilities as I test
  • Have an idea of the design from a high level ("I need to call or implement these methods, and I can reuse that code")
  • Have an existing, albeit minimal, scaffolding of representative test data

If I've never used an API before, TDD might not be appropriate. If I'm never going to use the code again, TDD might not be appropriate. If it takes longer to write the tests (Nine lines of code in the final program? Probably not worth it now.) than to write and run the code, TDD might not be appropriate.

If, however, I think I'm going to regret not having designed the code and tests in parallel, I use TDD. It might not work for you. It works very well for me.

(Should you use TDD's test-fail-code-pass-refactor cycle? That depends. Do you have the discipline to write good tests? If not, perhaps you're better off not using TDD. You might get yourself into the kind of mess that's apparently burned Peter one too many times.)

Introducing Plack::Test::Agent

After a discussion with Zbigniew and Miyagawa on the subject of Plack::Test and Test:: Interfaces, I wrote Plack::Test::Agent as a proof of concept.

I really like the flexibility of Plack::Test, as testing an app in process or against a backend is trivial. I understand the callback interface—it definitely feels PSGIish, and it fits the interface that Test::TCP prefers. Yet there's always room to experiment.

Plack::Test::Agent offers the ability to run tests in process or over HTTP against a server. It relies on Test::TCP and Plack::Loader for the hard work, and borrows liberally from Plack::Test. Yet in doing so, it offers an OO interface:

use Test::More;
use Plack::Test::Agent;

# $app is the PSGI application under test
my $agent = Plack::Test::Agent->new( app => $app );
my $res   = $agent->get( '/?have=foo;want=foo' );
ok $res->is_success, 'Request should succeed when values match';
is $res->decoded_content, 'ok', '... with descriptive success message';

Pass a server key/value pair, and it'll do its best to find and launch the appropriate server. All of your tests should continue to pass, modulo bugs in your assumptions or the handler/server interface.

Just for kicks, I added an experimental feature:

my $mech = Plack::Test::Agent->new( app    => $app,
                                    server => 'HTTP::Server::PSGI' )->get_mech;

$mech->get_ok( '/?have=foo;want=foo',
    'Request should succeed when values match' );
$mech->content_is( 'ok', '... with descriptive success message' );

... which returns a Test::WWW::Mechanize object bound to the started server such that relative URI requests go directly to the server. (Absolute URI requests remain unchanged.) The next obvious step is to return a Test::WWW::Mechanize::PSGI object for in process testing. That's the work of a few moments.

Perhaps that goes too far, but I do like how it's simplified my test code so far.

Plack::Test and Great Test:: Interfaces

Plack is great. Most of it is wonderful. I'm less enamored with the interface of Plack::Test, however.

In writing the Little Plack Book, I spent a couple of days writing tests for Plack and applications at the PSGI level. Plack::Test occupies a strange level in the ecosystem. It's incredibly useful for what it does in selecting between a no-HTTP backend or any other PSGI-compatible handler, and it offers some degree of abstraction for making requests and getting results, but it's far too low level to write tests for complex applications.

When writing tests for a small Dancer application, I spent more time getting the Dancer environment set up well for testing than I did writing the tests. If I'd used Dancer::Test, I suspect I'd have made more progress more quickly.

I've had similar experiences with Catalyst and Catalyst::Test.

None of this surprises me—Catalyst and Dancer are one layer up the stack from Plack. Most of the interesting things you can test about a Catalyst or Dancer or Mojolicious or whatever application are properties expressed at the framework and application layers, not the plumbing layer of Plack.

Plack::Test seems best for testing middleware and other components which occupy the layer between PSGI and applications. Even so, something about its interface kept bothering me as I wrote tests and prose about the tests:

use Test::More;
use Plack::Test;
use HTTP::Request::Common;    # provides GET

test_psgi $app, sub
{
    my $cb  = shift;

    my $res = $cb->( GET '/?have=tea;want=tea' );
    ok $res->is_success, 'Request should succeed when values match';
    is $res->decoded_content, 'ok',
        '... with descriptive success message';

    ...
};

The pattern of a special test function which takes a block of code is semi-common in the Test::* world; you can see it in Test::Exception and to a lesser extent in Test::Fatal. The best example I've seen of this is Test::Routine, which uses this block syntax to help organize tests into named groups. A disciplined tester can use this to great effect to clarify what might otherwise become a big ball of muddy tests.

I like that Plack::Test does the hard work of redirecting requests to the app I want to test on whichever backend I want, so that I don't have to worry about setting up a testing deployment target. That part's great. The confusing part is:

my $cb  = shift;
my $res = $cb->( GET '/some_url' );

test_psgi takes a PSGI application and an anonymous function as its parameters, then invokes the anonymous function, passing a callback bound to the context of the application. Inside the anonymous function (the block, not the callback), you invoke the callback and pass an HTTP::Request object (constructed manually or with a helper such as HTTP::Request::Common) and receive an HTTP::Response object.

Put that way, it's a lot more confusing than it is, if you're comfortable with the idea of first-class functions and closures and semi-fluent interfaces in Perl 5.
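The flow of control is easier to see in a toy model. The sketch below is not Plack::Test's implementation and uses no Plack code at all. The name toy_test_psgi and the plain strings standing in for request and response objects are made up for illustration. It only shows who calls whom, and how the callback closes over the application.

```perl
use strict;
use warnings;

# A toy stand-in for test_psgi. This is not Plack::Test's implementation;
# it only models who calls whom.
sub toy_test_psgi {
    my ( $app, $client ) = @_;

    # The callback closes over $app, so the client never touches it.
    my $cb = sub {
        my ($request) = @_;
        return $app->($request);
    };

    # Invoke the caller's block, handing it the bound callback.
    return $client->($cb);
}

# A toy "application": takes a request string, returns a response string.
my $app = sub {
    my ($request) = @_;
    return "response to $request";
};

my $seen;
toy_test_psgi $app, sub {
    my $cb = shift;
    $seen  = $cb->('GET /');
};

print "$seen\n";
```

The block never sees $app directly. It only gets a function which knows how to reach it, and that leaves the outer function free to choose how requests reach the application.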

Even so, my $res = $cb->( $req ) just looks weird to me. It sticks out. It's visually different from all of the rest of the code. Everything else outside the test is the semi-fluent interface of anonymous subs or boring old method calls on objects.

In a discussion with Zbigniew Łukasiak, I suggested that I'd want an interface more like:

use Plack::Test;

plack_test 'Test description here' => sub
{
    my $app = shift;
    my $res = $app->get( '/', { have => 'tea', want => 'tea' } );

    ...
};

You can see the influence of Test::Routine. I don't know exactly how $app gets into the block, but this gives labeled subtests and the concomitant organization, it obviates the need to create HTTP::Request and HTTP::Response objects (or their Plack:: equivalents) manually, and everything in the block uses visually similar mechanisms.

The only hairy part is figuring out how to connect plack_test to the app while not multiplying hidden global variables or disallowing customization and decoration with other middleware. Compatibility with Plack::Builder is important, but I don't especially want to pass in $app myself manually.

Besides, to me the only obvious benefit over Test::WWW::Mechanize::PSGI is the subtest groupings. Mech has a huge advantage of providing many useful test assertion methods which grovel inside responses so I don't have to.

Maybe this is less my discomfort with one part of the useful-at-the-appropriate-level Plack::Test and more a plea to distinguish more clearly between various elements of Test::* modules, as several distinct categories of behavior exist:

  • Setting up a testing environment
  • Organizing test assertions
  • Providing test assertions

We did a huge amount of work a long time ago with Test::Builder making sure that everything which wanted to provide test assertions could interact nicely, and we succeeded. Maybe it's time to consider ways to enable that composition at other layers of the Test::* stack.

Plack::Test Backend Selection

I'm working on the testing section of the Little Plack Book. Simplified testing is a huge benefit of Plack, but Plack's flexibility is evident even in Plack::Test.

You can run your tests against a mock HTTP server with Plack::Test, but you can also run them against any server backend supported by a Plack::Handler.

If you use Plack::Test (whether directly or through something which builds on Plack::Test), when do you use a real server backend? What are your motivations for doing so?

I've come up with "standard devops paranoia" and "testing server compliance" as two good reasons. What others exist?

My esteemed colleague Ovid has led a small private discussion about teaching Perl to people who already know how to program, and the subject inevitably turned to the topics in my Modern Perl book. Blame me.

Like half of the people reading this, I have years of experience unsnarling and helping novices unsnarl code written without a firm grasp of programming. Again, there's nothing wrong with writing Baby Perl, but the difference in maintainability, quality, concision, and reliability between Baby Perl and grown-up Perl is striking.

These experiences tempt me to generalize that novice programmers have trouble learning three important principles: abstraction and composition, user input, and robustness.

Abstraction and Composition

Inefficient programmers tend to experiment randomly until they find a combination that seems to work.

— Steve McConnell, Code Complete

Programming—symbolic computation—is primarily the act of representing a problem in terms of the proper abstractions so as to manipulate the individual entities of the problem efficiently and correctly. I suspect that you'll struggle with even simple programming unless you have the basics of algebra correct, and something more than a + b = 10.

The first peak novices must scale is understanding the difference between a value and a name which stands for that value, and that the computer doesn't care about the name.

The second order effect of this is when a novice realizes why it's stupid to use a variable as a variable name. This is where composition comes in. You know how to manipulate scalar values. You know how to manipulate aggregates, such as arrays and hashes. What happens if you combine those principles?
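A small sketch of that combination, with made-up data: instead of creating variables whose names come from the data, use the names as hash keys and store each list of values in an anonymous array.

```perl
use strict;
use warnings;

my @log = (
    [ alice => 3 ],
    [ bob   => 5 ],
    [ alice => 4 ],
);

# Rather than inventing variables named after the data (@alice, @bob),
# keep the names as hash keys and the values in anonymous arrays.
my %scores_for;
for my $entry (@log) {
    my ( $name, $score ) = @$entry;
    push @{ $scores_for{$name} }, $score;
}

for my $name ( sort keys %scores_for ) {
    my @scores = @{ $scores_for{$name} };
    print "$name: @scores\n";
}
```

The program never needs to know in advance which names will appear, and strict stays enabled throughout.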

Some languages do better at this than others. PHP is terrible; witness the comment section on PHP.net sometime. If genericity, abstraction, and composition are the disease, PHP.net is the rusty horse needle containing 100L of vaccine. (I know the difference between cc and L, thank you.) PHP encourages people to reach the "I can search the Internet for code to copy and paste and experiment randomly with gluesticks and glitter" stage of development, then chains them to tables making rhinestone-encrusted wallets to sell to tourists during the next dry season.

Compare that with Scheme, at least as taught in The Little Schemer where certainly it's impractical to write addition and subtraction primarily in terms of recursion, but by the end you're going to know how recursion works and how symbolic computation works and that you can define what you thought of as primitives in terms of other primitives and, by gum, you'll be a better programmer for it.

I think this is what some people mean when they say "Never trust any developer who doesn't understand C pointers", because it'd be crazy to claim that knowing how to work around the limitations of the PDP-11 memory model in 2011 is ceteris paribus all that useful for most programmers. Understanding enough of the von Neumann model in practice such that all of the convenient parts of modern programming languages are just names hung on buckets somewhere in silicon should suffice.

From there, the next hurdle to overcome is understanding genericity, whether in terms of polymorphism or type substitution. If properly explained, Perl can make sense here. When you write my $birthday = $YYYY . $MM . $DD; it doesn't matter so much that $YYYY and $MM and $DD are all numbers as it matters that they know how to stringify.

Yes, that's polymorphism.
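To make that concrete, here is a short sketch in which the three operands are a number, a string, and an object. The TwoDigit class is hypothetical and exists only to show an object which knows how to stringify, using the core overload pragma.

```perl
use strict;
use warnings;

# Hypothetical class whose objects know how to stringify themselves.
{
    package TwoDigit;
    use overload '""' => sub { sprintf '%02d', ${ $_[0] } };

    sub new {
        my ( $class, $value ) = @_;
        return bless \$value, $class;
    }
}

my $YYYY = 2011;                 # a number
my $MM   = '09';                 # a string
my $DD   = TwoDigit->new(5);     # an object

# Concatenation asks each operand for its string form and nothing more.
my $birthday = $YYYY . $MM . $DD;
print "$birthday\n";
```

The concatenation operator doesn't care what kind of thing each operand is, only that each one can produce a string.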

You're welcome to explain that in terms of duck typing, if your language isn't cool enough to support better abstractions, or you could pull out Real World Haskell and begin to think about type classes.

("What about Java?" you ask? You could probably learn most of these concepts if you were properly motivated, and if you kept away from your IDE's autocomplete and copy and pasting example code you found on the Internet. If you find yourself managing large lists of JARs and trying to resolve conflicts between them, or if your best impression of genericity and polymorphism is writing big XML files to take advantage of dependency injection or at least slapping annotations on everything and hoping, you probably haven't learned them yet. That's not to say that it's impossible to write great code in Java, but you're going to need a lot of self-discipline to do it.)

Haskell and other languages with real higher-order functions (Java and Python don't count, JavaScript and Perl and Ruby do, I haven't used C#, and I don't have the heart to test PHP) can take you to the next level, where you manipulate functions as if they were data, because they are. You don't necessarily have to be able to build your own object system out of closures, but you should be able to understand the equivalence of encapsulation and abstraction and polymorphism available when you limit yourself to the primitive abstraction of apply. (Hello, CLOS! How've you been all of these years?) Certainly patterns such as Haskell monads count, because they're language-supported abstractions over the structure and execution of program components.
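As a minimal sketch of the closure idea (not a recommendation to write objects this way), the counter below keeps its state in a lexical variable and exposes behavior only through a dispatching function. The names make_counter, increment, and value are invented for the example.

```perl
use strict;
use warnings;

# A minimal "object" built from closures: the state lives in $count,
# and the only way to reach it is through the returned dispatcher.
sub make_counter {
    my ($count) = @_;

    my %methods = (
        increment => sub { return ++$count },
        value     => sub { return $count },
    );

    return sub {
        my ( $name, @args ) = @_;
        my $method = $methods{$name}
            or die "No such method: $name\n";
        return $method->(@args);
    };
}

my $counter = make_counter(10);
$counter->('increment') for 1 .. 3;
print $counter->('value'), "\n";
```

Encapsulation comes from lexical scope, and dispatch comes from applying a function to a method name. Each call to make_counter produces an independent counter.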

You don't have to master all of these techniques to move past novice status, but you should be comfortable knowing that these techniques exist and beginning to recognize and understand them when you see them.

User Input

In my conversation with Ovid, I mentioned "Handling Unicode", but I can expand this point into something more specific.

User input lies.

Sometimes it lies deliberately, as when a malicious or curious user gets the curious idea to provide bad data to see what happens. Sometimes it lies to your expectations, when a user like me has a single-word pseudonym and if you force capitalize it, you're doing it wrong. (You'd think the soi-disant "world's largest encyclopedia" would bother to get spelling correct, but Wikipedia editors are apparently too busy deleting all things non-Pokémon and Star Wars Expanded Universe to fix their software.)

Sometimes it lies because the world isn't solely the realm of ASCII or even Latin-1, and if you don't document your expectations and bother to find out the truth about the data you're getting or what other people expect of the data you're producing, you'll see what real line noise looks like.
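One hedged illustration with the core Encode module: decode bytes at the boundary with an explicitly stated encoding, and refuse malformed input instead of guessing. The sample bytes are an assumption (the word "cafe" with an acute accent on the final letter, encoded as UTF-8).

```perl
use strict;
use warnings;
use Encode qw( decode encode );

# Bytes as they might arrive from a form or a file: "cafe" with an
# acute accent on the final letter, encoded as UTF-8.
my $bytes = "caf\xC3\xA9";

# Decode at the boundary, stating the expected encoding explicitly.
# FB_CROAK makes malformed input die instead of passing silently;
# LEAVE_SRC keeps decode() from modifying $bytes in place.
my $text = decode( 'UTF-8', $bytes,
    Encode::FB_CROAK() | Encode::LEAVE_SRC() );

printf "%d bytes, %d characters\n", length $bytes, length $text;

# Encode again on the way out, so the next program gets what it expects.
my $out = encode( 'UTF-8', $text );
```

The difference between the two lengths is where a lot of line noise comes from: code which counts, truncates, or matches against bytes when it meant characters.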

I had an interesting discussion with a motivated novice a while back, when I reviewed some code and found a SQL injection attack vector in cookie-handling code. "Never trust the user," I said.

"Why would anyone do that?" he asked.

"Never trust the user," I repeated. You're not paranoid enough. Even if you think you're paranoid enough, you're probably not. In a subsequent discussion, I said "Never parse XML with simplistic regular expressions—you're expecting that that start element will always occur on a newline with a four space indent."
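A small sketch of that paranoia applied to a cookie value: decide what a valid value looks like and reject everything else before it gets near a query. The 32-hex-digit format and the name valid_session_id are assumptions for the example. Validation like this complements bind parameters at the database layer and doesn't replace them.

```perl
use strict;
use warnings;

# Hypothetical rule: session IDs are exactly 32 lowercase hex digits.
sub valid_session_id {
    my ($value) = @_;

    # \A and \z anchor the whole string; $ would allow a trailing newline.
    return defined $value && $value =~ /\A[0-9a-f]{32}\z/;
}

for my $cookie ( 'a' x 32, "1' OR '1'='1", undef ) {
    my $label = defined $cookie ? $cookie : '(missing)';
    printf "%-34s %s\n", $label,
        valid_session_id($cookie) ? 'accept' : 'reject';
}
```

Describing what's allowed tends to hold up better than listing what's forbidden, because the attacker gets to choose the input.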

Unfortunately, only bitter experience can burn this lesson into the minds of some developers. Certainly several years of doing things quickly the wrong way and having to fix them later taught me better. Mostly. (I wrote my own web server without reading the HTTP RFCs, and it mostly worked. Mostly. See also "Why your CGI parameter processing code has at least five bugs and one latent DoS attack vector", and then learn how to use a library and get on to more interesting things.)
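To make the cookie example concrete, here is a small sketch of why interpolating input into SQL goes wrong and what keeping the input separate looks like. The table name, column names, and the 32-hex-digit session format are assumptions invented for illustration, and no database is touched.

```perl
use strict;
use warnings;

# Hypothetical cookie value supplied by a hostile client.
my $session_id = "x' OR '1'='1";

# Wrong: interpolated input becomes part of the SQL text itself.
my $unsafe_sql = "SELECT user_id FROM sessions WHERE id = '$session_id'";
print "$unsafe_sql\n";
# SELECT user_id FROM sessions WHERE id = 'x' OR '1'='1'

# Better: the SQL text is a constant and the input travels separately
# as a bind value. With DBI the pair would be used roughly as
#   my $sth = $dbh->prepare($sql);
#   $sth->execute(@bind);
my $sql  = 'SELECT user_id FROM sessions WHERE id = ?';
my @bind = ($session_id);

# Independently of the database, reject values that cannot be valid.
# Assumed format for this sketch: 32 lowercase hex digits.
sub valid_session_id {
    my ($id) = @_;
    return defined $id && $id =~ /\A[0-9a-f]{32}\z/ ? 1 : 0;
}

print valid_session_id($session_id) ? "accepted\n" : "rejected\n";    # rejected
```

Validation and placeholders cover different failures; using both is the paranoid choice.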

Robustness

If that last point sounds like robustness, it is. Perhaps handling input and output properly is a special case of robustness, but I separate them because you really can't trust user input.

A robust program knows what's likely to go wrong, how it can recover, and when it can't, and it does the right thing in those circumstances.

McConnell's flailing novices tend to be so happy they managed to find the right magic incantations to get something working that they're reluctant to rejigger their Jenga towers (American cultural imperialism warning here) for fear that everything will collapse.

Robustness doesn't mean that every program needs configuration-driven logging and a failover heartbeat monitoring system. Sometimes it's enough to throw an exception, then let the next cron invocation restart the process and pick up where it left off. (I really like this pattern.)
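A rough sketch of that die-and-resume pattern follows. It assumes the work is an ordered list and that progress can be stored as a single number in a checkpoint file; process_queue and the file layout are invented for this example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Process items in order and record progress after each success.
# Any failure is fatal; the next run resumes after the last item
# that completed.
sub process_queue {
    my ($checkpoint_file, $items, $handler) = @_;

    my $done = 0;
    if (open my $in, '<', $checkpoint_file) {
        my $line = <$in>;
        $done = $line + 0 if defined $line && $line =~ /\A\d+\s*\z/;
        close $in;
    }

    for my $i ($done .. $#$items) {
        $handler->($items->[$i]);    # may die, and that is fine

        # Write to a temporary file and rename it, so that a crash
        # never leaves a half-written checkpoint behind.
        my $tmp = "$checkpoint_file.tmp";
        open my $out, '>', $tmp or die "Cannot write $tmp: $!\n";
        print {$out} $i + 1, "\n";
        close $out or die "Cannot close $tmp: $!\n";
        rename $tmp, $checkpoint_file
            or die "Cannot rename $tmp: $!\n";
    }
    return;
}

# Simulate two cron runs: the first dies on item 'c', the second resumes.
my $checkpoint = tempdir(CLEANUP => 1) . '/progress';
my @seen;
my $fail_once = 1;
my $handler = sub {
    my ($item) = @_;
    if ($item eq 'c' && $fail_once) {
        $fail_once = 0;
        die "transient failure on $item\n";
    }
    push @seen, $item;
};

eval { process_queue($checkpoint, [qw(a b c d)], $handler) };
process_queue($checkpoint, [qw(a b c d)], $handler);
print "@seen\n";    # a b c d
```

Items 'a' and 'b' are not repeated on the second run, which only works if handling each item is safe to retry when the failure happens partway through it.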

Yet if you haven't thought about what's likely to happen, what's unlikely to happen, and the consequences of each, your program is probably much more fragile than it needs to be. Yes, understanding risk and consequences and error patterns takes experience, but error handling should rarely be optional. (See The Tower of Defaults.)

Software isn't magic. It's part science, in that we can make empirical statements about what we understand to be reality, test them, and reassemble them into larger frameworks of axioms. It's also part art, as we have a fantastic creative palette with which to do so. Both require knowledge and a thirst for knowledge and a desire for a depth of understanding.

When novices expand their minds and experiences to encompass all of these tools, they become all the better for it. That is what I want to encourage in any book aimed at novice developers or novices in any particular language.

RESTful Perl Resources

Suppose you have a new project, and suppose part of that project involves server-side web programming. Suppose the client would also like a RESTful interface as well as the web interface.

Assume that you don't have complete control over who else to hire for the project, but that you can specify sufficient familiarity with Perl and web programming.

What resources would you give to the other programmers to help them design, implement, and maintain a RESTful web application written primarily in Perl? Assume that you don't have time for a two day immersion class (because you'd have to teach it, after all), and that you can hand out the RESTful Web Services book and tell people "Overlook that silly Ruby code."

Assume that the Perl framework used for the website is Catalyst, and that you have access to the full Plack ecosystem.

Modularizing Core Features

Perl 5 project leader Jesse Vincent has made a prose version of his Perl 5.16 and Beyond speech available: the Perl 5.16 and Beyond thread on p5p.

I appreciate how Jesse blends the pragmatism of keeping working things working with the willingness to change things that aren't working. His plan is more conservative than my idealism likes, which means it's probably a wise approach.

The last lines of his plan are telling:

We're awesome at modules. Where possible, we should be modularizing core features.

I've argued before that the most successful feature of Perl 5 enabled and encouraged the growth of the CPAN. (Michael Peppler's 20 Years of Sybperl is a good history lesson of the dark ages before Perl 5 and the CPAN existed.) Even though the original philosophical goals of Perl 5 reached a local maximum in Perl 5, never to overcome gravity until Perl 6, the ability to subvert and extend the language through CPAN has kept Perl 5 alive and thriving far beyond the point most people would have thought in 2000.

Yet Jesse's also right that features such as System V IPC, formats, tie, et cetera are superfluous to a lot of uses. So are the Perl 4 compatibility libraries that have gone untouched in the core since Perl 4.

A slimmer, trimmer, smarter Perl 5 core could be more agile. It's likely to be easier to maintain. It's almost certainly easier to port to other virtual machines (and yes, this is an excellent long term survival strategy, though you must balance the short term needs against it). Even the discipline of figuring out how to improve the current extension mechanisms can lead to better code.

It's merely a terrible amount of work, and the history of the CPAN demonstrates that throwing willing volunteers at necessary projects eventually gets the work done in a sufficient way.

One of the best places to start is to help figure out how to smoke test significant portions of the CPAN against bleadperl topic branches. If you have access to bandwidth and significant computing resources (or at least the knowledge of how to parallelize an embarrassingly parallel task like this), p5p could use you.

Taming the Great Stampede

Scott James Remnant described problems with a mixed milestone-and-date release process as it applies to Ubuntu GNU/Linux.

The specific problem is that each milestone represents a date by which you absolutely must finish your work lest you have to wait for the next milestone. In one sense, that's a feature of regular release cycles: if a feature isn't ready for the current release, it can wait until the next release. In practice, incautious project management can turn this virtue into a vice in two ways:

  • If you get rewarded or punished based on meeting or missing calendar dates—not the appropriate level of release quality—you will prioritize calendar time over quality.
  • If the periodicity between releases is too great, you'll prioritize getting a project in the current release over waiting until the next release.

Choosing the right periodicity is more art than science. (Project management is art, not science.) Yet you can know the period length is inappropriate when the end of a period approaches and you suffer the great stampede of features suddenly becoming good enough to land the week or weekend or day before your cutoff date for features landing for the next Great Big Release.

You can see the evolution of this understanding in Parrot over the past couple of years. (It wasn't enough to save Parrot, but no technical policy can prevent active malice from untrustworthy committers.) Whereas Parrot promised to adhere to a six month cycle of publishing a deprecation or incompatible change notice before making the change or removal, that promise was simply untenable as waiting six months to make necessary changes left no one happy, least of all the end user implementors for whom the policy should have been most useful. That big stampede happened in Parrot as every six month cycle closed.

When Parrot switched to a three month period, the big stampede still happened.

When I'm fortunate enough to work within a dedicated team of full-time developers using an agile approach, an iteration period of a week or two often seems to work the best. Anything longer than that still has stampedes, but delaying a branch merge by a week or so to get another day or two of polish isn't a big deal. Delaying a branch merge by a month or so pushes the edge of what's acceptable.

Granted, releasing stable and useful software still requires a lot of discipline from project management and developers. Slashdot comment screeching to the contrary, shorter release cycles don't have to mean that Unity-style big bang UI changes will land every month. (I suspect that Unity could have used another few months of development and testing and integration, if not another six month cycle.) It's also often a simple matter of programming to release a feature present but disabled by default so as to get more testing on deployment and integration without forcing users to switch yet.
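A feature that ships present but disabled can be as simple as a table of defaults plus an opt-in switch. The sketch below is one hypothetical arrangement; the feature names and the MYAPP_FEATURES environment variable are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical feature table: everything new ships switched off.
my %FEATURE_DEFAULTS = (
    new_sidebar => 0,
    fast_search => 0,
);

# A deployment opts in through a comma-separated list, for example
# MYAPP_FEATURES="fast_search,new_sidebar". The second argument
# exists so that callers and tests can pass their own settings.
sub feature_enabled {
    my ($name, $env) = @_;
    $env //= \%ENV;

    die "Unknown feature '$name'\n"
        unless exists $FEATURE_DEFAULTS{$name};

    my %requested = map { $_ => 1 }
        grep { length }
        split /\s*,\s*/, ($env->{MYAPP_FEATURES} // '');

    return $requested{$name} ? 1 : $FEATURE_DEFAULTS{$name};
}

my %settings = (MYAPP_FEATURES => 'fast_search');
print feature_enabled('fast_search', \%settings), "\n";    # 1
print feature_enabled('new_sidebar', \%settings), "\n";    # 0
```

Dying on an unknown feature name keeps a typo from silently leaving a feature off forever.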

Yet the most important matter of discipline is unambiguous and ubiquitous project status information such as the reports from automated test suites, memory leak checkers, branch status reports, bug reports, and anything else that can measure the quality of the iteration's eventual release.

Certainly this requires a substantial investment on the part of projects to build and maintain this infrastructure, but it's the only reliable and repeatable way I know to produce great software. (If you've noticed that Perl's monthly bleadperl releases are rather boring, that's the point! Most of the excitement takes place at project boundaries such as the interactions with CPAN modules.)

The Right to be Wrong

Language design is hard work—not only because consistency of vision is a laborious process, but because the only way to know what users will do with your language is to wait until users use your language to do things. I wrote several questions for my friends to use in their book Masterminds of Programming, and one question I recommended asking of every language designer was "How do you plan for an uncertain future when the only thing you know is that you'll want to change something you had wrong?"

I'm rushing one of my side projects to the point where a hand-picked group of friends and family can use it for practical purposes because they're my target customers, and I'm neither so charming nor dashing that they won't tell me with brutal honesty what works, what doesn't work, and what I never should have thought they'd do.

No plan survives first contact with the enemy.

I have tremendous respect for the brave release managers of Perl, as a seemingly innocuous change could cause countless hours of work for thousands, tens of thousands, or even hundreds of thousands of developers. (One time I almost broke half of the CPAN myself.) The need to get everything right exerts immeasurable pressure to not do anything wrong.

The difference between "doing it right" and "not doing it wrong" is a vast abyss.

Consider: I extract a CPAN module from code I'm using. I make it more generic and reusable. I polish it. I publish it. Then the first report I get about it is "This would be great, if..." and that's a great thing. In truth, that's an expected part of Perl culture. That's one reason CPAN version numbers stay below the magical candy-flavored rainbow sparkle 1.0 threshold for so long (POE took ten years to reach 1.0, and it's by no means the only example): we want the right to be wrong and to change what's wrong.

That's one reason the common answer to "How do I get a new feature into the Perl core?" is "Write it as a CPAN extension first." (That answer is, as often as not, wrong, but it's right for philosophical reasons and wrong for merely pragmatic reasons, and the latter are at least solvable.)

That's one reason the Moose backwards compatibility policy exists. When inventing new things that no one has ever invented before, it's easy to get things only mostly right, and it's impossible to know what's wrong without people using it and reporting on what's difficult or impossible or ugly.

The lack of that feedback is one reason that Perl 6's language design and implementation have gone through cycles of reinvention and foundering. With over a decade of no real users and no usable implementations, it's not easy to see where the language works and where it doesn't. Even though you can argue that Perl 6 gets a lot of things right practically, the consistency of vision and design required rearrangement of the middle layers of philosophy. Above all, there's no substitute for running code to discover how a system actually works, if it works at all. As a consequence, I've seen systems freeze themselves into poor designs because there's no solid evidence as to what users actually need. (Ask yourself why P6 has stagnated.)

That's one reason test-driven design and frequent, small, timeboxed iterations are part of agile or lean or whatever buzzword name you want to slap on the very practical, flexible, effective development process most of the great developers I know use—not because that's the only way to write great software, but because you can learn so much so quickly about what you really need when you give yourself the right to be wrong with little consequence.

I'm not praising the desire to rewrite willy-nilly, nor am I suggesting that the right approach to supporting software is to leave users to read changelogs and run their comprehensive test suites to decide whether and when to upgrade to new versions where everything is a candidate for major changes. I'd also never suggest that the right to be wrong is a substitute for careful thought and design based on the best information you have at the moment. We're professionals, after all, and we need to bring our best professionalism and knowledge and talent and care to our work.

Yet I am suggesting that we work with incomplete information—information that's expensive to gather in toto, if that's even possible—and that we have to do our best in the face of those limitations. Without the freedom to make little mistakes (however we apply that to our projects), we limit our possibility to make big wonderfuls.

What's with That Trailing Punctuation Anyway?

Tom Christiansen's What's Wrong with sort and How to Fix It (blame me for the title) drew a lot of necessary attention to the need for collation when sorting data in various languages.

It also sparked a small discussion about "What in the world does that mean and why would you do that?" regarding a single line of Tom's code:

@sorted_lines = Unicode::Collate::->new->sort(@lines);

In particular, a few people asked "Why would you write Unicode::Collate::?" As with far too many grotty parts of Perl 5, the answer is "To avoid bareword parsing ambiguity."

Ambiguity? Sure. Unicode::Collate is a bareword. Oh, it's clearly a class name, unless it's a function call.

A function call?

Sure. It could be a call to Unicode::Collate(). This is a form of the same problem you get when making a dative (colloquially "indirect object") method call:

# buggy code; do not use
my $object = create Some::Class; # buggy code; do not use

That is to say, the meaning of this code can change depending on what else the Perl 5 parser has seen when it compiles this code.

If you're interested in gory details and you don't mind reading heavily macroized and partially documented accreted C code, look at the S_intuit_method function in Perl 5's toke.c. The comments in that code explain the heuristics for resolving barewords.

Appending the package separator (amusingly '; did you think I wouldn't try it?) makes the class name obviously a class name and not a function call. Ambiguity removed, at the cost of slightly more ugly code.
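The ambiguity is easy to reproduce. In this sketch the classes Report and Decoy and the function Report are all invented for illustration; the only thing that matters is that a function and a class share a name and that the parser has seen the function before the call.

```perl
use strict;
use warnings;

package Report;
sub new  { return bless { kind => 'Report' }, shift }
sub kind { return $_[0]{kind} }

package Decoy;
sub new  { return bless { kind => 'Decoy' }, shift }
sub kind { return $_[0]{kind} }

package main;

# A function that happens to share its name with the class. The
# parser sees this declaration before it reaches the calls below.
sub Report { return 'Decoy' }

my $ambiguous = Report->new;      # parsed as Report()->new
my $explicit  = Report::->new;    # always the class Report
my $quoted    = 'Report'->new;    # a string is never a function call

print $ambiguous->kind, "\n";     # Decoy
print $explicit->kind,  "\n";     # Report
print $quoted->kind,    "\n";     # Report
```

Quoting the class name is the other documented way to force the class interpretation, if the trailing colons offend.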

With that said, the ugliness bothers me such that I never use this syntax even as I admit its advantages. Instead I rely on coding standards to avoid potential ambiguity by using lowercase for method names. So far, I've been fortunate—but I cannot blame someone once burned for avoiding the problem at the parser level.

(A sigil to identify classes could fix this, as would a unique operator to instantiate or look up classes. Neither of these solutions completely satisfies me.)

About this Archive

This page is an archive of entries from September 2011 listed from newest to oldest.
