March 2009 Archives

This weekend a colleague described his frustrations with a software project at his job. I suggested that he leave a few copies of The Art of Agile Development lying around his office. In particular, the largest problem I heard in his discussion is that his team has no idea how to release software.

"It's weird," he told me. "You'd think we'd stop making the same mistakes over and over again."

One such problem is rampant and divergent customization. The project makes an embedded device for sale to OEMs. They've based their product on a standard hardware set and use the Linux kernel to drive the hardware and provide a common API for manufacturers to develop their own products. The hardware and software are both flexible. A manufacturer can specify a custom set of behaviors, and the hardware and software configuration will enable or disable features from the baseline set of components.

One of the danger signs James Shore wrote about in the Version Control chapter of the Art of Agile Development is rampant branching for customization. One common and dangerous temptation is to create a branch for each customer, making changes for that particular customer in code. If you must develop a new feature for that customer, branch the current stable codebase at that point and develop that feature on the branch. If another customer needs that feature as well, branch from the branch and maintain that branch for the new customer. Repeat until you have little hope of unifying all of those features and additions again. (In my experience, that happens the first time you branch from a branch.)

When I first read the draft of Jim's chapter, I said "I don't believe in branching." Fortunately, he didn't take my advice (and I've since repented of that). I don't believe in long-lived branches. A branch should be a simple, single-minded, and temporary divergence from trunk. The goal should always be to merge the branch back into trunk as soon as possible.

Besides the complexity of managing (and even remembering) the state of all of these customer branches, my colleague casually mentioned that his team is trying to upgrade from Linux kernel 2.6.23 to the newest version. Imagine propagating that through the gnarly tree of branches!

If instead, as Jim suggests, customer-specific configuration used a data-driven approach, and if current development always took place on an unambiguous and unified point, the work of upgrading dependencies would be far easier. It's the Don't Repeat Yourself principle expressed in terms of your version control system.
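A minimal sketch of that data-driven approach in Perl (the customer names and the %config layout here are hypothetical, not from the project described): every customer builds from the same trunk, and only data varies.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical per-customer feature table; this data is the only thing
# that differs between customers -- no branches required.
my %config = (
    acme    => { features => [qw( wifi bluetooth )] },
    initech => { features => [qw( wifi gps )]       },
);

# Turn a customer's feature list into a lookup hash.
sub features_for {
    my $customer = shift;
    my $entry    = $config{$customer}
        or die "Unknown customer '$customer'\n";
    return map { $_ => 1 } @{ $entry->{features} };
}

my %enabled = features_for('acme');
print $enabled{bluetooth} ? "bluetooth on\n" : "bluetooth off\n";
print $enabled{gps}       ? "gps on\n"       : "gps off\n";
```

Upgrading a shared dependency (such as the kernel) then touches one codebase, not a tree of customer branches.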

(Please note that I have no problem with distributed version control systems, as long as there is a single unification point and a strong community push toward unification. Tracking and merging changes, especially at the individual level, is very useful. Rampant and divergent branching is not.)

In Praise of High-Level Languages

In The Fallacy of High-Level Languages, Scott James Remnant argues that the advantages of languages such as Python, Perl, and C# over C are negligible -- and that they make programmers of those languages somewhat untrustworthy:

I trust code written in C far more than I do any higher level language. No, that's probably not fair. I trust C programmers far more than I do programmers of other languages. If you tell me I have the option of choosing a program or library written in C over one written in Python or C#, I'll take the C one every time.

Having contributed to a couple of high-level languages by writing C code to implement said high-level languages, I can't agree. I'm not here to praise C. I'm here to bury it. C has some tremendous disadvantages over high-level languages:

  • Manual memory management. Yes, you should be able to match all of your malloc() and free() calls by reasoning about your program statically, but go write your own virtual machine which runs arbitrary programs with arbitrary memory requirements and see how well that works. Not every program is that complex, but not every program is that simple.
  • Cross-platform fun. Want your code to run on multiple platforms? Want to work around compiler and standard library and linker bugs on multiple platforms? Ever wonder why memcpy() sometimes gives you odd behavior when building with GCC with optimizations enabled (and no, the source and destination don't overlap unless something has gone very, very wrong with what malloc() returned)?
  • Poor abstraction possibilities. Yes, C has first-class functions (if you can remember the syntax for declaring the type of a function pointer -- I never can), but if you want new programming ideas such as closures, partial application, or continuations, you have to fake them yourself. Don't even mention allomorphism or a type system which encodes richer semantics than "How many bits is this on a PDP-11?".
  • Verbosity.

    (Okay, I should say more.) Implement a bare-bones grep-workalike in Perl and in C sometime. Compare the two. (It's while (<>) { print if /\Q$pat\E/ } in Perl.) Now compare the rest of the boilerplate code -- and don't forget all of the code to handle errors in C.

    If, as much of the programming world seems to believe, the error rate per line of code is stable when comparing languages, perhaps there's a sweet spot at which a more expressive language will have far fewer bugs than a less expressive language. Put another way, I won't even feign surprise if you tell me that a Perl or Python program has a fifth of the bugs of an equivalent Java program or a tenth of the bugs of an equivalent C program.

  • Library support. As of this writing, search.cpan.org reports 66,704 available modules in 17,342 distributions. If even half of those are remotely useful, there are tens of thousands of libraries one install Foo away from any full Perl installation. There may be that many available C libraries somewhere, but show me a cross-platform installer which works with a globally mirrored hosting service with ratings, documentation, reviews, and a comprehensive and growing testing service.
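To illustrate the abstraction point above: a closure in a high-level language is a couple of lines, where C requires faking the captured state by hand with structs and function pointers. A tiny Perl sketch:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# make_counter() returns an anonymous sub that closes over $count;
# each counter carries its own private, persistent state.
sub make_counter {
    my $count = 0;
    return sub { return ++$count };
}

my $tick = make_counter();
my $tock = make_counter();

$tick->() for 1 .. 2;
print $tick->(), "\n";   # 3
print $tock->(), "\n";   # 1
```

No manual struct to carry the captured variable, no explicit lifetime management: the language does the bookkeeping.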

I could go on -- projects such as HP's Dynamo and the Self programming language have demonstrated that runtime profiling and optimizations can improve performance over static compilation in many cases. As well, changes to the memory management of Mozilla Firefox 3.0 show that low-level memory manipulation can actively harm performance (arena-based allocation such as you might find in a system with an intelligent GC -- or even Perl 5 -- is much less susceptible to fragmentation).

I suspect that many people reading this have already made up their own minds, however. All I can tell you is my experience. I've never spent a couple of days optimizing a regular expression in Perl, and I've spent many days measuring and optimizing performance in C -- not to mention countless hours using gdb, Valgrind, and KCachegrind. Don't misunderstand me; I'm grateful for those tools.

Yet I'll be more grateful for the day when I never have to use those tools again.

Modern Perl Fundamentals

As I write the book Modern Perl, I keep looking for organizational principles and guidelines to govern what to discuss, what to leave out, and the metaphors and explanations I give. (I'm getting close to the point where I can post draft chapters for comments; if you're interested in reading and giving feedback, please contact me privately.)

The most important question I've asked so far is "What's the minimal amount of Perl knowledge necessary to prepare someone to write and understand Modern Perl effectively?" I've decided there are three facets.

  • How to identify individual chunks of information in a single Perl statement. This is effectively the same type of behavior as diagramming sentences in a written language. You may not know exactly what $$ means in the statement $pids->{$$}--, but you should be able to recognize that $pids refers to a scalar variable, that ->{...} performs a hash dereference of a single key, that -- is the postfix decrement operator, and that $$ is a scalar variable.
  • How to refer to the appropriate places of the Perl documentation to understand individual chunks. This is the Perl equivalent of using a dictionary to look up words you don't understand. You may know that squamous is an adjective when H.P. Lovecraft uses it, but to understand the sentence fully, you need to look in the dictionary. While some people claim that you have to understand all of Perl to read anyone else's code, I disagree. You have to be able to look up the parts you don't know.
  • How to recognize and use common Perl idioms. The while/readline/chomp loop is one such idiom. So is the Schwartzian transform. So is the use of hashes for identifying set membership. (This suggests the existence of a perlidioms document.)
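A short example of that third facet (my own illustration, not from the book draft), combining two of those idioms -- the hash used for set membership and the while/readline/chomp loop:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Idiom 1: a hash as a set -- only the keys matter; the values are
# merely true.
my %stopword = map { $_ => 1 } qw( the a an and or );

# Idiom 2: the while/readline/chomp loop, here reading from an
# in-memory filehandle instead of a real file.
my $text = "the\ncamel\nand\nllama\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @kept;
while ( my $line = <$fh> ) {
    chomp $line;
    next if $stopword{ lc $line };
    push @kept, $line;
}

print "@kept\n";   # camel llama
```

A reader who can diagram each statement and look up the unfamiliar parts in perldoc can work out everything this code does.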

While creating the outline for and writing the book, I've realized that I have to leave out more information than I can include. (This is most apparent when discussing regular expressions.) If, however, I can teach people these three facets, I believe I can prepare them to write and maintain Perl well.

Feedback-Directed Development


I believe that the myth of the Benevolent Dictator For Life as project-omniscient prognosticator is pervasive but incorrect. While the need for a leader with the respect and authority to make a final decision is clear, this leader needs something other than complete and authoritative knowledge about how the project will eventually work. This leader needs good taste -- in design as well as community cultivation.

Set aside those characteristics for now. I want to ask and answer a different question. If the project leader doesn't know exactly what the software needs to do in complete detail, where does this knowledge come from?

I believe the only workable answer, at least for software that attempts to do new things, is "from a well-honed feedback cycle."

Some of my most effective debugging and design sessions have come from working with other people. In particular, I spent over a year working on a book with a close friend. We discussed and debated organizing principles and appropriate coverage and tone and approaches frequently. Of course we disagreed, but when we stood in his living room and rearranged sticky notes on the wall to divide topics into sections, or when we spread out index cards on my floor to separate principles by theme, we learned much more than when we produced individual plans in isolation.

You can predict what a book needs to contain and what its intended audience wants and needs (two very different verbs) to read at the start of a project. That's good business sense. Yet you must retain the flexibility to change your plan when you realize that it's not working right, or that it could work better.

The act of working in concert with another human being with different experiences and thoughts and biases and goals helped both of us refine our approach to cover what we needed to cover. I'd say "Let's use this approach," and he'd say "That's good, but what about this other idea you've inadvertently neglected?" and we'd refine our plan until it fit. We had the same high-level goals, but left to ourselves, the end result would have been far different. It was better for the rapid feedback.

We asked for feedback on draft chapters, and we received often voluminous responses. Some of our most proud ideas didn't hold up when exposed to our intended audience. If we'd persisted in our original plan, or if we'd eschewed rapid feedback in favor of the traditional model of user acceptance testing at the end of the project, our book would have suffered.

I believe software development works the same way, only more so.

The person with the vision for a software project must re-evaluate the plan against reality frequently. I believe that there is no better way to do so than to request feedback from the intended audience frequently.

That's one reason I encourage so strongly participation in community-developed software projects, and one reason I encourage so strongly the value (and even necessity) of frequent, small upgrades. Active participation brings your concerns to the attention of other participants. Frequent feedback helps minimize churn and rework, allows smaller course corrections, and establishes guideposts to mark whether an approach works or not.

I don't believe that this is the only way to develop modern software -- just that it is the most effective, especially considering the costs. (After all, what's cheaper: reeducating the world to upgrade their software more often, or changing the physical laws of the Universe to make change and risk irrelevant?)

Working with Test::Class Test Suites


In this series, I've explained how to use Test::Class in Perl, how to reuse Test::Class tests, how to simplify Test::Class tests, and how to manage data dependencies and fixtures with Test::Class tests. If you've followed along -- and written your own tests with Test::Class -- you're on your way to becoming a testing expert. Now it's time to discuss some ancillary issues you may encounter.

Performance

With Test::Class::Load, you can run all of your test class tests in one process:

     use Test::Class::Load qw<path/to/tests>;

That loads the tests and all modules you're testing once. This can be a huge performance boost if you're loading "heavy" modules such as Catalyst or DBIx::Class. However, be aware that you're now loading all classes in a single process; there are potential drawbacks here. For example, if one of your classes alters a singleton or global variable that another class depends on, you may get unexpected results. Also, many classes load modules which globally alter Perl's behavior. Grep through your CPAN modules for UNIVERSAL:: or CORE::GLOBAL:: to see just how many classes do this.
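To make the singleton hazard concrete, here's a contrived sketch (My::Config is a hypothetical package, not a real module) of how state mutated by one test class leaks into every other class sharing the process:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A hypothetical singleton of the sort many applications keep around.
package My::Config;

my $instance;
sub instance { $instance ||= bless { debug => 0 }, shift }

package main;

# Imagine one test class flips a global flag as part of its setup...
My::Config->instance->{debug} = 1;

# ...then another test class, loaded into the same process, quietly
# sees the changed value instead of the pristine default.
print "debug is ", My::Config->instance->{debug}, "\n";   # debug is 1
```

Run each test class in its own process and the second class would see the default; run them together and it inherits whatever the first class left behind.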

Global state changes can introduce difficult-to-diagnose bugs. You will have to decide for yourself whether the benefits of Test::Class outweigh these drawbacks. My experience is that these bugs are usually very painful to resolve, but in finding them, I often find intermittent problems in my code bases that I could not have found any other way. For me, Test::Class offers many benefits, despite occasional frustrations.

People who prefer not to run all of their code in a single process often create separate "driver" tests:

     #!/usr/bin/perl -T

     use Test::Person;
     Test::Class->runtests;

... and:

     #!/usr/bin/perl -T

     use Test::Person::Employee;
     Test::Class->runtests;

Remember to omit the call to runtests from these driver programs if you've included it in an INIT block in your base class.

Making Your Classes Behave Like xUnit Classes

In xUnit style tests, this is an entire test:

     sub first_name : Tests(3) {
         my $test   = shift;
         my $person = $test->class->new;

         can_ok $person, 'first_name';
         ok !defined $person->first_name,
           '... and first_name should start out undefined';

         $person->first_name('John');
         is $person->first_name, 'John', '... and setting its value should succeed';
     }

The TAP world considers this as three tests, but xUnit regards these three assertions as validations of a single feature, and thus one test. TAP-based tests have a long way to go before working for xUnit users, but there's one thing we can do. Suppose that you have a test with 30 asserts. The fourth assert fails. Many xUnit programmers argue that once an assert fails, the rest of the information in the test is unreliable. In that case, the test driver often halts. Regardless of whether you agree (I hate that JUnit requires the test method to stop), you can get this behavior with Test::Class. Use Test::Most instead of Test::More and put this in your test base class:

    BEGIN { $ENV{DIE_ON_FAIL} = 1 }

Because each test method in Test::Class is wrapped in an eval, that test method will stop running, the appropriate teardown method (if any) will execute and the tests will resume with the next test method.

I'm not a huge fan of this technique, but your mileage may vary.

Conclusion

While many projects work just fine using simple Test::More programs, larger projects can wind up with scalability problems. Test::Class gives you better opportunities for managing your tests, refactoring common code, and having your test code better mirror your production code.

Here's a quick summary of tips in this series:

  • Name your test classes consistently after the classes they're testing.
  • When possible, do the same for your test methods.
  • Don't use a constructor test named new.
  • Create your own Test::Class base class.
  • Abstract the name of the class you're testing into a class method in your base class.
  • Name test control methods after their attribute.
  • Decide case-by-case whether to call a control method's parent method.
  • Don't put tests in your test control methods.

Acknowledgments

Thanks to Adrian Howard for creating the Test::Class module and providing me with tips in making it easier to use. Also, David Wheeler provided some useful comments, but that was on a first draft written years ago. I wonder if he remembers?

Using Test Control Methods with Test::Class

Handling Startup/Setup/Teardown/Shutdown Methods

When you understand Organizing Test Suites with Test::Class, Reusing Test Code with Test::Class, and Removing Boilerplate Testing Code, you'll likely discover that you need special code to run at the start and end of a class and at the start and end of every test method. This code might connect to databases, delete temp files, or set up test fixtures.

Test::Class provides four test control methods to help:

  • startup -- runs once for each class, before any tests run.
  • shutdown -- runs once for each class, after all tests have run.
  • setup -- runs before each test method.
  • teardown -- runs after each test method.

startup and shutdown

One common function for the startup and shutdown methods is to set up and tear down a database:

     package Tests::My::Resultset::Customer;

     use base 'My::Test::Class';

     sub startup : Tests(startup) {
         my $test = shift;
         $test->_connect_to_database;
     }

     sub shutdown : Tests(shutdown) {
         my $test = shift;
         $test->_disconnect_from_database;
     }

     ...

When the test class loads, the first code which Test::Class runs is the startup method. After the class's tests have finished, Test::Class calls the shutdown method and we disconnect from the database. Note that if the startup method has any tests and one fails, or if it throws any exception, the rest of the tests in that class will not run. Any tests for parent classes will still run.

     sub startup : Tests(startup) {
         ok 0;   # the test class will abort here
     }

If this occurs, the shutdown method will not be called.

setup and teardown

It can also be useful to run code before and after every test method. Here's how:

     sub setup : Tests(setup) {
         my $test = shift;
         $test->_start_db_transaction;
     }

     sub check_priviledges : Tests(no_plan) {
         my $test = shift;
         $test->_load_priviledge_fixture;
         ...
     }

     sub teardown : Tests(teardown) {
         my $test = shift;
         $test->_rollback_db_transaction;
     }

This code starts a database transaction before every test method. The check_priviledges method loads its own test fixture and the teardown method rolls back the transaction, ensuring that the next test will have a pristine database. Note that if the setup method fails a test, the teardown method will still be called. This differs from the startup behavior: after a setup failure, Test::Class moves on to the next test method and assumes you still want to continue.

Overriding test control methods

Users new to Test::Class often find that they run more test control methods than they expected or their test control methods run in an order they did not expect.

Controlling order of execution

Suppose that your test base class contains:

     sub connect_to_db : Tests(startup) {
         my $test = shift;
         $test->_connect_to_db;
     }

A test subclass contains:

     sub assert_db : Tests(startup => 1) {
         my $test = shift;
         ok $test->_is_connected_to_db, 'We still have a database connection';
     }

That will probably fail and your tests will not run. Why? Test::Class runs tests in alphabetical order in a test class. Because it includes inherited tests in your test class, you've inherited connect_to_db. As that sorts after assert_db, it runs after assert_db. Thus, you're asserting your database connection before you've connected.

The problem is tightly-coupled methods which rely on execution order. The fix is simple. Rename both startup methods to startup and have the child class call the super class method:

     sub startup : Tests(startup) {
         my $test = shift;
         $test->SUPER::startup;
         die 'We lost our database connection'
             unless $test->_is_connected_to_db;
     }

This works because Test::Class knows you've overridden the method and you can call it manually.

Warning: Note that the startup method now dies rather than running a test. Test::Class has no way of knowing if you're really going to call the super class. As a result, it has no way of knowing the real test count. The die halts the startup method.

Tip: for reasons mentioned above, don't put tests in your test control methods.

Controlling what gets executed

Suppose that you have a web page which provides additional features to authenticated users. You might test it with:

     sub unauthenticated_startup : Test(startup) {
         my $test = shift;
         $test->_connect_as_unauthenticated;
     }

In your "authenticated" subclass, you may have:

     sub authenticated_startup : Test(startup) {
         my $test = shift;
         $test->_connect_as_authenticated;
     }

Again, your tests will probably fail, because authenticated_startup will run before unauthenticated_startup, leaving you connected as the unauthenticated user in your "authenticated" subclass. However, this time you probably don't even need unauthenticated_startup to run. The solution is again to give both methods the same name, this time without calling the parent's method:

     sub startup : Test(startup) {
         my $test = shift;
         $test->_connect_as_authenticated;
     }

Note that this control method does not run tests. If the connection fails, throw an exception.

The next and final article in this series explains how to manage test suites with Test::Class.

Making Your Testing Life Easier


After absorbing the information in Organizing Test Suites with Test::Class and Reusing Test Code with Test::Class, you're probably beginning to understand how Test::Class can make managing large codebases easier. If you've worked with test cases before, you've likely realized that test code is still code. Well-organized test code is easier to work with than poorly organized test code.

Auto-discovering your test classes

There's too much repetitive boilerplate in these tests. We can make them easier. The first problem is the helper script, t/run.t:

     #!/usr/bin/perl -T

     use lib 't/tests';

     use Test::Person;
     use Test::Person::Employee;

     Test::Class->runtests;

Right now, this doesn't look so bad, but as you start to add more classes, this gets to be unwieldy. What if you forget to add a test class? Your class might be broken, but if the test class does not run, how will you know? Autodiscovering test classes helps:

     #!/usr/bin/perl -T

     use Test::Class::Load qw<t/tests>;
     Test::Class->runtests;

Tell Test::Class::Load (bundled with Test::Class) which directories your test classes are in and it will find them for you. It does this by attempting to load all files with a .pm extension, so keep any helper test modules (which are not Test::Class tests) in a separate directory.

Using a common base class

Another generally useful programming technique is to factor out common code. I've demonstrated this already, but there's room for improvement. Both test classes have a method for returning the name of the class being tested. It's possible to compute the name of this class, so why not push this into a base class? Add this to t/tests/My/Test/Class.pm:

     package My::Test::Class;

     use Test::Most;
     use base qw<Test::Class Class::Data::Inheritable>;

     BEGIN {
         __PACKAGE__->mk_classdata('class');
     }

     sub startup : Tests( startup => 1 ) {
         my $test = shift;
         ( my $class = ref $test ) =~ s/^Test:://;
         return ok 1, "$class loaded" if $class eq __PACKAGE__;
         use_ok $class or die;
         $test->class($class);
     }

     1;

In Test::Person::Employee, delete the class method. In Test::Person, delete the class and startup methods, and inherit from My::Test::Class instead of Test::Class. Now class will always return the class currently under test. The new Test::Person class looks like:

     package Test::Person;

     use Test::Most;
     use base 'My::Test::Class';

     sub constructor : Tests(3) {
         my $test  = shift;
         my $class = $test->class;

         can_ok $class, 'new';
         ok my $person = $class->new, '... and the constructor should succeed';
         isa_ok $person, $class, '... and the object it returns';
     }

     sub first_name : Tests(3) {
         my $test   = shift;
         my $person = $test->class->new;

         can_ok $person, 'first_name';
         ok !defined $person->first_name,
           '... and first_name should start out undefined';

         $person->first_name('John');
         is $person->first_name, 'John', '... and setting its value should succeed';
     }

     sub last_name : Tests(3) {
         my $test   = shift;
         my $person = $test->class->new;

         can_ok $person, 'last_name';
         ok !defined $person->last_name,
           '... and last_name should start out undefined';

         $person->last_name('Public');
         is $person->last_name, 'Public', '... and setting its value should succeed';
     }

     sub full_name : Tests(4) {
         my $test   = shift;
         $test->_full_name_validation;

         my $person = $test->class->new(
             first_name => 'John',
             last_name  => 'Public',
         );

         is $person->full_name, 'John Public',
           '... and setting its value should succeed';
     }

     sub _full_name_validation {
         my $test   = shift;
         my $person = $test->class->new;
         can_ok $person, 'full_name';

         throws_ok { $person->full_name }
             qr/^Both first and last names must be set/,
             '... and full_name() should croak() if either name is not set';

         $person->first_name('John');

         throws_ok { $person->full_name }
             qr/^Both first and last names must be set/,
             '... and full_name() should croak() if either name is not set';
     }

     1;

The test results for Test::Person::Employee are:

     All tests successful.
     Files=1, Tests=32,  1 wallclock secs ( 0.33 cusr +  0.08 csys =  0.41 CPU)

There's an extra test, due to the ok 1 in the My::Test::Class::startup method, which runs an extra time when My::Test::Class itself loads.

Tip: If you must load your classes at BEGIN time, override this startup method in your test class -- but be sure to provide a class method.

Run individual test classes

When I develop tests, I hate to leave my editor merely to run tests from the command line. To avoid this, I have a mapping in my .vimrc file similar to:

      noremap ,t :!prove --merge -lv %<CR>

When writing tests, I hit ,t and my test runs. However, doing this in a test class doesn't work. The class gets loaded, but the tests do not run. I could add a new mapping:

      noremap ,T  :!prove -lv --merge t/run.t<CR>

... but this runs all of my test classes. If I have several hundred tests, I don't want to hunt back through all of the test output to see which tests failed. Instead, I want to run a single test class. I altered my mapping to include the path to my test classes.

      noremap ,t  :!prove -lv --merge -It/tests %<CR>

I also removed the Test::Class->runtests line from t/run.t (or else I'll have my tests run twice if I run the full test suite). Because I use a common base class, I added a line to My::Test::Class:

      INIT { Test::Class->runtests }

Regardless of whether I'm in a standard Test::Most test program or one of my new test classes, I can type ,t and run only the tests in the file I'm editing.

If you run the tests for Test::Person::Employee, you'll see the full run of 32 tests because Test::Class will run the tests for the current class and all classes from which it inherits. If you run the tests for Test::Person, you'll only see 15 tests run -- the desired behavior.

If you prefer Emacs, add this to your ~/.emacs file:

     (eval-after-load "cperl-mode"
         '(add-hook 'cperl-mode-hook
             (lambda () (local-set-key "\C-ct" 'cperl-prove))))

     (defun cperl-prove ()
         "Run the current test."
         (interactive)
         (shell-command (concat "prove -lv --merge -It/tests "
         (shell-quote-argument (buffer-file-name)))))

That will bind this to C-c t and you can pretend that you're as cool as Vim users (just kidding! Stop the hate mail already).

Next time, learn to use test control methods with Test::Class.

Reusing Test Code with Test::Class


After reading Organizing Test Suites with Test::Class, you're probably saying "that's a heck of a lot of work just for testing a class." If that were all there is to it, you'd be perfectly justified in forgetting about Test::Class. However, Test::Class really shines when it comes to code reuse. Consider writing a subclass of Person named Person::Employee. I'll keep it simple by providing only an employee_number method, but you'll quickly understand the benefits.

     package Person::Employee;

     use Moose;
     extends 'Person';

     has employee_number => ( is => 'rw', isa => 'Int' );

     1;

Here's its test class:

     package Test::Person::Employee;

     use Test::Most;
     use base 'Test::Person';

     sub class {'Person::Employee'}

     sub employee_number : Tests(3) {
         my $test     = shift;
         my $employee = $test->class->new;

         can_ok $employee, 'employee_number';
         ok !defined $employee->employee_number,
             '... and employee_number should not start out defined';

         $employee->employee_number(4);
         is $employee->employee_number, 4,
             '... but we should be able to set its value';
     }

     1;

Notice that instead of inheriting from Test::Class, the test inherits from Test::Person, just as the Person::Employee class inherits from Person. Also, this overrides the class method to ensure that the tests know which class they're using.

Remember to add Test::Person::Employee to t/run.t:

     #!/usr/bin/perl -T

     use lib 't/tests';

     use Test::Person;
     use Test::Person::Employee;

     Test::Class->runtests;

When we run t/run.t:

     All tests successful.
     Files=1, Tests=31,  1 wallclock secs ( 0.25 cusr +  0.06 csys =  0.31 CPU)

Whoa! Wait a minute. This new test class only had three tests. The previous run ran with 14, so how come the report says it ran 31?

Test::Person::Employee inherited the tests from Test::Person. The 14 original tests plus the 14 inherited tests and the 3 added tests add up to 31 tests! These aren't frivolous tests, either. Look at the new test's output:

     # Test::Person::Employee->constructor
     ok 16 - Person::Employee->can('new')
     ok 17 - ... and the constructor should succeed
     ok 18 - ... and the object it returns isa Person::Employee
     #
     # Test::Person::Employee->employee_number
     ok 19 - Person::Employee->can('employee_number')
     ok 20 - ... and employee_number should not start out defined
     ok 21 - ... but we should be able to set its value
     #
     # Test::Person::Employee->first_name
     ok 22 - Person::Employee->can('first_name')
     ok 23 - ... and first_name should start out undefined
     ok 24 - ... and setting its value should succeed
     #
     # Test::Person::Employee->full_name
     ok 25 - Person::Employee->can('full_name')
     ok 26 - ... and full_name() should croak() if either name is not set
     ok 27 - ... and full_name() should croak() if either name is not set
     ok 28 - ... and setting its value should succeed
     #
     # Test::Person::Employee->last_name
     ok 29 - Person::Employee->can('last_name')
     ok 30 - ... and last_name should start out undefined
     ok 31 - ... and setting its value should succeed

Because the tests never hard-code the class name and because Test::Person::Employee overrides the class method, these new tests run against instances of Person::Employee, not Person. This demonstrates that subclassing did not break any of the inherited behavior! If you do need to alter the behavior of one of those methods, as you might expect with object-oriented code, all you need to do is override the corresponding test method. For example, what if employees must have their full names listed in the format "last name, first name"?

     sub full_name {
         my $self = shift;

         unless ( $self->first_name && $self->last_name ) {
             Carp::croak("Both first and last names must be set");
         }

         return $self->last_name . ', ' . $self->first_name;
     }

The appropriate test method in Test::Person::Employee might look like:

     sub full_name : Tests(no_plan) {
         my $test   = shift;
         my $person = $test->class->new;
         can_ok $person, 'full_name';

         throws_ok { $person->full_name }
         qr/^Both first and last names must be set/,
           '... and full_name() should croak() if either name is not set';

         $person->first_name('John');

         throws_ok { $person->full_name }
         qr/^Both first and last names must be set/,
           '... and full_name() should croak() if either name is not set';

         $person->last_name('Public');
         is $person->full_name, 'Public, John',
           '... and setting its value should succeed';
     }

Make those changes and all tests will pass. Test::Person::Employee will call its own full_name test method and not that of its parent class.

Refactoring test classes

There's a lot of duplication in the full_name test which you should factor out into common code. The well-known (if poorly-practiced) aphorism that test code is just code is even more true with Test::Class. Well-factored tests are easier to understand, maintain, and modify than poorly-factored tests.

Refactoring with methods

One approach to reducing duplication in the Test::Person class is to create helper methods:

     sub full_name : Tests(no_plan) {
         my $test   = shift;
         $test->_full_name_validation;

         my $person = $test->class->new(
             first_name => 'John',
             last_name  => 'Public',
         );

         is $person->full_name, 'John Public',
           'The name of a person should render correctly';
     }

     sub _full_name_validation {
         my $test   = shift;
         my $person = $test->class->new;
         can_ok $person, 'full_name';

         throws_ok { $person->full_name }
             qr/^Both first and last names must be set/,
             '... and full_name() should croak() if either name is not set';

         $person->first_name('John');

         throws_ok { $person->full_name }
             qr/^Both first and last names must be set/,
             '... and full_name() should croak() if either name is not set';
     }

And in Test::Person::Employee:

     sub full_name : Tests(no_plan) {
         my $test   = shift;
         $test->_full_name_validation;
         my $person = $test->class->new(
             first_name => 'Mary',
             last_name  => 'Jones',
         );
         is $person->full_name, 'Jones, Mary',
           'The employee name should render correctly';
     }

Just like with any other OO code, subclasses inherit and can override the _full_name_validation method.

Refactoring with fixtures

When writing test classes, the startup and shutdown methods are very handy, but they run only at the beginning and end of your test class. Sometimes you need code to run before and after every test method. In the Person examples, many of the test methods contained this line:

     my $person = $test->class->new;

You probably don't want to duplicate that line in every test method, so use what's known as a fixture. A fixture is "fixed state" for your tests to run against. Fixtures let you remove duplicated setup code from your tests and run them in a controlled environment. You might write:

     sub setup : Tests(setup) {
         my $test        = shift;
         my $class       = $test->class;
         $test->{person} = $class->new;
     }

If you want to start with a known set of data, you could write:

     sub setup : Tests(setup) {
         my $test        = shift;
         my $class       = $test->class;

         $test->{person} = $class->new(
             first_name => 'John',
             last_name  => 'Public',
         );
     }

Now all of your test methods can simply use $test->{person} (you can even make that a method if you prefer) to access a new instance of the class you're testing without having to duplicate that code.
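Making the fixture available through a method is a small refactoring. Here's a minimal sketch, using a hypothetical person accessor and a stand-in package (the ": Tests(setup)" attribute is omitted so the snippet runs with core Perl alone, without Test::Class installed):

```perl
use strict;
use warnings;

package My::FixtureDemo;    # a stand-in for a real Test::Class subclass

sub new    { bless {}, shift }

# In a real test class this would carry the ": Tests(setup)" attribute.
sub setup  { my $test = shift; $test->{person} = { first_name => 'John' } }

# The hypothetical accessor: test methods say $test->person rather than
# reaching into the hash directly.
sub person { shift->{person} }

package main;

my $test = My::FixtureDemo->new;
$test->setup;
print $test->person->{first_name}, "\n";    # prints "John"
```

The accessor keeps the fixture's storage details in one place, so changing how the fixture is built later means touching one method instead of every test.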

The corresponding teardown method is useful if you need to clean up on a per-test basis, as when you run tests against a database.
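A teardown sketch might look like the following. This is an illustration under stated assumptions: the ": Tests(teardown)" attribute appears only in a comment (so the snippet runs with core Perl alone, without Test::Class installed), and the {person} fixture slot matches the setup examples above.

```perl
use strict;
use warnings;

package My::TeardownDemo;    # a stand-in for a real Test::Class subclass

sub new   { bless {}, shift }
sub setup { my $test = shift; $test->{person} = { last_name => 'Public' } }

# In a real test class: sub teardown : Tests(teardown) { ... }
# Runs after every test method, so each test starts from a clean slate.
sub teardown {
    my $test = shift;
    delete $test->{person};    # a database handle would get a rollback here
}

package main;

my $test = My::TeardownDemo->new;
$test->setup;
$test->teardown;
print defined $test->{person} ? "fixture leaked\n" : "fixture cleaned up\n";
```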

Next time, I'll discuss how to manage test classes with Test::Class.

When working with large test suites, using procedural tests for object-oriented code becomes clumsy after a while. This is where Test::Class really shines. Unfortunately, many programmers struggle to learn this module or don't use its full power.

Please note that this article assumes a basic familiarity with object-oriented Perl and testing. Also, some of these classes are not "proper" by the standards of many OO programmers (your author included); I've written them for clarity rather than purity.

Modules and Their Versions

This article uses the following modules and versions:

  • Test::Class version 0.31
  • Test::Most version 0.21
  • Test::Harness version 3.15
  • Moose version 0.7
  • Class::Data::Inheritable version 0.08

You may use lower versions of these modules (and write the OO by hand instead of using Moose), but be aware that you may see slightly different behavior.

Notes about the code

Note that Moose packages should generally end with:

      __PACKAGE__->meta->make_immutable;
      no Moose;

I've omitted this from the examples. I've also omitted use strict and use warnings, but assume they are there (Moose enables them automatically when you use it). The code will, however, run just fine without them; I trimmed the examples merely to focus on the core features of the code in question.

Of course, you may need to adjust the shebang line (#!/usr/bin/env perl -T) for your system.

Evolution of a Perl Programmer

There are many paths programmers take in their development, but a typical one seems to be:

1. Start writing simple procedural programs.
2. Start writing modules to reuse code.
3. Start using objects for more powerful abstractions.
4. Start writing tests.

While it would be nice if people started writing tests from day one, most programmers don't. When they do, their tests are often straightforward procedural tests like:

     #!/usr/bin/env perl -T

     use strict;
     use warnings;

     use Test::More tests => 3;

     use_ok 'List::Util', 'sum' or die;

     ok defined &sum, 'sum() should be exported to our namespace';
     is sum(1,2,3), 6, '... and it should sum lists correctly';

There's nothing wrong with procedural tests. They're great for non-OO code, and for most projects they handle everything you need. Download most modules from the CPAN and you'll generally find their tests -- if they have them -- procedural in style. However, when you start to work with larger code bases, a t/ directory with 317 test scripts becomes unwieldy. Where is the test you need? Memorizing all of your test names or grepping through your tests to find the ones that exercise the code you're changing grows tedious. That's where Adrian Howard's Test::Class can help.

Using Test::Class

Creating a simple test class

I'm a huge "dive right in" fan, so I'll now skip a lot of the theory and show how things work. Though I often use test-driven development (TDD), I'll reverse the process here to show explicitly what I'm testing. Also, Test::Class has quite a number of different features, not all of which I'm going to explain here. See the documentation for more information.

First, create a very simple Person class. Because I don't like writing out simple methods over and over, I used Moose to automate a lot of the grunt work.

     package Person;

     use Moose;

     has first_name => ( is => 'rw', isa => 'Str' );
     has last_name  => ( is => 'rw', isa => 'Str' );

     sub full_name {
         my $self = shift;
         return $self->first_name . ' ' . $self->last_name;
     }

     1;

This provides a constructor and first_name, last_name, and full_name methods.

Now write a simple Test::Class program for it. The first bit of work is to find a place to put the tests. To avoid namespace collisions, choose your package name carefully. I like prepending my test classes with Test:: to ensure that we have no ambiguity. In this case, I've put my Test::Class tests in t/tests/ and named this first class Test::Person. Assume the directory structure:

     lib/
     lib/Person.pm
     t/
     t/tests/
     t/tests/Test
     t/tests/Test/Person.pm

The actual test class might start out like:

     package Test::Person;

     use Test::Most;
     use base 'Test::Class';

     sub class { 'Person' }

     sub startup : Tests(startup => 1) {
         my $test = shift;
         use_ok $test->class;
     }

     sub constructor : Tests(3) {
         my $test  = shift;
         my $class = $test->class;
         can_ok $class, 'new';
         ok my $person = $class->new,
             '... and the constructor should succeed';
         isa_ok $person, $class, '... and the object it returns';
     }

     1;

Note: this code uses Test::Most instead of Test::More to take advantage of Test::Most features later. Also, those attributes should really be ro (read-only), because leaving them read-write makes it possible to put the object into an inconsistent state. This is part of what I meant about "proper" OO code, but again, I wrote this code for illustration purposes only.

Before I explain all of that, run this test. Add this program as t/run.t:

     #!/usr/bin/env perl -T

     use lib 't/tests';
     use Test::Person;

     Test::Class->runtests;

This little program sets the path to the test classes, loads them, and runs the tests. Now you can run that with the prove utility:

     $ prove -lv --merge t/run.t

Tip: the --merge option tells prove to merge STDOUT and STDERR, which avoids the problems that occur when STDERR output does not stay in sync with STDOUT. Don't use this unless you're running your tests in verbose mode; it sends failure diagnostics to STDOUT, and TAP::Harness discards STDOUT lines beginning with # unless it's running in verbose mode.

You will see output similar to:

     t/run.t ..
     1..4
     ok 1 - use Person;
     #
     # Test::Person->constructor
     ok 2 - Person->can('new')
     ok 3 - ... and the constructor should succeed
     ok 4 - ... and the object it returns isa Person
     ok
     All tests successful.
     Files=1, Tests=4,  0 wallclock secs ( 0.03 usr  0.00 sys +  0.43 cusr  0.02 csys =  0.48 CPU)
     Result: PASS

Note that the test output (named the "Test Anything Protocol", or "TAP", if you're curious) for the constructor method begins with the diagnostic line:

     # Test::Person->constructor

That occurs before every test method's output and makes it very easy to find which tests failed.

Look more closely at the test file to see what's happening:

     01: package Test::Person;
     02:
     03: use Test::Most;
     04: use base 'Test::Class';
     05:
     06: sub class { 'Person' }
     07:
     08: sub startup : Tests(startup => 1) {
     09:     my $test = shift;
     10:     use_ok $test->class;
     11: }
     12:
     13: sub constructor : Tests(3) {
     14:     my $test  = shift;
     15:     my $class = $test->class;
     16:     can_ok $class, 'new';
     17:     ok my $person = $class->new,
     18:         '... and the constructor should succeed';
     19:     isa_ok $person, $class, '... and the object it returns';
     20: }
     21:
     22: 1;

Lines 1 through 4 are straightforward. Line 4 makes this class inherit from Test::Class; and that's what makes all of this work. Line 6 defines a class method which the tests will use to know which class they're testing. It's very important to do this rather than hard-coding the class name in our test methods. That's good OO practice in general; it will help you later.

The startup method has an attribute, Tests, with the arguments startup and 1. Any method labeled as a startup method will run once before any of the other test methods run. The 1 (one) in the attribute says "this method runs one test". If you don't run any tests in your startup method, omit this number:

     sub load_db : Tests(startup) {
         my $test = shift;
         $test->_create_database;
     }

     sub _create_database {
         ...
     }

Tip: as you can see from the code above, you don't need to name the startup method startup. I recommend you give it the same name as the attribute for reasons discussed later.

That will run once and only once for each test class. Because the _create_database method has no attributes, you may safely call it; Test::Class will not try to run it as a test.

Of course, there's a corresponding shutdown available:

     sub shutdown_db : Tests(shutdown) {
         my $test = shift;
         $test->_shutdown_database;
     }

These two attributes allow you to set up and tear down a pristine testing environment for every test class without worrying that other test classes will interfere with the current tests. Of course, this means that tests may not be able to run in parallel. Though there are ways around that, they're beyond the scope of this article.

As mentioned, the startup method has a second argument which tells Test::Class that it runs one test. This is strictly optional. Here we use it to safely test that we can load our Person class. As an added feature, if Test::Class detects that the startup test failed (or if it catches an exception), it assumes that there's no point in running the rest of the tests, so it skips the remaining tests for the class.

Tip: Don't run tests in your startup method; I'm doing so only to simplify this example. I'll explain why in a bit. For now, it's better to write:

     sub startup : Tests(startup) {
         my $test  = shift;
         my $class = $test->class;
         eval "use $class";
         die $@ if $@;
     }

Take a closer look at the constructor method.

     13: sub constructor : Tests(3) {
     14:     my $test  = shift;
     15:     my $class = $test->class;
     16:     can_ok $class, 'new';
     17:     ok my $person = $class->new,
     18:         '... and the constructor should succeed';
     19:     isa_ok $person, $class, '... and the object it returns';
     20: }

Tip: I did not name the constructor tests new because that's a Test::Class method and overriding it will cause the tests to break.

The Tests attribute lists the number of tests as 3. If you don't know how many tests you're going to have, use no_plan.

     sub constructor : Tests(no_plan) { ... }

As a short-cut, omitting arguments to the attribute will also mean no_plan:

     sub constructor : Tests { ... }

The my $test = shift line is equivalent to my $self = shift. I like to rename $self to $test in my test classes, but that's merely a matter of personal preference. The $test object starts out as an empty hashref, so you can stash data there if needed. For example:

     sub startup : Tests(startup) {
         my $test     = shift;
         my $pid      = $test->_start_process
             or die "Could not start process: $?";

         $test->{pid} = $pid;
     }

     sub run : Tests(no_plan) {
         my $test    = shift;
         my $process = $test->_get_process($test->{pid});
         ...
     }

The rest of the test method is self-explanatory if you're familiar with Test::More.

The Person class also has first_name, last_name, and full_name methods, so write tests for those. While you're in "development mode", it's safe to leave these tests as no_plan, but don't forget to set the number of tests when you're done.

     sub first_name : Tests {
         my $test   = shift;
         my $person = $test->class->new;

         can_ok $person, 'first_name';
         ok !defined $person->first_name,
           '... and first_name should start out undefined';

         $person->first_name('John');
         is $person->first_name, 'John',
           '... and setting its value should succeed';
     }

     sub last_name : Tests {
         my $test   = shift;
         my $person = $test->class->new;

         can_ok $person, 'last_name';
         ok !defined $person->last_name,
           '... and last_name should start out undefined';

         $person->last_name('Public');
         is $person->last_name, 'Public',
           '... and setting its value should succeed';
     }

     sub full_name : Tests {
         my $test   = shift;
         my $person = $test->class->new;

         can_ok $person, 'full_name';
         ok !defined $person->full_name,
           '... and full_name should start out undefined';

         $person->first_name('John');
         $person->last_name('Public');

         is $person->full_name, 'John Public',
           '... and setting its value should succeed';
     }

Tip: when possible, name your test methods after the method they're testing. This makes finding them much easier. You can even write editor tools to automatically jump to them. Not all test methods will fit this pattern, but many will.

The first_name and last_name tests can probably have common elements factored out, but for now they're fine. Now see what happens when you run this (warnings omitted):

     t/run.t ..
     ok 1 - use Person;
     #
     # Test::Person->constructor
     ok 2 - Person->can('new')
     ok 3 - ... and the constructor should succeed
     ok 4 - ... and the object it returns isa Person
     #
     # Test::Person->first_name
     ok 5 - Person->can('first_name')
     ok 6 - ... and first_name should start out undefined
     ok 7 - ... and setting its value should succeed
     #
     # Test::Person->full_name
     ok 8 - Person->can('full_name')
     not ok 9 - ... and full_name should start out undefined

     #   Failed test '... and full_name should start out undefined'
     #   at t/tests/Test/Person.pm line 48.
     #   (in Test::Person->full_name)
     ok 10 - ... and setting its value should succeed
     #
     # Test::Person->last_name
     ok 11 - Person->can('last_name')
     ok 12 - ... and last_name should start out undefined
     ok 13 - ... and setting its value should succeed
     1..13
     # Looks like you failed 1 test of 13.
     Dubious, test returned 1 (wstat 256, 0x100)
     Failed 1/13 subtests

     Test Summary Report
     -------------------
     t/run.t (Wstat: 256 Tests: 13 Failed: 1)
       Failed test:  9
       Non-zero exit status: 1
     Files=1, Tests=13,  0 wallclock secs ( 0.03 usr  0.00 sys +  0.42 cusr  0.02 csys =  0.47 CPU)
     Result: FAIL

Uh oh. You can see that full_name isn't behaving the way the tests expect. Suppose that you want to croak if either the first or last name is not set. To keep this simple, assume that neither first_name nor last_name may be set to a false value.

     sub full_name {
         my $self = shift;

         unless ( $self->first_name && $self->last_name ) {
             Carp::croak("Both first and last names must be set");
         }

         return $self->first_name . ' ' . $self->last_name;
     }

That should be pretty clear. Look at the new test now. Use the throws_ok test from Test::Exception to test the Carp::croak(). Using Test::Most instead of Test::More makes this test function available without explicitly using Test::Exception.

 sub full_name : Tests(no_plan) {
     my $test   = shift;
     my $person = $test->class->new;
     can_ok $person, 'full_name';

     throws_ok { $person->full_name }
         qr/^Both first and last names must be set/,
          '... and full_name() should croak() if either name is not set';

     $person->first_name('John');

     throws_ok { $person->full_name }
         qr/^Both first and last names must be set/,
          '... and full_name() should croak() if either name is not set';

     $person->last_name('Public');
     is $person->full_name, 'John Public',
       '... and setting its value should succeed';
 }

Now all of the tests pass and you can go back and set the test plan numbers, if desired:

 All tests successful.
 Files=1, Tests=14,  0 wallclock secs ( 0.03 usr  0.00 sys +  0.47 cusr  0.02 csys =  0.52 CPU)
 Result: PASS

The next article, Reusing Test Code with Test::Class shows how to inherit from test classes -- and how to refactor test classes!

Reasons NOT to Upgrade

In The Relentless Progression of Baby Steps, I suggested that frequent, minor upgrades are the sanest way to manage software dependencies. Chris Prather and I had a short discussion soon afterward; his Baby Steps to NASCAR gives one example where upgrading is not necessarily advisable.

There are other situations. I believe they're rarer than most people think, but software development has nuances related to risk and value tradeoffs. (The Internet, I believe, killed nuance and has very nearly replaced satire with lower primate chest-beating, but that's a different story.)

Risk and Value

I use Test-Driven Development for all code I intend to maintain. I can imagine a well-meaning but non-technical manager watching the process and wondering why I write one and a half to three times as much test code as I do code and asking me why I spend so much time moving around and deleting and revising code. "Wouldn't it be more efficient," he might ask, "just to write the code you intend to keep the first time?"

Set aside the question "How do I know what I need until I need it?" (though it's an important question) and consider some risks. By writing tests and aggressively refactoring, I risk spending more time creating code than if I wrote code itself. I may have more code to maintain; test code is still code, subject to rules of maintenance, clarity, coding standards, and duplication. If I'm going down the wrong path, I may enshrine the wrong behavior in the tests.

However, if I don't use the TDD cycle, I risk writing too much code that I don't need. I risk regressions, as my test coverage is likely poorer. I risk writing APIs that are difficult to use, because I've never used them. I risk having fewer refactoring possibilities in the future, because my code coverage may not be as good.

I risk those unpredictably long debugging sessions where I know there's a bug somewhere, but I can't pin it down to a three-to-five line section of code I wrote in the past five minutes. (Sometimes you get these long debugging sessions even with TDD, but they're much less frequent and much less painful.)

Throwing use Modern::Perl; at the top of my programs takes a little bit of effort too -- especially as it's one more dependency for people to install -- but allowing Perl to warn me if I've made a typo or a common logic error is a huge benefit for me. I don't make those mistakes very often, but when I do, they're much easier to find and to understand if the computer can point them out to me. My mind's usually busy thinking on a much higher level than "Oh, you transposed two letters in a variable name".
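To make that concrete, here's a contrived sketch. It uses core strict and warnings directly rather than the Modern::Perl module, and wraps the broken code in a string eval so the failure can be demonstrated instead of aborting the program:

```perl
use strict;
use warnings;

# A transposed variable name under strict is a compile-time error,
# not a silent undef at runtime.
my $buggy = <<'END';
    use strict;
    my $total = 40 + 2;
    print $tolal;    # typo: should be $total
END

eval $buggy;
print $@ =~ /Global symbol/ ? "caught the typo\n" : "missed it\n";
```

Without strictures, $tolal would quietly evaluate to undef and the bug would surface somewhere far from its cause.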

For a small cost, I get tremendous benefits. It's important to look at risk and reward in those terms.

Risks and Rewards of Frequent Upgrades

With that in mind, here are the situations where I believe it may be appropriate to skip minor upgrades. I won't analyze all of the risks and rewards, as these are general categories rather than specific situations. You may safely skip upgrading a dependency when:

  • ... your software is not receiving maintenance. This is the "If it's not broken, don't touch it" scenario. Please note that this means no one is using the application: it's running on a box in a corner somewhere that no one touches, because the moment someone reads the source code out of anything other than morbid curiosity, you're maintaining it again. It's stable, by which I mean it's never, ever going to change. It's dead.
  • ... you've paid a vendor or support organization lots of money not to upgrade. See: IBM, Sun, SAP, Oracle. (If I paid someone lots of money, I'd expect them to remove headaches, not cause them, but that's just me.)
  • ... you depend solely on vendor packages and subsequent updates. If you want to stick with what Debian unstable provides for mod_perl and use only .deb-packaged CPAN modules, you do have some advantages. (Be aware of the disadvantages -- but the advantages are compelling.)
  • ... you can and will support the dependency yourself. If you want to use Perl 5.005, you can find it. If you can get it compiling on a new box purchased in March 2009 running a modern operating system and compiler, great. You're on your own for security patches, bug fixes, and (likely) porting other code to work on that old version. You may have to backport patches yourself, or pay someone to do it. You have that option.

That's it.

You may have noticed the conspicuous lack of "Can't the free software community support old versions for free?" That was deliberate, and the reasoning should be obvious.

I like to cook. I hate washing dishes. Sometimes I let dishes pile up in my sink and on my countertop. When I stagger downstairs for breakfast, sometimes I have no clean dishes, and I have to rearrange the dirty dishes in the sink to find dishes to wash and to make room to wash them.

Sometimes I'm good, and I put my dirty dishes in the dishwasher immediately. My sink's always empty, so I can use it if necessary, and my cupboards are full of clean dishes and pots and pans and other cooking utensils. I don't have to set aside extra time to get my kitchen in shape before I can cook.

A few of my friends run marathons. They train every day for several months, though usually not the full twenty-six miles on every run. They don't show up on race day, having barely run in the past year, expecting to finish. They put in a little bit of work every day to prepare.

I write novels as a hobby. My goal is to write a thousand words every day. That takes somewhere between half an hour and an hour and a half every day. In a year, I can write two novels at this pace. It's possible to write a novel in a month, but that's some three thousand words every day. A professional writer with a great outline, a good sense of character, and the discipline that comes from months or years of writing every day could make that work, but not so anyone else.

When I write code, I run my tests every time I make a change to the code I'm testing. Sometimes that's every few seconds. I could run my tests only when preparing to release a new version, but every time I've done that, I've spent hours debugging problems that would have taken me seconds to fix if I could trace them down to the single line of code I changed since my last successful test run.

In all of these endeavors, one step at a time adds up to big progress over time. Writing one word isn't difficult. Washing one dish is easy. Fixing one failing test is trivial. Running one mile is much less daunting than running twenty-six. This isn't a new thought. It's barely an interesting thought.

I wonder, though -- what is so different about upgrading to a new release of software every month that's so difficult the only way to do it is in a long-delayed, long-planned, big thud upgrade every two or three or five or ten years?

It's not as if I wash any fewer dishes one at a time than all at once, or run any fewer steps one mile at a time than twenty-six at a time, or write any fewer words a thousand at a time than a hundred thousand at a time, or write any less code two and three lines at a time than two or three hundred at a time. The amount of work accomplished tends to scale linearly. Maybe upgrades seem daunting and weird and difficult because big-bang, once-in-a-blue-moon upgrades are simply too big to be anything other than difficult. Maybe small, frequent upgrades can be so boring and dull that they're as easy as putting your spoon in the dishwasher after you finish eating.

Certainly with Parrot we've discovered that releasing a new stable version every month is much easier than releasing a new stable version only once or twice a year. It's documented. It's automated. It's repeatable. It takes some time, but it's fairly boring how uncomplicated it is.

What makes the other side of the process -- users upgrading -- any more difficult?

Larry Wall has a quote somewhere that it's okay to speak baby Perl. After all, you don't look down on a baby for occasionally misconjugating an irregular verb. (Of course, I still can't reliably decline Spanish nouns by gender, and I haven't been a baby for a while.)

Baby Perl and bad Perl may have a lot in common, but the differences are very important. The vital distinction is how each gets written.

Baby Perl avoids well-known Perl idioms. It may have the flavor of another language: it's easy to see the influence of C or PHP or Visual Basic in a Perl program. Experienced programmers can look at the code and say "That's clunky, but it can work."

Bad Perl ignores well-known Perl design and coding practices. The code is poorly-factored. Variable names are obtuse. There's no error checking. Random bits of dead code pop up, sometimes commented out and sometimes live. Everywhere you look you see bad code copied and pasted from known sources of bad code. Global variables appear and disappear. (This is how you write unmaintainable code in any language -- that includes Python, Ruby, and Haskell (except for the global variables).)

Baby Perl can be bad Perl, and bad Perl can be baby Perl, but the difference is in design.

A baby Perl program written as a didactic exercise is fine. A baby Perl program written to run once, then never again, is fine. A baby Perl program written and published on the Internet to live forever in search engine results and caches is fine -- if it's clearly labeled as baby Perl. That's one of my gripes in The Opposite of Modern.

Baby Perl that persists longer than five minutes, that appears as an example of how to use Perl to solve a problem, or that grows into vital code more than a hundred lines long will quickly become bad Perl.

Baby Perl doesn't scale with programmer effort. Perl novices have enough trouble understanding context and how hashes work without worrying about the Law of Demeter, the Single Responsibility Principle, pass-by-value versus pass-by-reference, the Liskov Substitution Principle, and time/space tradeoffs. I had to learn whether elsif contains the second e or is two keywords before worrying about (manual) tail call optimization in Perl.

However.

The Perl compiler can provide tremendous amounts of debugging information, if you know the magic incantations to enable it (see Toward a Modern::Perl). Perl::Critic can analyze a Perl program and report unidiomatic, dangerous, or sloppy uses. (There should be a Perl::Critic for baby Perl. "These variable names have trailing numbers. Have you considered using an array? See perldoc foo for advice.") The built-in perldiag documentation explains what went wrong and how to fix it -- if you've enabled warnings.
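The incantations themselves are short; the trick is knowing to type them. A sketch of what `strict` buys you, using a string `eval` to capture the compile-time error instead of dying:

```perl
use strict;
use warnings;

# Without strict, the typo $totle would silently spring into
# existence as a new global. Under strict, compiling that code
# fails; a string eval lets us capture the error and continue.
my $code = q{
    my $total = 10;
    print $totle + 1;
};
eval $code;
my $error = $@;
# $error contains: Global symbol "$totle" requires explicit package name ...
```

A novice who never enabled `strict` gets no such message; the typo just produces the wrong answer.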

In other words, Perl is willing to help novices writing baby Perl only after they've learned how to ask for help.

Fixing that -- making Perl 5.12 modern by default, with backwards compatibility available through use of a pragma -- will help Perl help novices to write better code. That's step one.

As step two, this change immediately breaks all of the existing bad Perl code in the world that won't even compile under modern Perl. Good. There is a slight danger that everyone who's written these examples will update them with the magic incantation to disable all of this help, but at least to make that work, people will have to upgrade to a modern Perl release, which isn't so awful in itself.

Step three is to replace all of those bad tutorials and bad examples and bad idiom worms just waiting to propagate through copy and paste and blog posts with good code -- code that actually works under modern Perl.

I realize that certain people will object. "I've used Perl for fifteen years, and code I wrote back in 1994 still works unmodified on bleadperl today." That's great. Good for you, unless you're still writing baby Perl after fifteen years of experience. If you're not -- if you're the television equivalent of a stunt driver, someone who can drive a race car at 300 miles per hour, crash into a wall, catch on fire, and walk away uninjured thanks to a breakaway harness -- then you already know enough to keep your existing code running unmodified. (Hint: PERL5LIB.) Don't tell me that the sophomores at the high school down the street can get by without seatbelts, airbags, and passenger-side emergency brakes in the driver's training cars, however.


About this Archive

This page is an archive of entries from March 2009 listed from newest to oldest.

February 2009 is the previous archive.

April 2009 is the next archive.

Find recent content on the main index or look in the archives to find all content.

