July 2013 Archives

Miyagawa's Carton dependency tracking system for CPAN modules is about to reach its 1.0 release, so I've been exploring it for a client project.

The example project has five or six developers, all working remotely. We collaborate from a single Git repository. We have one testing server, one live server, and one mail server, though we're going to add at least one separate database server in the very near future. Our deployment strategy is minimal; we've automated part of it so that we can roll back commits that have problems, but we haven't automated database changes or system administration changes (adding an SSL certificate, changing the IP address of the database, installing a necessary Debian package, et cetera).

We do have a semi-structured list of deltas for CPAN module installations, but the process of keeping things up to date still requires manual intervention. Because tracking new dependencies and updated dependencies requires manual intervention (for example, a change to JSON rendering in Mojolicious 4 necessitated upgrading Mojolicious everywhere), Carton seemed like a good fit.

Carton does a couple of things. It keeps track of the dependencies of a project and their version numbers, given a list of dependencies, and lets you install those specific dependencies on any machine which has Carton.

In other words, if I somehow tell Carton that my project depends on Test::Most and Math::BaseConvert, Carton will write out a file it knows how to understand such that anyone who checks out the repository can use carton install to install the exact version of those modules I have installed.

(Carton can also bundle dependencies into a cache directory and install them from that cache, but that's not what we're doing.)

To start using Carton, install it. Then create a cpanfile with your dependencies:

requires 'Test::Most';
requires 'Math::BaseConvert';

With this file in place, run carton install. This will install those modules, if necessary, and write a file called cpanfile.snapshot with dependency information. (See Module::CPANfile for more information about cpanfile.)
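A cpanfile can also carry version constraints and group dependencies by phase. Here is a minimal sketch; the version numbers are made up for illustration and are not the ones from our project:

```perl
# cpanfile (hypothetical version constraints, for illustration only)
requires 'Test::Most';
requires 'Math::BaseConvert';
requires 'Mojolicious', '>= 4.0';

# Modules needed only when running the test suite
on 'test' => sub {
    requires 'Test::More', '>= 0.98';
};
```

Running carton install after a change like this should update cpanfile.snapshot to match.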

Because Carton intends to manage dependencies for an application, it wants to install its modules to local/lib/perl5/ beneath the current directory. This may not work with your application; you may prefer to set the PERL_CARTON_PATH environment variable to point elsewhere. Keep in mind that wherever you have Carton install these dependencies, your application needs to have that directory in its include path somehow. (See Carton::Environment for more information.)
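One way to get that directory into the include path is a use lib line at the top of each entry point. This sketch assumes a hypothetical layout where the script lives in a bin/ directory one level below the project root, next to Carton's default local/ directory:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use FindBin;

# Assumption: this file is bin/some_script.pl and Carton installed into
# local/ at the project root. Adjust the relative path for other layouts.
use lib "$FindBin::Bin/../local/lib/perl5";

# Show which Carton-managed paths ended up in @INC
print "$_\n" for grep { m{local/lib/perl5} } @INC;
```

If you would rather not touch the code, carton exec runs a command with the paths already set up.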

You can safely exclude that directory from your source control, but you should include both cpanfile and cpanfile.snapshot so that other people can work with Carton from your VCS checkout.

Adding a dependency is as easy as editing cpanfile and running carton install. Similarly, satisfying dependencies on a new checkout is as easy as running carton install.

For various reasons, our client project has a Makefile, so it's likely we'll add a simple target to add dependencies (and stage them for the next git commit) and to update dependencies. This, in fact, is one of the benefits of the design of Carton. Even though I only found an environment variable for customizing the library installation path, it's almost trivial to use the modules which make up Carton to build an application-specific installer—whereby we can keep our cpanfile and cpanfile.snapshot files in a directory of our choosing.
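A Makefile target like that could delegate to a small script. The following is only a sketch of the idea, not what we ended up with; the helper names add_requirement and run are hypothetical, and it assumes cpanfile lives in the current directory:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Append a requires line to a cpanfile unless the module is already listed.
# Returns 1 if the file changed, 0 otherwise.
sub add_requirement {
    my ($cpanfile, $module, $version) = @_;

    my $existing = '';
    if (open my $in, '<', $cpanfile) {
        local $/;
        $existing = <$in> // '';
        close $in;
    }
    return 0 if $existing =~ /^\s*requires\s+['"]\Q$module\E['"]/m;

    my $line = defined $version
        ? "requires '$module', '$version';\n"
        : "requires '$module';\n";

    # Keep the new entry on its own line if the file lacks a final newline
    $line = "\n$line" if length $existing && $existing !~ /\n\z/;

    open my $out, '>>', $cpanfile or die "Cannot append to $cpanfile: $!";
    print {$out} $line;
    close $out or die "Cannot close $cpanfile: $!";
    return 1;
}

# Run an external command and stop on failure.
sub run {
    my @command = @_;
    system(@command) == 0 or die "Command failed: @command\n";
}

# Do nothing unless a module name is given on the command line.
if (my ($module, $version) = @ARGV) {
    add_requirement('cpanfile', $module, $version);
    run(qw( carton install ));
    run(qw( git add cpanfile cpanfile.snapshot ));
}
```

Invoked as perl add_dep.pl Some::Module, it records the dependency, installs it, and stages both files for the next commit.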

There's not much else to Carton I need beyond what I've described here. It doesn't intend to produce a full CPAN repository as tools like DarkPAN or Pinto do, which is fine. All we need is a little more discipline on how we manage dependencies, and Carton gives us an effective way to do that.

Good Tests Hate Ambiguity

Perhaps the most useful distinction between programmers who end up as architecture astronauts and working programmers who continue to build useful software is a streak of pragmatism. Maintainability is a primary concern when you maintain software, for example, but it's not the only reason someone might invest in developing software.

This lesson is more subtle than it seems. You can see this in the decade-plus argument over the extreme programming and agile software development ideas, where an idealist may say "Why would we do something like fill in the blank?" and a pragmatist might respond "You don't have to do it, but we found that it worked pretty well for us," and the idealist hears that there's no grand unifying structure by which you can follow a checklist and get great software.

I like that paradox. I like contradictions, at least until I have to explain to a client that, while adding a feature is important, adding it correctly enough is also important. I've been burned a few times on trying to beat deadlines by skimping on quality of one form or another, and I've been burned a few times by spending too much time on things that just don't really matter for the sake of some arbitrary ideal.

I try to analyze technical decisions in terms of desired value compared to perceived cost. For example, it would be technically correct and useful to have a program that's mathematically provable to terminate and to give the correct answer for every class of input, but it's practically infeasible to do so for all but the most trivial programs (which provide little practical value to prove).

Similarly, I've removed entire test cases from test suites which verified trivial and uninteresting and useless information about the test files (usually about metadata) because they were expensive to run and to maintain and actually hindered us from fixing the real problem in a better place.

Tests can be expensive to write and expensive to maintain. Poorly written tests can be fragile and misleading. Tests with hidden dependencies or assumptions or intermittent failures can cost you a lot of debugging time and anguish.

That's the potential risk. The potential reward is that you get more confidence that your software behaves as you intend and that it will continue to do so, as long as you pay the testing tax.

Good tests hate ambiguity. (Good code hates ambiguity.)

For example, a bad test which exercises passing invalid data to a web action might use a regex to parse HTML for the absence of a success message. A better test might use CSS or DOM selectors to verify the presence or absence of a single indicator that the request succeeded or failed.
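To make the difference concrete, here is a rough sketch using Mojo::DOM (part of Mojolicious) and Test::More. The HTML, the #flash id, and the error and success class names are all invented for the example:

```perl
use strict;
use warnings;
use Mojo::DOM;
use Test::More;

# Hypothetical response body for a request with invalid data
my $html = <<'HTML';
<html><body>
  <div id="flash" class="message visible error">Flavor name is required</div>
  <p>Tip: a successful import shows "Import succeeded" here.</p>
</body></html>
HTML

# Ambiguous: the phrase also appears in unrelated help text, so a regex
# over the whole page reports a success message that isn't really there.
my $regex_says_success = $html =~ /Import succeeded/ ? 1 : 0;

# Specific: look for a single indicator element. The selector matches on
# id and class, so the order of the classes in the attribute is irrelevant.
my $dom = Mojo::DOM->new($html);

is $dom->find('#flash.error')->size,   1, 'exactly one error indicator';
is $dom->find('#flash.success')->size, 0, 'no success indicator';
is $dom->at('#flash.error')->text, 'Flavor name is required',
    'error message names the missing field';

done_testing;
```

The selector-based assertions fail only when the indicator itself changes, which is the failure I care about.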

To me, that specificity is the most important thing. It's not "How few tests can I write to get 100% coverage?" because writing as few tests as possible isn't my goal. Neither is my goal "all tests should look the same" nor "how quickly can I get through these tests". My goal is to write the minimum test code necessary both to prove that the feature I'm testing works the way I intend and to allow me to debug any failures now or in the future.

There's that mix of pragmatism and perfection again. I want to avoid false positives, so I'm confident that the test tests the specific behavior I want it to test, but I also want to avoid false negatives, so that meaningless changes (the order of CSS classes applied to a <div> element in HTML changed) don't cause test failures that I don't care about.

Good tests avoid ambiguity as far as possible and embrace specificity where sustainable. It's a design principle I try to keep in mind wherever I write tests. (I find that TDD helps, because it encourages that kind of testability, but I've also found that every month and year of experience I get helps even more.)

Suppose you need to perform a task. That task may take a lot of memory or a lot of time. For whatever reason it's too expensive right now. You need to do it later, in another process, when and where you have resources available.

(In context, today this was "extract data from a spreadsheet uploaded to a web application and create models, then save them to the database".)

You've probably done this before. Many of you have probably come up with a solution to this before. That's fine; I'm writing here to people new to Perl or other dynamic languages who haven't yet seen the joys of dynamic dispatch.

Suppose you have a generic Task class which somehow knows how to serialize its arguments to a persistent data storage mechanism, such as a database or a queue of some kind. If all of the asynchronous tasks you want to run conform to a simple interface, you can model any task you want to run with four attributes:

package Task {
    use Moose;
    use Class::Load 'load_class';

    has [qw( class_name method_name )],       is => 'ro', required => 1;
    has [qw( constructor_args method_args )], is => 'ro', default => sub { {} };

    sub execute_task {
        my $self  = shift;
        my $class = $self->class_name;

        # Load the target class at runtime, then build an instance of it
        load_class( $class );
        my $entity = $class->new( $self->constructor_args );

        # Dispatch dynamically to the method named in the task
        my $method = $self->method_name;
        $entity->$method( $self->method_args );
    }
}

Thus, to create a task which knows how to import ice cream flavors from a spreadsheet, you might write:

my $task = Task->new(
    class_name  => 'IceCream::Flavor',
    method_name => 'import_from_spreadsheet',
    method_args => { filename => $some_filename },
);

Then save or queue the task as necessary. When you want to run the task, dequeue or instantiate it from storage, then run its execute_task() method.
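The save-and-restore step could look something like the sketch below. It uses only core Perl (JSON::PP) rather than Moose so that it runs on its own; the freeze_task and thaw_and_execute helpers, the stub IceCream::Flavor class, and the file name are all hypothetical stand-ins:

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical stand-in for the real model class, defined inline so the
# sketch is self-contained.
package IceCream::Flavor {
    sub new {
        my ($class, $args) = @_;
        return bless { %{ $args || {} } }, $class;
    }

    sub import_from_spreadsheet {
        my ($self, $args) = @_;
        return "importing flavors from $args->{filename}";
    }
}

package main;

# Serialize the task attributes to a string suitable for a queue or a
# database column.
sub freeze_task {
    my ($task) = @_;
    return JSON::PP->new->canonical->encode($task);
}

# Rebuild the attributes from storage, then dispatch dynamically: both the
# class and the method are chosen by name at runtime.
sub thaw_and_execute {
    my ($frozen) = @_;
    my $task   = JSON::PP->new->decode($frozen);
    my $class  = $task->{class_name};
    my $method = $task->{method_name};

    my $entity = $class->new( $task->{constructor_args} || {} );
    return $entity->$method( $task->{method_args} || {} );
}

my $frozen = freeze_task({
    class_name  => 'IceCream::Flavor',
    method_name => 'import_from_spreadsheet',
    method_args => { filename => 'flavors.xlsx' },
});

print thaw_and_execute($frozen), "\n";
```

Because the stored task is only data, the worker that runs it needs no knowledge of ice cream flavors ahead of time. A real version would also want to restrict which class and method names it accepts from storage.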

It's not much code. You could have invented it. (Most of the programmers I know have already invented something like this.) Yet for people new to dynamic languages and dynamic dispatch, this is one of the cases where the performance penalty for dynamic dispatch is worth it, simply because of all of the code I didn't have to write.

A couple of weeks ago I had to revise the data model in a client application. (Rant: when did the word "refactoring" get watered down into "changing code"? Refactoring used to mean changing the design of code without changing its behavior or external interface. If tests break, you did something wrong.) Unfortunately, this data model was at the core of one important piece of the system, and a lot of tests broke.

This was good, in the sense that we had tests for the entire feature and that they gave us a lot of confidence that we had it right.

This was bad, in that digging through a lot of failing test output is unpleasant work where you fix one thing and hope it fixes a lot of other things.

I've been working with Ovid again on this project, and one of the first things he brought up as a technical matter is the use of Test::Most over plain Test::More and the grab bag of other test modules that eventually creep in. Sure, he wrote Test::Most and has a lot of Vim macros and development habits to make his work easier, so it's not a burden to switch the code over gradually.

I'd never taken to Test::Most, mostly because I never took the time to use it seriously.

The other day, when facing a huge pile of failing tests I needed to make pass again, I decided to try it for one reason: BAIL_ON_FAIL. When you enable that feature of Test::Most, the first failing test will abort the test run. The only failure output you get is that first failing test.

I normally don't want this, because I want the whole gory details of my failure if I break something, but with the stepwise refinement necessary to finish my data model changes, it was just what I needed.

While I was editing the test files, I changed use Test::More; to use Test::Most;. That's easy enough. I didn't want to enable BAIL_ON_FAIL unilaterally because I'm not sure I want its behavior all the time. That's fine; you can run your tests with the environment variable set:

$ BAIL_ON_FAIL=1 prove -l t/web/ui_tests/some_feature.t

The responsibility for enabling this behavior is yours; it's not internal to the test file.

What if you take that one step further? What if you don't necessarily want to use Test::Most inside your test file yet? You can still get the BAIL_ON_FAIL behavior if you load Test::Most through prove:

$ BAIL_ON_FAIL=1 prove -MTest::Most t/web/ui_tests/some_feature.t

Even though the test file itself has no knowledge of Test::Most, you still get its ability to bail out of the test file at the first failure. (You do currently get a warning about a plan mismatch, but for me that's easy enough to overlook, as the rest of the output is much easier to use to diagnose failures.)
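If you do eventually want the behavior inside a test file, but only for part of it, Test::Most also documents bail_on_fail and restore_fail functions. This is a sketch based on my reading of its documentation, with an invented fixture:

```perl
use strict;
use warnings;
use Test::Most;

# Hypothetical fixture data for the example
my %fixture = ( flavor => 'vanilla', scoops => 2 );

# If these setup checks fail, nothing later is worth running, so bail out.
bail_on_fail;
ok( exists $fixture{flavor}, 'fixture has a flavor' );
is( $fixture{scoops}, 2, 'fixture has two scoops' );
restore_fail;

# From here on, failures are reported normally and the run continues.
is( uc( $fixture{flavor} ), 'VANILLA', 'flavor upper-cases cleanly' );

done_testing;
```

This trades away some of the flexibility of the environment variable, because the decision now lives in the file.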

About this Archive

This page is an archive of entries from July 2013 listed from newest to oldest.