April 2012 Archives

Picking Functional Programming's Pockets

By chromatic on April 30, 2012 12:49 PM | 2 Comments

In all of the debates over whether pair programming is exclusively 100% good or exclusively 100% evil or whether test-driven design is exclusively 100% beneficial or exclusively 100% silly, people sometimes miss the nuances of the polemic "if it's hard to test, it's hard to use".

In practice, that means that good programmers with good taste built from painful experiences have the ability to write better code if they exercise good taste when building tests.

(I know, this is the Internet of 21st century culture; the law of the excluded middle suggests that nuance, like irony, is deader than 19th century utopian cults. Doesn't mean that 10,000 volts of CPR are always wasted.)

That's what I had in mind when I wrote Mock Objects Despoil Your Tests. (See also Martin Fowler's Mocks Aren't Stubs.)

The more gyrations your code has to undergo before you're confident that it does what you intend it to do, no more and no less, the less confidence you have overall. In highfalutin' architecture astronaut terms, the more tightly coupled your tests are to the internals of your code, the worse your tests are. They could be fragile. They could make too many assumptions. They could be exercising things that no real code would exercise. They could be hard to write and overspecific.

In short, the likelihood that you've built yourself a maintenance burden is higher when you know far too much about the internals of a thing outside of that thing, even if the thing on the outside is a test intended to give you confidence.

(That's why I distrust putting code and tests in the same file, thank you very much Java. It's too tempting to cheat when the clear lines of demarcation aren't there.)

I only realized what I've been doing lately when I read Buddy Burden's Lazy == Cache?. He describes Moose lazy attributes the way I see them: as a promise to provide information when you need it. That laziness is a hallmark of Haskell. If you take laziness as far as Haskell does, you can build amazing things where things just happen when you need them.

Haskell, of course, goes a long way to encourage you to write programs in a pure style, where functions don't have side effects. Data comes into a function and data goes out, and the state of the world stays unchanged. Sure, you can't write any interesting program without at least performing IO, but Haskell encourages you to embrace purity as much as possible such that you minimize the places you update global state.

In my recent code, this has also just sort of happened, even in that code which isn't Haskell.

Consider an application which tracks daily stock market information, such as price and market capitalization. Each stock is a row in a table modeled by DBIx::Class. Each stock has an associated state, like "fetch daily price" or "write yearly free cash graph" or "invalid name; review".

No one would fault you for updating the stock price, market cap, and state on a successful fetch from the web service which provides this information. That's exactly what I used to do.

Now I don't.

I've separated the fetching of data from the parsing of data from the updating of data. Fetching and updating are solved problems; they happen at the boundaries of my code and I can only control so much about them. Either the database works or it doesn't. Either the remote web service is up or it isn't. (I still test them, but I've isolated them as much as possible.)

The interesting thing is always in the parsing and analysis. This is where all of the assumptions appear. (Is Berkshire Hathaway's A class BRK.a or BRK-A or something else? Are abbreviations acceptable in sector and industry classifications?) This is where I want to focus my testing—even my ad hoc testing, when I've found an assumption but need to research what's gone wrong and why before I can formalize my solution in test cases and code.

This means the daily analysis method looks something like:

sub analyze_daily
{
    my ($self, $stock, $updates) = @_;
    my $stats                    = $self->get_daily_stats_for( $stock->symbol );

    return unless $stats->{current_price};
    $updates->{current_price} = $stats->{current_price};

    return unless $stats->{market_capitalization};
    $updates->{market_capitalization} = $stats->{market_capitalization};

    $updates->{PK} = $stock->symbol;
    return 1;
}

Any code that wants to test this can pass in a hash reference for $updates and a stock object (or equivalent) in $stock and test that the results are sane by exploring the hash reference directly, rather than poking around in $stock.

(The data fetcher itself uses dependency injection and fixture data so that all expected values are known values and so that network errors or transient failures don't affect this test; obviously other tests must verify that the remote API behaves as expected. While I could make $stats a parameter here, I haven't had the need to go that far yet. There's a point beyond which removing dependencies from inside a discrete unit of code makes little sense.)
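A test along these lines might look like the following sketch. The My::Analyzer and My::FakeStock packages and the fixture numbers are invented for illustration; only analyze_daily() mirrors the method above, and get_daily_stats_for() here merely hands back injected fixture data instead of talking to a web service.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical analyzer class; only analyze_daily() comes from the article.
package My::Analyzer
{
    sub new { my ($class, %args) = @_; return bless {%args}, $class }

    # Stand-in for the real fetcher: returns injected fixture data.
    sub get_daily_stats_for
    {
        my ($self, $symbol) = @_;
        return $self->{fixtures}{$symbol} || {};
    }

    sub analyze_daily
    {
        my ($self, $stock, $updates) = @_;
        my $stats = $self->get_daily_stats_for( $stock->symbol );

        return unless $stats->{current_price};
        $updates->{current_price} = $stats->{current_price};

        return unless $stats->{market_capitalization};
        $updates->{market_capitalization} = $stats->{market_capitalization};

        $updates->{PK} = $stock->symbol;
        return 1;
    }
}

# Hypothetical "stock object (or equivalent)": anything with a symbol() method.
package My::FakeStock
{
    sub new    { my ($class, %args) = @_; return bless {%args}, $class }
    sub symbol { return $_[0]{symbol} }
}

my $analyzer = My::Analyzer->new(
    fixtures => {
        GOOD => { current_price => 10.5, market_capitalization => 2_000_000 },
        THIN => { current_price => 3.25 },
    },
);

# Inspect the hash reference directly; nothing pokes around in the stock.
my %good;
my $good_ok = $analyzer->analyze_daily( My::FakeStock->new( symbol => 'GOOD' ), \%good );
ok $good_ok, 'complete stats should succeed';
is_deeply \%good,
    { current_price => 10.5, market_capitalization => 2_000_000, PK => 'GOOD' },
    '... and fill in every update';

my %thin;
my $thin_ok = $analyzer->analyze_daily( My::FakeStock->new( symbol => 'THIN' ), \%thin );
ok !$thin_ok, 'missing market cap should fail';
is_deeply \%thin, { current_price => 3.25 }, '... leaving only the price recorded';

done_testing;
```

The fake stock needs nothing but a symbol() method, which is the point: the test never has to build or inspect a real DBIx::Class row.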

This code is also much more reusable; it's trivial to create a bin/ or script/ directory full of little utilities which use the same API as the tests and help me debug or clean up or inspect all of this wonderful data.

Better yet, I find myself needing fewer tests, because each unit under test does less and has fewer loops and conditionals and edge cases. The problem becomes "What's the right fixture data to exercise the interesting behavior of this code?" My tests care less about managing the state of the objects and entities under test than they do about the transformations of data.

Perhaps it's not so strange that that's exactly what my programs care about too.

Make a DBIC Schema from DDL

By chromatic on April 27, 2012 1:05 PM

For some reason, creating DBIx::Class schemas by hand has never made sense to me. I like to write my CREATE TABLE statements instead. DBIx::Class::Schema::Loader works really well for this.

I keep this schema DDL in version control. I also keep a SQLite database around with some test data (but the database isn't in version control).

I usually find myself writing a little shell script or other program to regenerate the DBIC schema from that test database. That usually requires me to make manual changes to the test database representing the changes I've just made to the DDL.

After doing this one too many times, I decided to combine DBIx::RunSQL with the schema loader. By creating a SQLite database from my DDL in memory, I can create a schema without me modifying any databases manually.

This was easier than I thought:

#!/usr/bin/env perl

use Modern::Perl;

use DBIx::RunSQL;
use DBIx::Class::Schema::Loader 'make_schema_at';

my $test_dbh = DBIx::RunSQL->create(
    dsn     => 'dbi:SQLite:dbname=:memory:',
    sql     => 'db/schema.sql',
    force   => 1,
    verbose => 1,
);

make_schema_at( 'MyApp::Schema',
    {
        components => [ 'InflateColumn::DateTime', 'TimeStamp' ],
        debug => 1,
        dump_directory => './lib' ,
    },
    [ sub { $test_dbh }, {} ]
);

The next step is to connect everything to DBIx::Class::Migration—but first things first.

Embrace the Little Conveniences

By chromatic on April 25, 2012 11:55 AM | 5 Comments

When Perl 6 introduced say (like print, but appends a newline) I had some skepticism.

Yes, the Modern::Perl module was as much a polemic as it was a convenience.

I know File::Slurp exists, but my fingers by now know how to read from a file in a single line of (impenetrable to the uninitiated) code:

my $text = do { local (@ARGV, $/) = $file; <> };

... and in each case, my initial feelings of "Why bother? What does that offer? How silly!" were wrong. In every one of these cases, the ability to write (and the requirement to read) less code has made my code better.

With say I don't have to worry about single- versus double-quotes, or even quoting at all sometimes. With use Modern::Perl;, I don't have to worry about enabling various features and pragmas. With File::Slurp, all I have to care about when reading from a file is typing read_file( $path ).
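For the uninitiated, here is a sketch of what that one-line slurp idiom does, next to the same read spelled out longhand. The temporary file and its contents are made up so that the example stands alone; everything used here ships with Perl.

```perl
use strict;
use warnings;
use feature 'say';
use File::Temp 'tempfile';

# Create a small throwaway file so the example is self-contained.
my ($fh, $file) = tempfile( UNLINK => 1 );
print {$fh} "first line\nsecond line\n";
close $fh or die "Cannot close '$file': $!";

# The idiom: the list assignment puts $file into the localized @ARGV and
# leaves the localized $/ undefined, so <> reads all of $file in one gulp.
my $idiom = do { local (@ARGV, $/) = $file; <> };

# The same read, spelled out the long way.
my $longhand = do {
    open my $in, '<', $file or die "Cannot read '$file': $!";
    local $/;    # undefined record separator means slurp mode
    <$in>;
};

say 'same contents' if $idiom eq $longhand;
```

Both versions leave @ARGV and $/ as they found them once the do block ends, which is the detail a hand-typed version tends to get wrong.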

None of these are big deals on their own, but they're little details I don't have to worry about anymore. The same principle which says that Proc::Fork is easier to manage than writing your own forking code (I've written far too much of my own forking code) applies.

Sometimes getting the little nuisances out of the way makes me more productive and ready to tackle the big nuisances. Maybe saving my brainpower for complicated problems (what's the standard deviation from a least squares fit?) is a better approach than typing my own read_file() function on every project.

As silly as it once seemed to use a CPAN module for a one liner, I've realized that not reusing good code is even sillier.

Fund Elbow Grease, not Birthday Cake

By chromatic on April 23, 2012 6:00 AM

My first rule of community-driven software development: volunteers will work on what volunteers want to work on.

My second rule of community-driven software development: volunteers are not fungible (see rule #1).

My third rule of community-driven software development: things that aren't fun tend not to get done.

My first rule of birthdays: cake is fun.

I've been reading an ongoing thread on a mailing list about funding the development and infrastructure of free and open source software projects. If you've read more than one thread like this, you know the discussion already. While it's possible to get a job doing what you love, most of us don't get paid to write software. We get paid to solve problems. Many of us are fortunate enough to be able to use and contribute back to free software, but few of us solely write free software.

That's probably okay. Most of the software I write for my businesses isn't that interesting to anyone else anyhow.

Then you get to the idea that some pieces of software that serve as community underpinnings—the infrastructure plumbing that keeps the world humming—are so important that they deserve funded developers to ensure that things just work. From there you set up foundations and boards and run pledge drives and give out grants and, if you're lucky, sponsor a couple of developers to work on the software all the time.

The Apache Software Foundation has done this. So has the Linux Foundation.

(Even though many of the rest of us work on projects no less essential to the global software ecosystem, we're not that fortunate.)

I've been retraining myself to think like a businessman at least half of the time. Business, done well, addresses the problems of managing limited resources for the purpose of producing revenue. In programmer speak, I try to solve the most pressing problem in the most effective way to deliver working software as soon as feasible.

One of the hardest parts of running a small business is knowing when to pay someone else to do something you could do yourself. On the publishing side, I'm glad we did; we've paid people to do editing and design covers and validate electronic format conversions. I could do all of that, but that's a terrible waste of my time.

It's also not fun, and I'd keep putting it off—and that there is the hook.

Consider TPF's grants. The successful grants, the ones with real deliverables and real benefit, are those which wouldn't get done without the lubrication of money. While people like Nick Clark and Dave Mitchell (to name two names but not to diminish the hard work of many other people) have the expertise and desire to fix hard bugs in Perl 5, only the generous grants of tens of thousands of dollars free them to spend the time they need to look into these bugs and fix them.

After all, if it takes 40 hours to fix a bug in the regular expression engine or the interaction between string eval and closures, how much time can you realistically expect Dave or Nick to spend between working a day job and having some semblance of a social life apart from a computer?

If this were sufficiently fun (for whatever definition matters most) or easy (even if only in the sense that "I can debug this in five minutes and spend the next 55 polishing the solution for immediate integration!" is easier than "After 20 hours of diagnosis, I'm starting to get a handle on how things work. Now comes the hard part!"), it would have already happened.

Volunteers tend to do the fun things. That's adding features. That's reindenting code. Sometimes that's fixing easy bugs. That's rarely updating a web page or writing copious documentation or performing system administration or bisecting errors or setting up a huge test cluster. (All of those unfun things happen, occasionally. That "occasionally" proves that they're not fun. If they were fun, they'd happen more often.)

Volunteers tend to do the fun things. That's rarely maintaining code over a long period of time. (If you've solved your problem and moved on, what's your impetus to solve the problems of other people? Noble obligation? A sense of pride? Shame? Boredom?)

(This suggests that the way the Perl community manages Google Summer of Code projects is risky, at least if the goal is shipping working software that will survive even only until next summer.)

This all suggests to me that the best way to think of limited funding for community-driven software is leverage. It's elbow grease. It's hiring mechanics in dirty overalls to work hard and take things apart and put them back together and to get a thousand little details right. It has to be a little unglamorous and it has to be very, very focused on shipping real software and keeping it working in the hands of real users.

Sometimes, yes, funding is the best way to get something done sooner than it would be without funding. Money buys attention and time, of course. Yet if we apply money to get the fun things done—to buy birthday cake instead of elbow grease—we're only hurting ourselves.

Dependencies, Minimizers, and Regressing to JavaScript

By chromatic on April 20, 2012 11:29 AM | 5 Comments

JavaScript is Perl 4 with first class functions, slightly better lexicals, better implementations, and more users.

(If that hasn't offended you yet, note that that sentence doesn't include "a better type system", on purpose.)

While you can do some amazing things with modern JavaScript (see also ClubCompy, a retro-style programming environment designed for kids of all ages, for which we have a compiler and interpreter written in JavaScript), its flaws of language and ecosystem are obvious. The latter are obviously products of its environment.

Consider: you don't have anything like the CPAN for JavaScript in the browser. (Yes, I'm aware of NPM. No, it doesn't count. The point of client-side JavaScript delivered from a web page is that you don't have to have anything other than a web browser installed.)

Consider: this means you have two choices. Either you make lots of requests for your dependent libraries (jQuery, any plugins you use, the JavaScript you've written), or you glue them all together on the server side somehow and send the client only one thing. The first is good in that if you use these libraries unmodified and load them from a public CDN, there's a chance some cache in the middle will already have them, but you still pay the network penalty for loading all of those libraries n at a time. The second saves those requests, except that the result is only cached for your site.

Also, if you find a bug in a dependency, you get to regenerate that big blob of code. (If you don't find that bug, you get to live with it.)

I think about these things when I see a big lump of code stuffed into YAML.pm. Because we've left 1994 behind in the Perl world, we're able to take advantage of an amazing library distribution and dependency management system in the CPAN, where installing dependencies (and knowing they pass their tests) is so well understood that it's an exceptional condition when it doesn't work. In the past couple of years, installations have become so easy thanks to newer tools like perlbrew and cpanm that (if you're in the know) it's easier to manage code this way than to consider not.

... except for when you stuff generated code in your repository instead of declaring it as a dependency. (Test::Builder is a strange case. You want your underlying test library to be as stupidly simple as possible and not to rely on anything else, so that it's as unlikely as possible to fail and as unlikely as possible to interfere with what you're testing.)

Now when there's a bug in the dependency in the generated code, everything which uses the dependency has to be updated too. (Read carefully. You can't merely update the dependency. You have to know everything which depends on it and wait for the authors to get around to updating their generated code.)

I admit, I'll probably never understand the mindset which says "I'm distributing software for end-users to install in the worst possible way so that they won't have to install software." I understand the use of things like App::FatPacker to make one-file installations possible, but actively distributing generated code in CPAN distributions? Where CPAN has a working dependency resolution model already in place? Where your distribution is already an upstream dependency of thousands of other distributions?

I just don't understand it. I understand that the business of shipping software is the art of managing competing needs, but I can't see how optimizing for fragility helps anyone.

Method-Function Equivalence Strikes Again!

By chromatic on April 18, 2012 9:51 AM | 1 Comment

One of the satisfying aspects of writing an opinionated book like Modern Perl is writing a section like Avoid Method-Function Equivalence. Explaining to a novice programmer a potential pitfall and how to avoid it always seems to me like reducing the amount of potential misery in the world.

That's satisfying.

I've been revising a proof of concept document categorization system into shape for the past year, by adding tests and refactoring and cleaning things up and even adding features. Every week it gets a little bit better, and it's fascinating to discover the patterns of this style of programming. (It's related to debuggability-driven design.) I've enjoyed the experience of watching code get more general and useful and powerful even as that's meant shuffling around code and concepts far beyond the initial design. While there are still messes (what working code doesn't have a mess somewhere?), the code has a goodness to it.

Just when you get a big head, the universe punishes you for your unwarranted hubris. (Annie Dillard once wrote "I no longer believe in divine playfulness." Sometimes "divine antiauthoritarianism" is more like it.)

Monday night, my business partner found a bug. We have a categorization system and several topics into which these documents could find themselves. We added several new categories last month, and I had to revise the sharing system such that documents in one cluster of categories never appeared in other clusters. (Think of it this way: you have a newspaper and want to group articles about food, television, movies, and books in a Life and Culture section and articles about basketball, lacrosse, and hockey into a Sports section, but you never accidentally want an article about food to show up in the Sports section or an article about the felonious tax evasion of Kenny Mauer to show up in the Life and Culture section.)

One line of filtering that's easy to explain to users is keyword filtering. Any article in this topic (food, television, books) must contain one of these keywords: food, television, cuisine, literature, novel, bestseller, author. You get the picture.

Monday's bug was that documents in a single cluster which obviously belonged to a single topic ("Which Television Shows Won't Be Back Next Season", for a fake example) within a cluster showed up as belonging to the cluster as a whole ("Life and Culture") and not the topic within the cluster ("Television").

Fortunately I had most of the necessary scaffolding to build in debugging support. I expected that the keyword filtering was to blame, whether missing the appropriate keywords or not applying them appropriately. (I wondered if the system used a case-sensitive regular expression match or didn't stem noun phrases for comparison appropriately.)

Turns out it was my silly mistake.

All of this filtering for validity and cross-topic intra-cluster association is in the single module MyApp::Filter. This started life as a couple of functions that didn't belong elsewhere. As I moved more and more code around and defined the filtering behavior more concretely, it grew until it made more sense to treat these functions as methods. It's not an object yet. It may never become an object; it manages no state. Yet I changed its invocation mechanism from:

=head2 make_bounded_regex

Turns a list of arguments into an optimized, case-insensitive regex which
matches any of them and requires boundaries at their ends.

=cut

sub make_bounded_regex
{
    return unless @_;

    # Copy each keyword (s///r) so that the caller's arguments stay untouched.
    my @keywords = map { s/\s/./r } @_;
    my $ra       = Regexp::Assemble->new( flags => 'i' );
    my $re       = $ra->add( map { '\b' . $_ . '\b' } @keywords )->re;

    return qr/$re/;
}

... to:

sub make_bounded_regex
{
    my $class = shift;
    return unless @_;
    ...
}

I made all of these functions into methods in one fell refactoring swoop. (Why not? Be consistent! Do more than the bare minimum! Eat your vegetables!) I missed one place which called make_bounded_regex():

sub _build_keyword_filter
{
    my $self     = shift;
    my $keywords = $self->keywords;
    return unless @$keywords;
    return MyApp::Filter::make_bounded_regex( @$keywords );
}

... such that the first keyword (and usually the most important, because that's what users put in first) becomes the $class parameter to the method. Because it's a class method, nothing ever uses $class, so there's no error message about wrong package names.
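A stripped-down reproduction makes the failure easy to see. This is a sketch: My::Filter is a stand-in package, its make_bounded_regex() builds a plain alternation instead of using Regexp::Assemble, and the keywords and title are invented.

```perl
use strict;
use warnings;
use feature 'say';

# Simplified stand-in for the real filter module.
package My::Filter
{
    sub make_bounded_regex
    {
        my $class = shift;
        return unless @_;

        my $alternation = join '|', map { quotemeta($_) } @_;
        return qr/\b(?:$alternation)\b/i;
    }
}

my @keywords = qw( television sitcom );

# Called as a method, $class is 'My::Filter' and both keywords survive.
my $as_method = My::Filter->make_bounded_regex( @keywords );

# Called as a function, 'television' lands in $class and silently vanishes.
my $as_function = My::Filter::make_bounded_regex( @keywords );

my $title = 'Which Television Shows Will Not Be Back Next Season';
say 'method call matches:   ', ( $title =~ $as_method   ? 'yes' : 'no' );
say 'function call matches: ', ( $title =~ $as_function ? 'yes' : 'no' );
```

Neither call produces a warning or an error; the only symptom is a regex that quietly ignores the first keyword.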

The tests don't catch this either because of the distribution of test data. (Obviously a mistake to rectify.)

Sure, a language with integrated refactoring support (you don't even need an early binding language with a static type system to get this) could have shown me the error right away. That's one thing I do like about Java. (Sure, you need that scaffolding to get anything done, but it does occasionally help you not write bugs.)

What bothers me most of all is that Perl itself has no means by which it could even give an optional warning when you treat a method as a function or vice versa. You don't have even a runtime safety net here.
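Nothing stops you from writing that safety net by hand, one method at a time. The following is only a sketch of one possible guard, not something Perl provides: My::Filter and the error message are invented, the regex construction is simplified, and a keyword spelled exactly like a class name would still slip past the check.

```perl
use strict;
use warnings;
use Carp ();

# Simplified stand-in for the real filter module, with a hand-rolled guard.
package My::Filter
{
    sub make_bounded_regex
    {
        my $class = shift;

        # Complain unless the invocant is this class or a subclass of it.
        Carp::croak( 'make_bounded_regex() must be called as a class method' )
            unless UNIVERSAL::isa( $class, __PACKAGE__ );

        return unless @_;

        my $alternation = join '|', map { quotemeta($_) } @_;
        return qr/\b(?:$alternation)\b/i;
    }
}

# The method call passes the guard.
my $regex = My::Filter->make_bounded_regex( 'television', 'sitcom' );

# The function call now dies loudly instead of dropping a keyword.
my $error = '';
eval { My::Filter::make_bounded_regex( 'television', 'sitcom' ); 1 }
    or $error = $@;

print "caught: $error";
```

It costs a line of boilerplate in every method, which is exactly the kind of thing a language-level warning would make unnecessary.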

Warnings will never replace the need for programmer caution, but bugs happen. Bugs always happen. I keep the error log as squeaky clean as possible, and warnings have caught a lot of bugs and potential bugs even during testing, sometimes in our deployed software.

In lieu of warnings though, the best I can do is document my mistakes and explain why I make them in the hope that I won't make them again and you'll be more cautious than I was. (At least this one was easy to fix.)

Debuggability-Driven Design

By chromatic on April 16, 2012 12:04 PM | 2 Comments

One of the pleasures of white-collar work is that you can earn money even while you're not on the job. Unlike a factory worker paid for each widget she assembles, a programmer or publisher or writer or business owner can sell widgets even when sleeping, or eating, or on vacation, or walking the dog.

Thank goodness for the automation of things like the cotton gin, the industrial revolution, the semiconductor, and the information economy.

The downside, of course, is that you have to trust these automations—and they're built by humans and programmed by programmers. (Worse yet, sometimes we are those programmers.) Things go wrong.

Our job is then to find and fix these bugs before the expense of lost potential automation overcomes the value of the automation.

Too bad we're lazy.

I built my own stock screener. It uses public APIs and public data to fetch financial data so that friends and family can make better investment decisions. I don't want them to have to fill out complex spreadsheets, and I don't want to type the right incantations to generate these spreadsheets. That means they get a little box on a web page and type in the symbol of a company and get an analysis later.

... unless they mistype the symbol name, or the company has been acquired, or it's moved between exchanges, or one of a dozen other things has gone wrong. For example, one API happily understands that Berkshire Hathaway has two classes of stock, trading under BRK.A and BRK.B, while another API goes into fits when it encounters a period in the name of a stock, and you have to call BRK.A "BRK-A" instead.

These are the cases where mock objects are no substitute for domain expertise.

The interesting cases are where you wake up on a Monday morning to find that something went wrong with several points of data over a long weekend and you're not sure what.

Sure, you could fire up your debugger, add in one of the offending symbols, and trace the process of API calls and analysis from start to finish, restarting the process as you try hypotheses until you reach an understanding, or at least have to get back to other work. I've done that. You can go a long way with that as your debugging technique.

Lately I begin to suspect that best practice patterns exist for batch processing projects like this. I've already realized that a multi-stage analysis pipeline is, effectively, a state machine. (First verify that the stock symbol exists; then get basic information like exchange, name, outstanding share count, sector, and industry; then analyze financial information like debt ratios, free cash flow, cash yield, and return on invested capital; then compare current share price to projected and discounted share price.)

By tracking the state of each stock in that state machine, you get two things. One is idempotent behavior: you'll never double-process a state, but you can restart a stock at any stage of the process by changing its state. The other is the ability to identify errors and bail out at any stage of the process. If a stock symbol doesn't trade on an associated exchange, you're not going to get any good information out of it, so avoid the CPU-expensive forward free cash flow projections, because they're useless. If a company's free cash flow trends negative, don't do the financial projections, because you don't want to buy a company losing money and the numbers get asymptotically weird as you cross that zero line anyhow.
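As a rough sketch of that idea, each stock can carry its own state while a small driver walks it forward. The stage names, their order, and the stub handlers below are assumptions for illustration; a real pipeline would call the APIs and run the analysis where the stubs are.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical stage names and ordering, for illustration only.
my %next_state = (
    VERIFY_SYMBOL => 'UPDATE_BASIC',
    UPDATE_BASIC  => 'DAILY_RATIOS',
    DAILY_RATIOS  => 'ANALYZED',
);

# Stub stage handlers; each returns true when its stage succeeds.
my %handler_for = (
    VERIFY_SYMBOL => sub { $_[0]{symbol} =~ /\A[A-Z.]+\z/ },
    UPDATE_BASIC  => sub { defined $_[0]{exchange} },
    DAILY_RATIOS  => sub { 1 },
);

# Push a stock through every stage it has not finished yet. Finished and
# errored stocks have no handler, so processing them again does nothing.
sub advance
{
    my $stock = shift;

    while ( my $handler = $handler_for{ $stock->{state} } )
    {
        my $state = $stock->{state};

        unless ( $handler->($stock) )
        {
            $stock->{state} = "ERROR_$state";
            last;
        }

        $stock->{state} = $next_state{$state};
    }

    return $stock->{state} eq 'ANALYZED';
}

my $good = { symbol => 'BRK.A', exchange => 'NYSE', state => 'VERIFY_SYMBOL' };
my $bad  = { symbol => 'XYZ',                       state => 'VERIFY_SYMBOL' };

advance($_) for $good, $bad;
say "$_->{symbol}: $_->{state}" for $good, $bad;

# Fix the data, reset the state, and restart from the failed stage.
$bad->{exchange} = 'NASDAQ';
$bad->{state}    = 'UPDATE_BASIC';
advance($bad);
say "$bad->{symbol}: $bad->{state}";
```

Because an errored stock sits in a state with no handler, running the batch again leaves it alone until someone changes its state deliberately.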

The real insight is that you should log the relevant information when an item in the pipeline hits an error condition. Sure, you can sometimes run into transient problems like a backhoe cutting fiber between you and your API provider such that your cron job won't get relevant data for a few runs, but you also run into the problem that what you expected to happen just didn't happen. That is, while you've been certain that the fifth element of the JSON array you get back from the request always contains the piece of information you expected, it never contains the information you want for companies in the finance sector, so you can't perform that analysis with that data source.

The real questions of automation are "What do you do when things go wrong?" and "How much information do you need to recover and to prevent these errors from occurring again?"

Maybe that means dumping out the entire API response. Maybe it means emitting the raw SQL query that caused a transaction failure. In my case it certainly means separating the process of fetching data from the process of processing data, so that I can load example data into my system without having to go through the fetching stage over and over again. (This experience of decoupling makes me prefer fixtures to mock objects.)
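Dumping the response can be as small as the following sketch. The dump_failed_response() function, the file naming scheme, and the sample response are assumptions made for illustration, and a temporary directory stands in for a real error directory.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp 'tempdir';
use JSON::PP;

# Write everything known about a failure to its own file for later review.
sub dump_failed_response
{
    my ($dir, $symbol, $state, $response) = @_;

    my $path = File::Spec->catfile( $dir, "$symbol.json" );
    my $json = JSON::PP->new->utf8->pretty->canonical;

    open my $fh, '>', $path or die "Cannot write '$path': $!";
    print {$fh} $json->encode({
        symbol   => $symbol,
        state    => $state,
        response => $response,
    });
    close $fh or die "Cannot close '$path': $!";

    return $path;
}

# A temporary directory stands in for a real error directory here.
my $dir  = tempdir( CLEANUP => 1 );
my $path = dump_failed_response(
    $dir, 'BRK-A', 'ERROR_UPDATE_BASIC',
    { error => 'unknown symbol', raw => [ 'BRK.A', undef ] },
);

print "Dumped failing response to $path\n";
```

A dump like this doubles as fixture data: once the parsing code is fixed, the same file can feed a regression test without another trip through the fetching stage.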

In my experience, it means that I can run a simple query:

> select count(*), state from stock group by state;
2413|ANALYZED
10|ERROR_DAILY_RATIOS
100|ERROR_UPDATE_BASIC

... to see that I have a few things to debug today, with fifteen dumps of API data in my basic_update_errors/ directory to review when I have a chance to see where my expectations (or the API documentation) have gone wrong.

In a sense, I've automated away half of the debugging process already: I have example data and I know where things go wrong. I know exactly where to look for the errors of assumptions. I don't know what those assumptions are, nor why they're wrong, but a computer probably couldn't tell me that anyway. Even getting halfway through this process means I'm twice as productive—I can focus my energy and concentration on the hard parts, not the tedious ones.

Mock Objects Despoil Your Tests

By chromatic on April 13, 2012 10:35 AM | 6 Comments

I don't know where this idea started in the testing world (I suspect Java, which makes the easy things possible, and the hard things a mess of XML and bytecode generation to combine early static binding with the best type system 1969 and a PDP-7 had to offer), but if you find yourself wiring up a bunch of mock objects to test your system in isolation and pat yourself on the back for writing clever testing code, you'd better be lucky, because you're probably not testing your software well.

First, some philosophy.

Socrates: What's the purpose of testing your code?

Tester: To give us confidence that our software works as designed.

Socrates: How do you test?

Tester: Each piece in severe, bunny-suited, clean-room isolation.

Socrates: Why?

Tester: Because they must work in isolation.

Socrates: Does the system not work as a whole?

Tester: It does.

Socrates: How do you know?

Tester: Because we also test it as a whole.

Socrates: Does this give you no confidence in the correctness of the system?

Tester: No.

Socrates: Why not?

Tester: Because we only have a few tests of the system as a whole.

Socrates: Why? Surely the correctness of behavior and coherence of the system as a whole is important to the system as a project, else why would you be building it?

Tester: But unit tests are the most important tests. Someone somewhere once said so, and I have this really neat framework which generates mocks and stubs if I define the interface in my IDE and wire up the XML output.

Socrates: Does this give you confidence in your system as a whole?

Tester: Well, it's a real pain sometimes keeping the interfaces of the mock objects and their behaviors up to date as we change the code, but that's what a refactoring IDE is for, right? Sure, sometimes we have to add tests of the system as a whole because we find bugs, but you can't test everything to 100% anyhow, can you? Besides, we're using best practices.

Socrates: How do you test database interactions?

Tester: We use mock objects to simulate a database.

Socrates: Why don't you use a real database?

Tester: They're hard to set up and slow and we don't want to spend time debugging things like connection issues in our tests!

Socrates: How do you know your database code works?

Tester: Our mock objects work.

Socrates: When you go out to lunch as a team, do you go to a real restaurant and order food, or do you sit around in a circle pretending to eat sandwiches?

Tester: (pantomimes being trapped in a glass box)

(Pun mostly not intended.)

Yeah, I wrote a mocking library for Perl, many many years ago. Note well the short description:

Perl extension for emulating troublesome interfaces

I chose the word "troublesome" with care.

I almost never use this module, despite the fact that I wrote it. Sure, Perl and other late-bound languages with serendipitous polymorphism and allomorphic genericity make it easy to swap one thing in for the next if you can treat them as semantic equivalents. Yet in truth, mock objects are far, far overused.

In my experience, mock objects are most useful in very few circumstances:

When you want to test an exceptional condition it's difficult or expensive to produce (system error, external dependency failure, database connection disappearance, backhoe cuts your network cable).
When one tiny piece of an existing piece of code has a side effect you cannot easily control (the actual SMTP-over-a-socket sending of email, the actual purging of all of your backups, the actual adding of butterscotch chips to what would otherwise be a perfectly fine cookie recipe).
When you are utterly unable to control a source of information, such as data pulled from a remote web service (though you can design and test this with a layering strategy).

That's it.

I emphasize for clarity that that list does not contain "I am talking to a database", or "I am rendering an image", or "Here is where the user selects an item from a menu".

It's still important to be able to test your software in layers, such that you have a lot of data-driven tests for your data model and business logic without having to navigate through your UI in your tests, but the fact that you can automatically generate mock objects from your interfaces (or write them by hand, if you're using a language which doesn't require Eclipse to scale programmer effort beyond tic-tac-toe applets) doesn't mean that you should.

For example, one of my projects requires email verification of user registration. I have automated tests for this. One of them is:

    my $url = test_mailer_override {
        my $args = shift;

        $fields{USER_invitation_code} = '12345';
        $ua->gsubmit_form( fields => \%fields );
        $ua->gcontent_lacks( 'Security answers do not match' );
        $ua->gcontent_lacks( 'This username is already taken' );
        $ua->gcontent_contains( 'Verify Your Account',
            '... successful add should redirect to verification page' );

        my ($mailer, %args) = @$args;

        is $args{to}[0], 'x@example.com', '... emailing user';
        is $args{to}[1], 'xyzzy',           '... by name';
        my ($url) = $args{plaintext} =~ m!(http://\S+)!;
        like $url, qr!/users/verify\?!, '... with verification URL';
        $ua->gcontent_contains( 'User xyzzy created' );

        return $url;
    };

The function test_mailer_override() is very simple:

sub test_mailer_override(&)
{
    my $test = shift;

    my @mail_args;

    local *MyProj::Mailer::send;
    *MyProj::Mailer::send = sub { @mail_args = @_ };

    $test->( \@mail_args );
}

... where MyProj::Mailer is a subclass of Mail::Builder::Simple. This code temporarily monkeypatches my mailer class to override the send() method to record its arguments rather than performing an actual SMTP connection to my server. Not only does this run faster than it would if I had to wait for SMTP delivery before continuing the tests, but it avoids the need to set up a local mail server on every machine where I might run the tests or to hardcode mailer credentials in the test suite or even to need an active network connection to run the tests.

This makes the tests run quickly and gives me great confidence in the code. Furthermore, I know that on the real machines where I have this code deployed and running, the mail server and configuration works because I get real mail from it.

(I could as easily make my own subclass of my subclass which overrides send() this way and pass in an instance of that subclass through dependency injection. That would work well if I had to mock more than one method, but I haven't needed that yet and this localized monkeypatching was even easier to write and to maintain, so it serves me well enough for now.)
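For comparison, here is a minimal sketch of that subclass-plus-injection alternative. Everything in it is a stand-in for illustration: this MyProj::Mailer is a stub rather than my real Mail::Builder::Simple subclass, and MyProj::Mailer::Recording, MyProj::Registration, register(), and sent() are hypothetical names.

```perl
use strict;
use warnings;

# Stand-in for the production mailer (assumption: the real one sends over SMTP).
package MyProj::Mailer {
    sub new  { my ($class, %args) = @_; return bless {%args}, $class }
    sub send { die "Refusing to send real mail from a sketch\n" }
}

# Hypothetical test double: inherits everything and overrides only send().
package MyProj::Mailer::Recording {
    our @ISA = ('MyProj::Mailer');

    sub send {
        my ($self, %args) = @_;
        push @{ $self->{sent} }, \%args;    # record instead of delivering
        return 1;
    }

    sub sent { @{ $_[0]{sent} || [] } }
}

# Hypothetical code under test; it receives its mailer from the caller.
package MyProj::Registration {
    sub new {
        my ($class, %args) = @_;
        return bless { mailer => $args{mailer} }, $class;
    }

    sub register {
        my ($self, %user) = @_;
        return $self->{mailer}->send(
            to        => [ $user{email}, $user{name} ],
            plaintext => 'Verify: http://example.com/users/verify?code=12345',
        );
    }
}

package main;

my $mailer       = MyProj::Mailer::Recording->new;
my $registration = MyProj::Registration->new( mailer => $mailer );
$registration->register( email => 'x@example.com', name => 'xyzzy' );

# Inspect what would have been mailed.
my ($message) = $mailer->sent;
my ($url)     = $message->{plaintext} =~ m!(http://\S+)!;
print "Would have mailed $message->{to}[0] the link $url\n";
```

The cost is an extra class and a constructor argument. That only pays off once there is more than one method to replace.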

As for tests of database activity, I use DBICx::TestDatabase to create an in-memory test database for each test file in my suite. I have a batch of representative data carefully curated from real data to cover all of the important characteristics of my business model. I don't have to worry about any potential mismatch between data objects and mock objects and the real database because everything about my database is real except that it never actually exists on disk.

(If I had tests that care that it exists on disk—and I can only imagine a few reasons why I might—I would have stricter tests run on machines intended for production. If that's a concern, I'll do it. It's not a concern.)

I do understand the desire for mock objects and loose coupling and beautiful architectures of free floating components interacting briefly like snowflakes gently tumbling to a soft blanket on a Norman Rockwell Christmas landscape, but my code has to work correctly in the real world.

That means I have to have confidence that my systems hang together as working, coherent wholes.

That means that the details of what my tests do—their actions and their data—have a strong coupling to the behavior of my systems, because I have postulates and expectations to verify about those systems.

That means that I don't have time to duplicate behavior between real code and mock code, because I really only care if the real code works properly, and anything which distracts me from that, especially while debugging a failing test, is in the way.

That means that I will use mock objects sparingly, when I can't achieve my desired results in any cheaper, faster, or more effective way... but I won't mistake them for the real thing, because they're not.

use_ok() is Broken Because require() is Broken

By chromatic on April 11, 2012 10:30 AM | 5 Comments

Ovid's post on avoiding Test::More's use_ok() is good advice. There's almost no reason to use use_ok() from Test::More in existing code. It probably doesn't do what you think it does, and it doesn't really help against most of the failures you probably care about.

Worse, it can give you a false sense of security and mislead you into debugging the wrong thing.

Why? Because it turns what ought to be a fatal, program-killing exceptional condition ("Hey, I couldn't load this module! I'd better stop now. Things are certainly not going to work the way anyone expects!") into a simple failed test ("Oopsie! Better check your assumptions! I'll keep going though, because hopefully you just made a typo in your test assertion!").

The problem really isn't with use_ok() though. The problem's with Perl 5's require.

require does one thing. It searches the filesystem for the named file, compiles it, and caches the success or failure of compilation. (Make that three things.)

Here's the problem: what if compilation fails? What if compilation fails halfway through the file? What if compilation fails on the very last line of the file because the module doesn't return a true value? Try it yourself:

package FalseReturnValue;

sub demo { 'demo' }
sub demo2 { 2 }

0;

... and the test:

use Test::More;

use lib 'lib';
use_ok( 'FalseReturnValue' );

is FalseReturnValue::demo(), 'demo', 'Declared functions exist';
is FalseReturnValue::demo2(), 2,     '... all of them';

done_testing;

The problem is that failing to load a module should never leave your system in an inconsistent state.
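To see that inconsistency without Test::More in the way, here is a small self-contained sketch. It writes the same FalseReturnValue module to a scratch directory (the temporary directory and the variable names are my own scaffolding), then catches the require failure with eval and checks the symbol table afterward.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp 'tempdir';

# Write the FalseReturnValue module from above into a scratch directory.
my $dir  = tempdir( CLEANUP => 1 );
my $file = File::Spec->catfile( $dir, 'FalseReturnValue.pm' );

open my $out, '>', $file or die "Cannot write $file: $!";
print {$out} "package FalseReturnValue;\n\n",
             "sub demo { 'demo' }\n",
             "sub demo2 { 2 }\n\n",
             "0;\n";
close $out or die "Cannot close $file: $!";

unshift @INC, $dir;

# require throws an exception because the file ends with a false value...
my $loaded = eval { require FalseReturnValue; 1 };
my $error  = $@;

# ...but the subs it compiled along the way remain in the symbol table.
my $leaked = defined &FalseReturnValue::demo ? 1 : 0;

print 'require succeeded: ', ( $loaded ? 'yes' : 'no' ), "\n";
print "error: $error";
print 'demo() still defined: ', ( $leaked ? 'yes' : 'no' ), "\n";
```

The load fails, yet both functions are callable. That is the half-loaded state a failed use_ok() leaves behind for the rest of the test file.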

Getting this right is very, very difficult. Getting this right means not committing anything to globally visible symbols until you're certain that the module compiled correctly. For a module like FalseReturnValue, that's easy. For a module which itself uses something like Catalyst or DBIx::Class with several dependencies, this is tricky.

The best approach I can think of is to maintain some sort of transactional system (yes, I know it sounds awfully complex, but you asked for correctness first, so humor me through at least this sentence) where you build up a set of changes to globally visible symbols and then only apply that delta if that compilation as a whole—the top-level module and all of its dependencies—succeeds.

The second best solution is to do that for each module. It's all or nothing for each use statement on its own, regardless of how far down the dependency tree you are.

(You could go one step further and make everything anonymous by default, such that the only way you can access package global symbols in another namespace is by binding to that namespace explicitly, but that's a bigger change with implications on code reuse and the cross cutting concerns of an object system, even though it does have the potential to clean up things like accidental exports.)

Of course, if your worldview has already said that failing to load a dependency with use should abort the program with red flashing klaxon lights and a siren, you don't have to do that much work.

(... but require errors are exceptions you can catch with eval { ... }, so the problem remains with require.)

Defending Against Its Dynamic Scope

By chromatic on April 9, 2012 11:37 AM | 11 Comments

A lot of the advice given in Modern Perl: the Book is advice learned the hard way, whether through making my own mistakes, debugging code written on my teams, or reviewing code written by other people to help novices become better programmers. After a decade-plus of this experience, I think I've developed a good sense of what people find confusing and what problems rarely occur in practice.

The pervasive use of global variables? It'll eventually catch up to you. Variables popping into existence upon use, not declaration? It'll cause problems far sooner than you ever expect.

Clobbering $_ at a distance inside a map block? It happened to me the other day. Yes, it surprised me too.

I've been attaching the Perl search bindings Lucy to a document processing engine. As part of the processing stage, my code adds documents to the search index. The index schema keeps track of specific fields, and it's denormalized slightly to decouple the document database from the search index. The code to add a document to the index creates a hash from method calls on each document object. That's where things started to go wrong:

sub add_entries
{
    my ($self, $entries) = @_;
    my $index            = $self->index;

    for my $entry (@$entries)
    {
        my $fields =
        {
            map { $_ => scalar $entry->$_() } keys %index_fields
        };

        $index->add_doc( $fields );
    }

    $index->commit;
    $self->clear_index;
}

I noticed things went wrong when my test bailed out with strange errors. Lucy was complaining about getting a hash key of '', the empty string. I was certain that %index_fields was correct.

While most of the methods called are simply accessors for simple object properties, these document objects have a method called short_content():

sub short_content
{
    my $self    = shift;
    my $meth    = length $self->content > $MIN_LENGTH ? 'content' : 'summary';
    my $content = $self->$meth();

    return unless defined $content;

    my $splitter     = Lingua::Sentence->new( 'en' );
    my $total_length = 0;
    my @sentences;

    for my $sentence ($splitter->split_array( $content ))
    {
        push @sentences, $sentence;
        $total_length += length $sentence;

        # must be a big mess, if this is true
        return $self->summary
            if  @sentences    == 1
            and $total_length > $MAX_SANE_LENGTH;
        last if $total_length > 0.65 * $MAX_LENGTH;
    }

    if (@sentences)
    {
        my $text = join ' ', @sentences;
        return $text if length $text > $MAX_SENTENCE_LENGTH && $text =~ /\S/;
    }

    return substr $content, 0, $MAX_SHORT_CONTENT_LENGTH;
}

A document may have a summary. It has content. short_content() returns a slice of the first significant portion of either, depending on which exists. While it's not the most detailed portion of the document, it's the earliest significant portion of the document, and it's demonstrably the best portion to index as a summary. (Thank you, inverted pyramid.)

The rest of this method attempts to break the short content at a sentence boundary, so as not to cut it off in the middle of a word or thought.

Nothing in that method obviously clobbers $_, but something called from it apparently does. (I wonder if Lingua::Sentence or one of its dependencies reads from a file or performs a substitution.) Regardless, I protected my precious hash key with a little defensive programming:

        my $fields =
        {
            map { my $field = $_; $field => scalar $entry->$field() }
                keys %index_fields
        };

... and all was well.
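I haven't tracked down the actual culprit, so the following is only a sketch of the mechanism I suspect. The careless_reader() function is a hypothetical stand-in for any code that reads a filehandle with while (<$fh>) and never localizes $_.

```perl
use strict;
use warnings;

# Hypothetical stand-in for code several layers down which reads a file with
# while (<$fh>); that construct assigns each line to the global $_.
sub careless_reader {
    my $text = "first line\nsecond line\n";
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    while (<$fh>) { }    # $_ is undef once the loop reaches end of file
    close $fh;
    return 'value';
}

my %index_fields = ( title => 1 );

# Unprotected: $_ goes on the list first, then the call clobbers it.
my @unsafe = map { ( $_ => careless_reader() ) } keys %index_fields;

# Protected: the lexical copy survives whatever happens to $_ during the call.
my @safe = map { my $field = $_; ( $field => careless_reader() ) }
    keys %index_fields;

printf "unprotected key: %s\n", defined $unsafe[0] ? "'$unsafe[0]'" : 'undef';
printf "protected key:   %s\n", defined $safe[0]   ? "'$safe[0]'"   : 'undef';
```

An undefined key stringifies to the empty string when it lands in a hash, which matches what Lucy complained about.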

While this has been a very rare occurrence in 13+ years of Perl 5 programming, the trap is particularly insidious. The more work Perl does within that map block, the greater the potential for action at a distance. Furthermore, the better my abstractions—the more behavior hidden behind that simple method call—the greater my exposure to this type of bug.

Throw in things like dependency injection and Moose lazy attributes where you don't have to manage the dependency graph of your data initialization and flow yourself (generally a good thing) and you can't tell what's going to happen where or when.

If my instincts are correct, and something reads from a file somewhere such that only the first call to construct a Lingua::Sentence object clobbers $_, the point is doubly true.

(Sure, changing map to autolexicalize $_ would fix this problem, but it has backwards compatibility concerns for people who rely on this action at a distance—remember, it's only a problem if you write to $_—and it's too late to get such a change into Perl 5.16, even if it were only enabled with use 5.016;. Meanwhile, a variable assignment fixes it for me right now, and that will have to suffice.)

Perl and that Dirty Word

By chromatic on April 6, 2012 11:49 AM | 17 Comments

To convince people to do something, you must first let them convince themselves that it is in their interest to do so.

Consider the first paragraph on Python.org:

Python is a programming language that lets you work more quickly and integrate your systems more effectively. You can learn to use Python and see almost immediate gains in productivity and lower maintenance costs.

If your concern is "getting stuff done" and "not painting yourself into a corner", all "in time and under budget", that's compelling.

Compare that to the first paragraph on Perl.org:

Perl 5 is a highly capable, feature-rich programming language with over 23 years of development.

That's an improvement over the old perl.org, but the Python text is more compelling.

(I'm open to the idea of adding good summary marketing text to the Perl.com homepage, but the entire site needs a theme overhaul. Anyone with Movable Type/Melody theming experience and some free time is more than welcome.)

When I wear my business hat, I usually do the copywriting for my businesses. This means the back cover copy and press release text and website blurbs for Onyx Neon books. This means the sales and marketing pages for Club Compy (site revision pending my business partner's new fatherhood of twins). This means even slogans and taglines such as find the right price for stocks.

I know techies have a visceral reaction to the idea of "marketing" whereby we pretend to be shocked, stunned, and even offended that money might change hands. Filthy lucre! The immolation of Croesus! How dare you suggest that people talk about what's important to them, you lousy spammer?

Perception matters.

Perl has some huge advantages over other languages. Not every advantage is entirely exclusive, but if you were to start a new project in Perl today because one or more of these advantages were important to you, reasonable people would understand:

Perl is ubiquitous. It runs everywhere.
Perl is stable. Well-written programs will continue to run with little intervention.
Perl has a huge ecosystem in the CPAN. Most of your work is already done for you—freely usable, modifiable, and well tested.
Perl scales with your problems. It's suitable for small, quick programs as well as powerful, big-business programs.
Perl is flexible. It lets you do what you need to do and stays out of your way.
Perl is reliable. It has a huge test suite, a regular release cycle, and copious documentation.
Perl is easy to learn. (I wrote and give away and have committed to producing yearly revisions of Modern Perl: The Book to make this bullet point.)
Perl helps you create great software, with plenty of wonderful tools and libraries ready to help. (This is where Task::Kensho, Moose, Perl::Critic, Test::More, et cetera come in. Note carefully the place within the list.)

It's easy to extend this list to get more specific when the situation demands (though we tend to get too specific too fast; Catalyst isn't necessarily interesting to someone who really wants Bioperl, while DBIx::Class has little obvious value to a harried system administrator). It's nice to have a good place to start, though. Python has it right in two sentences. Perl needs a short blurb at least that good.

(Remember to keep in mind that the goal is to help people convince themselves that Perl will solve their problems—or help them realize that it's not what they need, while still reinforcing the image of Perl we want to express.)

-Ofun for Whom?

By chromatic on April 4, 2012 6:00 AM | 2 Comments

A fundamental, rarely-questioned piece of wisdom about free software is that it works best when it scratches the itches of its developers.

(You can tell when someone's passionate about something. You can also tell when that someone has no love for the work; the results often differ. Then again, passionate people made the movies Plan 9 from Outer Space, Avatar, and Manos: The Hands of Fate.)

What could motivate someone to work on something outside of work hours? What could motivate someone to spend time solving a hard problem for the challenge of it? For status? For low pay? For hope of a future benefit?

A few years ago, Audrey Tang described Optimizing for Fun, a project organization strategy for cultivating new contributors by lowering the barriers to contribution and relentlessly encouraging even the smallest progress as valuable, desirable, and sustainable.

More projects could benefit from that, in and out of software, in and out of business, in and out of the world of volunteers, professionals, dilettantes, and amateurs.

The 20th century author and theologian C. S. Lewis suggested that every vice is a virtue misapplied. (To belabor the point, gluttony, one of the seven deadly sins as borrowed by Pope Gregory I from a fourth-century monk, is the misapplication of the legitimate enjoyment of food. Lewis was no stoic.)

What's -Ofun misapplied? Does this sound familiar?

We must remember that, from what we can see, the $foo project's primary mission in life is providing entertainment for $foo developers, not convenience or stability for $foo users. Which is understandable given that volunteers, who make up the vast majority of $foo developers, tend to do whatever it is they do for fun, not for drudgery.

That could apply to many projects. (Relevant context: a comment by user anselm on LWN.net's story "Free is too expensive".)

Fixing bugs isn't always fun. Keeping an old and crufty API around until users have time to migrate off of it isn't always fun. Making and meeting promises about release dates isn't always fun. Writing documentation isn't always fun. Holding back new features in favor of improving existing ones isn't always fun.

Supporting real actual users (not just hobbyist developers who already think downloading and compiling the new version out of your repository is fun) sometimes takes work and effort.

As a developer, each of us gets to decide the degree to which we pursue things we find enjoyable. If it stops being enjoyable, you have every right (even the responsibility) to change your situation to make it more fun or to leave it for someone else to do. Your obligation is to do the best work you decide you are obligated to do. Nothing more, nothing less.

... but if your desire to do the fun bits exceeds your willingness to put in the hard work to understand what your users want and need and expect, at least do them the courtesy of not acting surprised when they tell you of their disappointment. You might Osborne your project, like Rakudo did.

What Testing DSLs Get Wrong

By chromatic on April 2, 2012 11:27 AM | 5 Comments

A conversation between the owners of ClubCompy about language design, syntax errors, and testing led to an interesting exchange (lightly edited for coherence):

How do you go about testing order of operations and languages?

You need a minimal test driver that takes an expression or series of expressions and verifies that it either parses or produces a syntax error. The test matches part or all of that error.

Any given input either parses correctly or produces an error.

Our current test framework cannot "see" when there is a syntax error. We set a flag right before the end of our test programs and test that that flag has the right value.

The most robust strategy I've seen is to add a parse-only stage to the compiler such that you feed it code and catch an exception or series of errors or get back a note that everything's okay.

You can inspect a tree structure of some kind to verify that it has all of the leaves and branches you expect, but that's fragile and couples the internals of the parser and compiler and optimizer to the internals of your tests.

Is having a huge battery of little code snippets that run or fail with errors the goal?

Ideally there's as little distance between "Here's some language code" and "Here's my expected results" as possible. The less test scaffolding the better.

I've never been a fan of Behavior Driven Development. I think Ruby's Cucumber is a tower of silly faddishness in software development. (Any time your example walks you through by writing regular expressions to parse a subset of English to test the addition of two numbers, close the browser window and ask yourself if slinging coffee for a living is really such a bad idea after all.)

I neither want to maintain nor debug a big wad of cutesy code that exists to force my test suite into "reading like English"—as if the important feature of my test assertions were that they looked like index cards transcribed into code.

Nor do I want to spend my time tweaking a lot of hairy procedural scaffolding to juggle launching a compiler and poking around in its guts for a magic flag so that, a couple of dozen lines of code later, I can finally say yes or no that the 30 characters of line noise I sent to the compiler produced the error message I expected.

I want to write simple test code with minimal scaffolding to highlight the two important attributes of every test assertion:

Here's what I did
Here's what I expected to happen

That means I want to write something like:

parses_ok 'TOCODE i + 65',
    'precedence of + should be lower than that of TOCODE';

Instead of:

Feature: Precedence of keywords and arithmetic operators
  In order to avoid parse errors between keywords and arithmetic operators
  As an expert on parse errors
  I want to demonstrate that keywords bind more tightly to their operands than do operators

  Scenario: TOCODE versus +
    Given code of "TOCODE i + 65"
    When I parse it
    Then the result should parse correctly without error

Which would you rather read, run, and debug?

All of these "DSLs for $foo" jump too far over the line and try to produce the end goal their users need to make for themselves. I don't want a project that attempts to allow me to write my tests in a pidgin form of English (and I get to parse that mess myself, oh joy, because I'm already testing a parser and the best way to test a parser is to write a custom fragile parser for natural language, because debugging that is clearly contributing to real business value).

Ideally, I want to use a library someone else has written that can launch my little compiler and check its results. I want to use this library in my own test suite and have it integrate with everything else in the test suite flawlessly. It should express no opinion about how I manage and arrange and design the entire test suite. It should neither own the world, nor interfere with other tests.

In short, if it has an opinion, it limits that opinion to just a couple of test assertions I can choose to use or not.

In other words, I still want Test::Builder because T::B lets me decide the abstractions I want or don't want and reuse them as I see fit. After all, good software development means building up the vocabulary and metaphors and abstractions appropriate to the problem you're solving, not adopting a hastily-generalized and overextended pidgin and trying to force your code into the shapes demanded.

If I'm going to have to write code to manage my tests anyway, I'll make the input and expected output prominent—not a boilerplate pattern of repetition I have to parse away anyhow.
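As a rough sketch of how little scaffolding that takes, here is one way parses_ok might sit on top of Test::Builder. The parse_only() function is a toy stand-in for a real compiler's parse-only stage, and parse_error_like() is a hypothetical companion assertion; neither comes from ClubCompy.

```perl
use strict;
use warnings;
use Test::Builder;

my $Test = Test::Builder->new;

# Toy stand-in for a compiler's parse-only stage. It accepts an optional
# TOCODE keyword followed by operands joined by arithmetic operators, and
# dies with a message on anything else.
sub parse_only {
    my $code = shift;

    return 1 if $code =~ m{
        \A \s* (?: TOCODE \s+ )?       # optional keyword
        \w+                            # first operand
        (?: \s* [-+*/] \s* \w+ )*      # operator and operand pairs
        \s* \z
    }x;

    die "Syntax error in '$code'\n";
}

# Passes when the code parses; reports the parser's complaint when it does not.
sub parses_ok {
    my ($code, $description) = @_;
    local $Test::Builder::Level = $Test::Builder::Level + 1;

    my $parsed = eval { parse_only( $code ); 1 };
    my $error  = $@;

    my $ok = $Test->ok( $parsed, $description );
    $Test->diag( "Parse error: $error" ) unless $ok;
    return $ok;
}

# Passes when the code fails to parse and the error matches the pattern.
sub parse_error_like {
    my ($code, $pattern, $description) = @_;
    local $Test::Builder::Level = $Test::Builder::Level + 1;

    my $parsed = eval { parse_only( $code ); 1 };
    my $error  = $parsed ? '' : $@;

    return $Test->like( $error, $pattern, $description );
}

my $first = parses_ok 'TOCODE i + 65',
    'precedence of + should be lower than that of TOCODE';

my $second = parse_error_like 'TOCODE + +', qr/Syntax error/,
    'two operators in a row should not parse';

$Test->done_testing;
```

Because both assertions share the Test::Builder singleton, they can mix with Test::More assertions in the same file without either one owning the test run.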
