April 2011 Archives

Abstraction versus Mock Objects

One of my projects switched persistence mechanisms from KiokuDB to DBIx::Class recently. KiokuDB had the benefit of simplicity—it was very easy to develop the initial project without worrying about managing table schemas and upgrades, but as the project matured, the benefits of DBIC became more apparent.

I like how KiokuDB persists plain old Moose objects. If you understand how your object graph works, there's little effective difference between a Moose object you create yourself or one you retrieve from your object store. Less so DBIC objects, in my experience. This isn't a flaw. It's merely a difference.

The difference became apparent when I started to port a fundamental test file as part of this migration. This test file exercises the document parser. As part of the application, the document parser uses a variant of the Readability algorithm to extract the meaningful portions of HTML documents. The test file itself uses a data-driven approach, where a t/files/documents/ directory holds multiple YAML files containing real data we've encountered along with the expected (correct) results.

#! perl

use Modern::Perl;
use Test::More;
use YAML::XS;

use lib 't/lib';

__PACKAGE__->main( 'App::DocParser', @ARGV );

# Build a parser from fixed defaults plus whatever the example file supplies.
sub create_docparser
{
    return App::DocParser->new(
        url       => 'http://www.example.com/some_url',
        url_base  => 'http://www.example.com/',
        @_
    );
}

sub main
{
    my ($test, $module, @files) = @_;
    use_ok( $module ) or exit;

    # Run only the named files if given; otherwise run every example.
    if (@files)
    {
        $test->run_file_content_tests( $module, @files );
    }
    else
    {
        $test->test_find_content_files( $module );
    }

    # No plan is declared above, so tell Test::More the run is complete.
    done_testing();
}

sub test_find_content_files
{
    run_file_content_tests( @_, glob 't/files/documents/*.yaml' );
}

sub run_file_content_tests
{
    my ($test, $module, @files) = @_;

    for my $file (@files)
    {
        my $example   = YAML::XS::LoadFile( $file );
        my $docparser = create_docparser( %$example );

        like $docparser->content, $example->{content_regex},
            "DocParser should find matching content for $example->{desc}";

        is 0 + @{ $docparser->links }, $example->{link_count},
            '... with the right number of links';
    }
}

The previous version of this code created App::Document objects—instances of the same class persisted into the object graph with KiokuDB—and called functions in the DocParser namespace to find various pieces of context. A straightforward port of this test file to DBIC meant I'd have had to figure out how to instantiate dummy DBIC-alike objects representing documents to pass to the DocParser functions.

I thought about that for a few minutes. I pondered reaching for Test::MockObject::Extends. Then I didn't.

What's important in this test isn't that App::DocParser conforms to a specific interface governed by the persistence layer. That's irrelevant. Its only relevance is how well it identifies and extracts relevant information from given data.

I turned that namespace full of utility functions into a class. I gave that class the specific attributes on which it needs to operate, effectively giving instances of the class the responsibility to manage the data on which they operate. You can see the result: a simple constructor call creates a DocParser object without worrying about setting up test data in an example database or building a mock object framework which behaves sufficiently like a DBIC row object.

I really like the object responsibility pattern which Moose seems to encourage, where you create an object by passing all relevant data to its constructor and then operate on that data as needed (perhaps governed by laziness in attribute accessors). You get the benefit of being able to assume that a constructed object is in a safe and sane state as well as a decoupling of data dependencies.
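As a rough illustration of that constructor-driven style, here is a minimal sketch of what a Moosified parser class could look like. Only url and url_base appear in the test file above; the html attribute, the _build_links builder, and the regex-based link extraction are assumptions made up for this example and stand in for the real Readability-style logic.

```perl
use strict;
use warnings;

package App::DocParser;
use Moose;

# Everything the parser needs arrives through the constructor.
has 'url'      => ( is => 'ro', isa => 'Str', required => 1 );
has 'url_base' => ( is => 'ro', isa => 'Str', required => 1 );
has 'html'     => ( is => 'ro', isa => 'Str', required => 1 );    # hypothetical attribute

# Computed on first access, not at construction time.
has 'links' => (
    is      => 'ro',
    isa     => 'ArrayRef[Str]',
    lazy    => 1,
    builder => '_build_links',
);

sub _build_links
{
    my $self  = shift;

    # Naive stand-in for real link extraction: collect every href value.
    my @links = $self->html =~ /<a\s[^>]*href="([^"]+)"/gi;
    return \@links;
}

__PACKAGE__->meta->make_immutable;

package main;

my $parser = App::DocParser->new(
    url      => 'http://www.example.com/some_url',
    url_base => 'http://www.example.com/',
    html     => '<p><a href="/one">One</a> and <a href="/two">Two</a></p>',
);

print scalar @{ $parser->links }, "\n";
```

No database row or mock object is involved: the test only has to supply plain data.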

Once I realized that this approach would lead to better code, it took twenty minutes to Moosify the document parser and one minute to change a couple of lines of code to port the test to the new framework. I count that as a sign of effective design.

Civility Starts with Me


Andy is tired of putting up with bad behavior in the Perl world and so am I.

This needs to stop.

There is too much unconstructive, undeserved, passive-aggressive, drive-by abuse.

(My goodness, even PerlMonks has meaningless arguments over the precise technical meaning of simple concepts like "constant" and "initialization" that have recently devolved into accusations of deliberate misleading and lying.)

This also needs to stop.

I don't care if other communities are better or worse or the same. I don't care if you've been coding for a thousand years or ten seconds. I don't care if you wrote the book on Perl or have never even read a book on PERL.

There is no excuse for abusive behavior.

Civility starts with me. I've written and said things I knew were wrong at the time and said and wrote them anyway, and I regret them. I've said and written things that turned out the wrong way and I regret them too. (I've had other people, some well-known leaders and some not, call me out for doing so, and I've asked them to keep me accountable for what I say.)

That's part of fixing the community.

The other part, at least as important, is building up an intolerance for abuse. In the several hours since I first saw the first linked comment, no one has called it inappropriate and rude and abusive. No one on PerlMonks stepped in to calm either of the two recent threads that have gotten out of hand. (Has anyone at PerlMonks even brought up the idea of a temporary banning of people who post abusive comments? In my mind, it's a fair trade to rid the site of vitriol from even a so-called saint for the good of everyone else.)

For goodness sake, the Beginner's list has even had a long discussion over whether it's okay to be abusive to novices who might not even know which frackin' manual to read because no one has ever told them that the manual exists, because (and I quote) "They need to grow a thicker skin if they are ever to succeed in programming."

(I'm subject to regular abuse from a few quarters for writing a book for novices and giving it away for free, as if I had some sinister agenda to get rich by forcing people to format their code the same way as I do. The horror. The horror.)

These are not insurmountable problems. I've seen first-hand IRC channels (Perl IRC channels!) where someone has said "That joke's a little insensitive and a lot off topic. We appreciate not having that type of discussion here." or "This channel isn't the best place for help on that topic. If you join #another-channel, I'm happy to help you there."

I don't care who you are or what you've done if you can't be civil. I don't care if you've excused yourself from politeness thanks to your self-diagnosis of a social disorder with the Camel in one hand and the DSM in the other. You're not welcome in projects I lead if you cannot or will not treat others with respect. I will abandon projects which do not value mutual respect and civility. I will speak up when I see uncivil behavior which needs to stop. I will watch what I say and write and will apologize and reconsider my actions when people tell me I cross a line. We're building software, after all, as part of a community, with the belief that working together helps us build better software more easily.

In short, the Perl community needs far more people like Karen Pauley, Ask Bjørn Hansen, Jess Robinson, Tim Bunce, and, yes, Larry and Gloria Wall. If you agree, please, please speak up.

In Praise of Not Writing Code


If you still believe that avoiding dependencies at all costs is a viable strategy, consider how much code you would have to write to clone del.icio.us in 24 hours. Can you get it all correct while writing all of that code by hand?

Me neither—not because I can't (eventually) get it all right, but because I have much, much better things to do than to write yet another dispatcher and worry about all of the fiddly bits of encodings and character sets and tainting and verification that everyone else everywhere needs and uses.

By all means let us explore possibilities of removing the walls against which people installing dependencies bang their heads, but let us also never fail to wave the banner of "We write glue code, darnit!" proudly. Of course we glue together libraries. It saves us the time and energy and debugging and frustration we should otherwise be spending on inventing new things, or at least better things.

In a world without Moose and Catalyst and Dancer and Plack and even and especially TAP, I don't use Perl, I don't write books and articles, I don't publish CPAN modules, I don't contribute to a common ecosystem from which other people can draw.

Don't like the way my freely available and freely usable and freely modifiable and freely redistributable code works? You get to write your own glue. Me? I'll be off solving the next problem.

If there's a debate between embracing the CPAN and relentlessly avoiding dependencies, my taste runs toward the former. Certainly avoiding unnecessary dependencies is a valid strategy, but no other reason for avoiding useful dependencies argues, in general, more strongly than the reasons for using dependencies.

Consider some of the most successful CPAN distributions: DBI, DateTime, and Test::Builder. On their own, they provide many more features than most projects ever need (I care mostly about PostgreSQL access, continental US time zones and holiday schedules, and often only a handful of Test::* modules). I could probably reimplement most of the code I need myself with fewer abstractions unnecessary for my needs, less code overall, and increased ease of installation for the users of my software, but I don't. Probably neither do most of you.

I see Moose the same way. So do some of you. While I could write an object system myself (I've contributed to a few on projects you've heard of), it's not worth my time to do so when a perfectly good system already exists—a better system than I'm likely to write, given the time I have available.

More than that, installation is once, but sharing is caring forever.

In other words, the hassle of figuring out how to bundle or mark for installation one or more dependencies is a single cost. The maintenance burden of my reinvented libraries lasts as long as I allow those reinventions to exist. If my users are capable of using a custom CPAN (or CPAN itself) to install my software as bundled for the CPAN, dependency management has almost solved itself. The whole of the CPAN ecosystem, including testers and platform smokes and bug tracking, supports those dependencies.

I can release my own reinventions to the CPAN to take some advantage of these network effects, but compare the contributor lists of the three distributions I mentioned to the contributor lists of single-use distributions. The correlation exists: the more widely used a dependency, the better its chances of meeting user needs. The more users (and the more diverse users), the more chances to find bugs and infelicities. (You can make the case that CPAN Testers could use rigorous stochastic analysis to improve the confidence and utility of its coverage, but the world of free software tends to work best when it trades small amounts of volunteer labor, properly applied, for trivial problems.)

This is not an absolute guarantee, of course. For example, security flaws may exist in a widely used dependency where a careful reimplementation may avoid the problem. Best judgment applies, but, ceteris paribus, I trust a project more upon learning that it does one or two things well instead of attempting to do everything itself.

What do you want to see on Perl.com?

Since the relaunch of Perl.com under the auspices of TPF late last year, I've been (slowly) gathering new material to publish. I have a few outstanding articles coming in, but I'd like to find more—and I'd very much like to publish things that people really want to read. My queue right now has:

  • JT Smith on writing Facebook applications with Perl
  • Rewriting rules with Plack middleware
  • HTML 5 and Perl web applications
  • Test::Builder 2
  • Installing and using CPAN modules with ActivePerl

(I've elided the names of the authors who've promised to write but have not yet delivered because they know who they are.)

If you have other ideas (whether as a reader or an author), please contact me.

If your business is flush with cash to pay another company to take on what they claim are legal risks that could shut down your business, ActiveState would like to sell you protection from the risks of using Perl, Python, and Tcl.

As an alternative, consider a cheaper approach: respect other people and their work instead of exploiting them.

(One wonders at the cognitive dissonance necessary to build a business around redistributing freely-redistributable software and the desire to market said business by suggesting that it's risky merely to use said software.)

(One also wonders at the bullet point "assurance of no GPL".)

The Verifiability of Syntax

Imagine you're building a web application. Imagine that the application exposes several URLs. Imagine that the application's workflow requires navigation through those URLs.

You have to develop and deploy a model of those URLs somehow.

The well-worn CGI approach is to write separate semi-interdependent programs for each step of the application. Each new URL element is a separate program. (This model matches the statelessness of HTTP nicely, but requires careful thought on the server side to manage the necessary statefulness securely.)

Much of the web has settled on an MVC approach where all requests come in through one or more controllers which dispatch to functions or methods based on request contents: the URI, any form parameters, and the query string. Often a framework handles this dispatch by unpacking all of this information and matching regular expressions or fragments of the URI, then traversing all the way to an endpoint.

I like Catalyst's approach to dispatch (see especially Catalyst's chained dispatch, a feature I'm pleased to see Dancer adopting as well): dispatch points are methods on a controller object with metadata attached in the form of subroutine attributes. This metadata registers methods as dispatch points and identifies the pieces of the URI they consume and any necessary preconditions.

It's flexible and powerful. Once you understand the terms used in the Catalyst world, you have many options to do the right thing for your application. You also get some form of validation because subroutine attributes are merely Perl 5 syntax, not entirely raw string text (though admittedly, the parameters to these attributes are raw strings which require validation—fortunately, as Perl 5 processes attributes during compilation, the attribute handlers can sanitize and validate these parameters to some degree during compilation).
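To make the compile-time validation point concrete, here is a minimal sketch using only core Perl 5 attribute handling. It is not Catalyst's implementation: the My::Dispatcher package, the %ROUTES table, and the path-checking rule are all invented for illustration. Perl calls MODIFY_CODE_ATTRIBUTES while compiling each attributed subroutine, so a malformed parameter stops the program before it ever runs.

```perl
use strict;
use warnings;

package My::Dispatcher;    # hypothetical package, not part of Catalyst

our %ROUTES;    # path => code reference, filled in during compilation

# Perl calls this hook at compile time for each attributed sub in the package.
sub MODIFY_CODE_ATTRIBUTES
{
    my ($class, $code, @attributes) = @_;
    my @unhandled;

    for my $attribute (@attributes)
    {
        if ($attribute =~ /\APath\((.*)\)\z/s)
        {
            my $path = $1;
            $path =~ s/\A(['"])(.*)\1\z/$2/s;    # strip optional quotes

            # The parameter is a raw string, so validate it here.
            die "Invalid dispatch path '$path'\n"
                unless $path =~ m{\A/[\w/-]*\z};

            $ROUTES{$path} = $code;
        }
        else
        {
            push @unhandled, $attribute;
        }
    }

    # Any attribute returned here is reported by Perl as a compile error.
    return @unhandled;
}

sub home  :Path('/')      { return 'home page'  }
sub admin :Path('/admin') { return 'admin page' }

sub dispatch
{
    my ($class, $path) = @_;
    my $action = $ROUTES{$path} or return 'not found';
    return $action->();
}

package main;

print My::Dispatcher->dispatch('/admin'), "\n";
```

A misspelled attribute name or a path without a leading slash fails when the file compiles, not when a tester first requests the URL.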

Another approach, used in many places including the Java framework which shall remain nameless, is to use an external file to manage routing information. In this case, it's an XML file which maps URI components to controller classes and methods on those controllers. It also maps the results of those methods to template bundles (declared in another XML file).

Thanks to the Java web world's tragicomic affair with Beans and the prevalence of CamelCase, I constantly get capitalization wrong in this file. You see, while I may have a controller class named AdminAction, the right way to instantiate classes in Java is to let Spring manage their lookup and instantiation and lifecycle.

In other words, even though I know that this application needs a class named AdminAction and that that class will always be a controller and that the routing information declared in this XML file relies on AdminAction behaving as coded, the right way to write this application means declaring a bean named adminAction (notice the capitalization) in yet another XML file and using that in the routing XML file instead of the class name.

(Apparently this makes replacing the real controller with a mock controller easier for testing, as if testing a mock controller were worth my time—what could it tell me about the validity of my Spring configuration or my routing configuration that basic end-to-end testing couldn't tell me better? And people who like to write tests wonder why some people complain that a focus on unit testing is, in many cases, useless busy work.)

The funny thing about the bean syntax is that it means, at its core, that if you have an object with instance data such as companyName or userAddress, those fields are private which means that you have to have public getters and setters which match the form getCompanyName and setUserAddress, but when you refer to those bean properties, you leave off the get and set and follow the capitalization of the field names (companyName)—the private names, even though you're using the public accessors with different capitalization.

Compound this with the fact that the right way is to use Spring, which instantiates these objects lazily and introspects their properties and... well, you get to find your capitalization typos at runtime.
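The naming convention is easier to see in code. The following sketch imitates, in plain Perl, the way a reflection-based container might derive an accessor name from a property name and look it up at runtime. It is an analogy and not how Spring works internally; the Company class and the getter_name_for and read_property helpers are hypothetical.

```perl
use strict;
use warnings;

package Company;    # hypothetical bean-style class

sub new
{
    my ($class, %args) = @_;
    return bless { companyName => $args{companyName} }, $class;
}

# Public accessor: "get" plus the field name with its first letter capitalized.
sub getCompanyName { return $_[0]{companyName} }

package main;

# Derive the accessor name from the property name, bean style.
sub getter_name_for
{
    my ($property) = @_;
    return 'get' . ucfirst $property;
}

# Look up the accessor reflectively; a typo only surfaces when this runs.
sub read_property
{
    my ($object, $property) = @_;
    my $getter = getter_name_for( $property );
    my $method = $object->can( $getter )
        or die "No accessor $getter for property '$property'\n";
    return $object->$method();
}

my $company = Company->new( companyName => 'Example Corp' );

print read_property( $company, 'companyName' ), "\n";

# A one-character capitalization slip compiles fine and fails only here.
my $value = eval { read_property( $company, 'companyname' ) };
print "runtime failure: $@" unless defined $value;
```

Because the property name is just a string, nothing checks it until the lookup happens.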

Whereas a language or framework or toolkit which allowed you to define syntax for all of these things would let you validate even capitalization conventions (hey, if such a parser existed, it could have opinions about things like this!) and let you know far, far before some hapless tester clicking his way through a 38 page test script hit a runtime error that you made a silly typo...

... because hoping that unstructured data is correct without ensuring that it is correct is a recipe for errors.

Beans versus Immutable Objects

Moose revealed itself as Good Code to me when I realized that it encourages Good Design. If you follow the example code and use the API it exposes in a natural way, you tend to write well-organized and reasonably safe code (where "safe" means "you paint yourself into fewer corners").

That's the sign of a very good library or framework: the common things are easy and the resulting code is easy to understand, maintain, and extend.

Moose has a very nice feature of reducing the surface area of the APIs you create with it. You can declare a class with several attributes but only expose a few public getters or setters. The ability to use lazy initialization of attributes—as well as default values—allows the calculation of dependent attributes as needed.

Best yet, if you can arrange your classes in such a way as to provide their necessary attribute values when you call their constructors, you can treat your objects from the outside as immutable, and you're halfway toward inversion of control, if that's useful to you.
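Here is a small sketch of those features working together, assuming Moose is installed. The Invoice class and its attribute names are invented for this example. Read-only attributes receive their values in the constructor, a default fills in an optional one, and a lazy attribute derives its value from the others the first time anything asks for it.

```perl
use strict;
use warnings;

package Invoice;    # hypothetical class, for illustration only
use Moose;

# Supplied once, at construction; 'ro' means no public setter exists.
has 'subtotal'    => ( is => 'ro', isa => 'Int', required => 1 );
has 'tax_percent' => ( is => 'ro', isa => 'Int', default  => 5 );

# Derived on first use; init_arg => undef keeps callers from passing it in.
has 'total' => (
    is       => 'ro',
    isa      => 'Num',
    lazy     => 1,
    init_arg => undef,
    builder  => '_build_total',
);

sub _build_total
{
    my $self = shift;
    return $self->subtotal * ( 100 + $self->tax_percent ) / 100;
}

# This finalizes the class definition; it is unrelated to object immutability.
__PACKAGE__->meta->make_immutable;

package main;

my $invoice = Invoice->new( subtotal => 200 );
print $invoice->total, "\n";
```

From the outside the object behaves as immutable: calling a read-only accessor with an argument throws an exception instead of changing state.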

(Read more about Moose and immutability and the value of these features and designs in the Modern Perl book, available in several freely redistributable electronic forms.)

Think of all of this as not coloring outside the lines. While good art is aware of the structure and limitations of its chosen form and breaks those conventions when it makes for better art, good software is primarily a conversation mechanism where perhaps you don't want to try to outdo Pynchon or DeLillo or Wallace. Perhaps.

Contrast this mechanism of working with objects with the bean style as expressed in parts of the Java world: a class has several private attributes. Each of those attributes has public getters and setters which follow a standard naming convention. Objects interact with other objects by their conformance to a protocol and discovery and use of those getters and setters enabled by the magic genericity of reflection.

(Did anyone else hear a duck quack?)

For the convenience of allowing a dependency injection framework such as Spring or a web framework inspired by J2EE to manage some of the boilerplate of managing object instantiation or object lifespan or even mapping object data to and from strings for web or web request display, you not only give up control over the creation of your objects, but you expose their internal attributes behind a thin (and, be honest, autogenerated by your IDE when it pops up a warning!) layer of accessor methods.

There is value in the loose and ad hoc adherence to a protocol of practice...

... but for a language which claims great maintainability benefits from forcing the use of static, manifest typing, you may encounter a fair few perplexing errors if you don't get the CamelCase type of your bean property just so. (Remember, reflection!)

While sometimes I do wish Perl 5 were stricter about types and signatures—certainly I look forward to using Perl 6's gradual typing for more projects—I have more sense of safety with the Perl approach, where we don't assume our language is completely safe by itself. We're a little more paranoid, and—at least in this case—I believe it leads to better designs.

I've spent the past month developing an Enterprise Class web site (which means "a CRUD application written mostly in XML configuration files with a smattering of Java here and there") to help out a colleague who shall remain nameless.

The good news is that the Perl world (at least for those enlightened souls fortunate enough to be able to use modern Perl techniques) has little to worry about from the Enterprise Class world.

The bad news is that some of the problems afflicting Perl—especially for people who don't write it all day, every day—are common to software development in general.

For example: this CRUD project uses an Apache framework which has been around for most of a decade and is on its second iteration. Apache projects get a lot of credit for rigorous documentation...

... except that their documentation in this case is, as with most open source projects, a mish-mash of conceptual overviews, API references, contributed guides, tutorials, and links to mailing list archives and blog postings, with little narrative flow between them and, worse, often precious little internal information on which version of the framework applies.

Search for information with your favorite search engine and be ready to wade through inapplicable and inappropriate data, whether wrong from the start or rendered irrelevant several years later. (This is as much an indictment of search engines as anything else.)

Perhaps the most concerning trend in this project is that the Enterprise Class nature of the project betrays a desire not to express any design opinion. In fact, the dominant design opinion seems to be "Anything is up for grabs"—you can perform validation in several different ways, you have your choice of several competing templating engines, you can map URIs to controller actions in at least three ways, and there are at least two (possibly more) mechanisms of accessing model data in templates.

The cost of all of this complexity exposes itself in two places. First, you have to decide (through coding standards or stumbling into a design you like) how to structure your application before you can get anything done. Second, you swim through a Styxian deluge of XML to configure (and reconfigure and modify and adapt and configure and reconfigure...) anything and everything.

(A plugin exists which nominally supports convention over configuration, but the lack of documentation and, apparently, interest renders it far too risky.)

I hesitate to blame too much the authors of the framework for this mess (and, as such, refuse to name the framework). Certainly some of their design choices reflect fundamental flaws of the Java language itself (in particular, Java's relative inflexibility, especially with regard to its silly static manifest type system, combined with the IDE-based tooling necessary to manage its verbosity and lack of abstraction, makes even programming in XML seem more flexible). Even so, the ceremony necessary to produce even the single CRUD step of "Create an item and view it" is damning. It involves editing multiple XML files, creating several Java classes, and, if you do things the Enterprisey way, creating and configuring subtemplates for the entire layout of the site as well as embracing the inversion of control object instantiation and initialization pattern (and given the unpopularity of the immutable object pattern in Java as compared to bean-style mutators, it's not easy to reason about the state of your objects).

The lessons for Perl are simple:

  • Avoid the overcomplexity of projects by emphasizing the flexibility and simplicity of the language. Yes, you read that right. Meditate on it.
  • Write better documentation.
  • When the best way to do something changes, deprecate old documentation and point to the new approach.
  • Emphasize simplicity. Make the default a good way to do most things. Provide comprehensive tutorials which explain the entire system using that effective default.
  • Provide smaller upgrade paths and avoid big-bang incompatible releases which fork documentation and knowledge.
  • Unify where possible.


About this Archive

This page is an archive of entries from April 2011 listed from newest to oldest.
