Time Will Tell


The May 2012 Dr. Dobb's interview with Ward Cunningham has an interesting quote about Ward's notion of technical debt:

I was really devoted to finding great code, especially when objects were new. Objects gave us an extra dimension beyond functional decomposition. And the question was, "Are these the right objects or not?" And the answer was, "Time will tell."

I work off and on with a handful of great programmers in the Portland area. Several years ago, James Shore and Dave Woldrich created CardMeeting, an agile remote collaboration tool. Jim and Dave are both very good programmers. For this project, they decided to forgo their usual test-driven development and just write code so as to deliver a working prototype on a very strict deadline.

Jim took to calling that experience "leveraged technical debt". My estimate (not having read the code, but having tested a lot of code written without testing in mind) is that it takes at least as long to write tests for untested code as it took to write the code, and much longer the more time has passed between writing the code and writing the tests.

Jim, Dave, and I have all worked on small, software-driven businesses doing things we've never seen anyone else do before. We've all had to deal with the risk of building lots of code that may or may not solve the problems of real customers with real money. When I say write the wrong code first, I don't mean "deliberately do things you know won't work" or "paint yourself into a corner" or even "use the fact you don't know everything you're doing as an excuse to play with completely new technologies you don't know how to use". (Not that the latter is a bad thing, but if you decide to do that, do so only after you've considered the risks and the rewards.)

Last night, we had a short conversation with John Wilger, another PDXer. He works with a successful and relatively young startup with a huge software component. I don't want to put words in his mouth, but it sounds like their software is, colloquially, a mess. Their developer team is trying to get to the point of slapping hands whenever someone needs to make a change and starts by copying and pasting code.

Four years after founding (and two years after discovering its cash cow business), the company was worth at least $3 billion.

It's irresponsible to derive meaningful statistics from a single data point, but we can say this: the technical debt of their codebase didn't entirely prevent the company from achieving its current measure of success. (You can also say that the liberal application of candy-flavored magical unicorn shavings of Ruby and Rails didn't prevent people from making an unholy mess.)

Time will tell if changing the development culture and refactoring the code and paying down all of the technical debt will help the company adapt and take advantage of new opportunities.

Time will tell if the codebase collapses under its own weight.

Time will tell if a competitor (and several exist!) will prove more agile and nimble because it has much better flexibility thanks, in part, to better code.

The whole situation reminds me of Facebook's HipHop virtual machine, where it's apparently cheaper and easier and faster and less risky to hire lots of developers to create and maintain a compatibility layer for the existing code than to rewrite existing code in a better language, or in a better fashion, or to improve it meaningfully.

I'm not suggesting that the only way to build a big business from nothing is to write bad code. I'm not suggesting that scaling to billions in revenue is the goal of all software-driven businesses. I'm not suggesting that you have to choose between test-driven development and business success.

In an ideal world, I can write the right software the first time. I can have sufficient test coverage to have complete confidence in the behavior of the code. I can deliver a feature which gets me paying customers in an afternoon without having to rewrite other parts of the code or taking shortcuts I know that I'll have to clean up when I get a spare weekend afternoon.

For a profession where some of us call ourselves "engineers", we certainly spend a lot of time discussing practical concerns as if the risks and rewards and limitations of the real world did not apply. (I wonder if the academic/practical divide between computer science and software development has some relationship to this.)

In the real world, I have to remind myself of two things every day. When I'm working on proof of concept code, proving my concept workable is more important than solidifying my code into well-tested and well-designed software. When I'm working on code I intend to keep, doing things as right as possible now will help me modify it to get it more right in the future.

None of this guarantees success. All of this benefits from the hard-won experiences I have from doing things the wrong way—and occasionally getting it very right. (In the real world, I spent part of the day finding and deploying a shim to turn SVG into VML for Internet Explorer 8 and earlier.)

Maybe Jim and Dave could have thrown out a couple of features and spent more time writing tests for the most valuable parts of their application. Maybe I'm wasting my time optimizing SQL queries for a search feature no one will ever use. Maybe John's company waited too long to untangle the admin and the user sides of their application.

If we're honest with ourselves, the best answer we can give is that time will tell. May we pay attention when it does.

A couple of comments on Simple Attribute-Based Template Exporting have asked for an example. I'll show off more of this code in my YAPC::NA 2012 and Open Source Bridge 2012 talk about how to write the wrong code (along with a handful of other techniques).

(I assume some knowledge of Template Toolkit. Besides far too many books about finance, accounting, and investing, the Template Toolkit book is always within reach these days. I've set up a wrapper template which provides the standard look and feel of my application, and I include/process other templates liberally. If you understand that much, you'll be able to follow along.)

One of the interesting templates in the system displays a list of chapters of a book in progress. A cron job rebuilds a static page from this template once a day. The template looks something like:

[% USE Bootstrap -%]
[%- canonical_url = 'http://sitename.example.com/book/' _ link -%]

[%- add_og_properties({
    'fb:admins'      => '436500086365356',
    'og:title'       => title _ ' | sitename.example.com',
    'og:type'        => 'article',
    'og:image'       => 'http://static.sitename.example.com/images/logo.png',
    'og:url'         => canonical_url,
    'og:description' => text.chunk(300).0,
    'og:site_name'   => 'Sitename: site tag line',
   })
-%]
[%- add_meta(
    'pagetitle'     => title _ ' | sitename.example.com',
    'feed_url'      => 'http://static.sitename.example.com/book/atom.xml',
    'canonical_url' => canonical_url
) -%]

[% article_text = BLOCK -%]
<article>
<h2>[% title | html %]</h2>
<p>Published: <time datetime="[% date %]">[% nice_date %]</time></p>
[% text %]
</article>

<ul class="pager">
[%- IF prev -%]
    <li><a href="[% prev.link %].html">← [% prev.title | html %]</a></li>
[%- END -%]
    <li><a href="/onehourinvestor">index</a></li>
[%- IF next -%]
    <li><a href="[% next.link %].html">[% next.title | html %] →</a></li>
[%- END -%]
</ul>

[% INCLUDE 'components/social_links.tt', title => title %]
[%- END -%]

[%- row(
    maincontent( article_text ),
    sidebar(
        sideblock( process( 'components/cached/book_latest_chapters.tt' ) ),
        sideblock( process( 'components/cached/book_drafts.tt'          ) )
    )
) -%]

The lines of the final row( ... ) call are most important; they put all of the content produced or assembled by this template in the HTML structure the site needs. That is to say, everything on the site needs to fit into something I call a row. A row can contain multiple elements, such as maincontent and a sidebar, or fullcontent by itself with no sidebar. A sidebar can contain multiple sideblocks.

(You can ignore the other functions; they put metadata in the right places to pass to wrapper templates.)

Within my template plugin (called Bootstrap), each of these elements is a simple Perl function which takes one or more arguments and interpolates it into some HTML:

sub row :Export
{
    return <<END_HTML;
<div class="row">
    @_
</div>
END_HTML
}

sub sidebar :Export
{
    return <<END_HTML;
<div class="span4">
    @_
</div>
END_HTML
}

(I initially tried to write these functions as templates within Template Toolkit itself, but there comes a point at which you want a real language. That point came very early for me.)
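
As a minimal sketch of how the other elements might look in the same style, here are two hypothetical helpers. The class names (span8 and well) are assumptions based on Twitter Bootstrap's grid and are not taken from the real plugin; the :Export attribute is left off so the snippet runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper in the style of row() and sidebar().
# The span8 class is an assumption, chosen to pair with the span4 sidebar.
sub maincontent
{
    return <<END_HTML;
<div class="span8">
    @_
</div>
END_HTML
}

# Hypothetical helper; the well class is likewise an assumption.
sub sideblock
{
    return <<END_HTML;
<div class="well">
    @_
</div>
END_HTML
}

# Interpolating @_ in a heredoc joins the arguments with $" (a space by default)
my $html = maincontent( '<p>one</p>', '<p>two</p>' );
print $html;
```

Because the heredoc interpolates @_ directly, several arguments end up on one line separated by a single space. That is good enough for HTML, where the extra whitespace rarely matters.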

I lose no love over the varname = BLOCK pattern necessary to populate variables to pass to these plugin functions, but it works for now. In some of my templates—usually those with lots of text I might end up changing later—I extract that text into a separate template under components/content/ to make it easy to edit. (This idea came up during a client project where the client wanted to edit the legal clickthrough arrangement after users create accounts. I didn't want lawyers or anyone to have the ability to mess up the templating language, so I said "Edit this single file as plain HTML and you'll be fine." It worked great.)

While my programmer brain says "This is ugly, and you're a horrible person for committing this hack upon the world—you're calling Perl from your template system to generate HTML you're stuffing into a template and that puts your presentation elements in Perl code, you awful human being!", it keeps the presentation code in a single place where I can update it infrequently (being that I don't change the layout of the site dramatically) without having to change the divs and classes of multiple templates.

I'm not arguing that this technique as expressed here is right. It's probably not optimal; there may be easier approaches to achieve the same effects.

I am saying that this currently works very well for me. I'm not typing the same HTML over and over and over again, and I can tweak it much more easily than I did before when I was refining the look and feel. In fact, I've even forgotten the exact details of the layout, from the HTML/CSS point of view, and now think only in terms of rows, maincontent, and sidebars.

Working abstractions are very nice.

If you're like me and your design skills are sufficient to modify something decent to look nice but insufficient to create something from first principles, you can do a lot worse than to play with Twitter Bootstrap for your next web site.

I've used it successfully for a few projects and it's been great.

It's a lot better now that I've written my own silly little Template Toolkit plugin to reduce the need for writing lots of repetitive HTML in my templates. (It's like Haml but less ugly and more Perlish and easier to extend.)

Writing a TT2 plugin is relatively easy. Of course I do it the wrong way; when you initialize your plugin, you have the ability to manipulate TT2's stash. This is the data structure representing the variables in scope in your templates. Where a well-behaved template should use object methods to perform its operations, my code stuffs function references in the stash. Here's the relevant code:

sub new
{
    my ($class, $context, @params) = @_;

    $class->add_functions( $context );

    return $class->SUPER::new( $context, @params );
}

sub add_functions
{
    my ($class, $context) = @_;
    my $stash             = $context->stash;

    while (my ($name, $ref) = each %exports)
    {
        $stash->set( $name, $ref );
    }

    $stash->set( process => sub { $context->process( @_ ) } );
}

I'll fix this eventually, but the process of making this work was interesting.
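
To see why stuffing function references into the stash works at all: Template Toolkit calls any code reference it finds in a template variable and passes along the arguments from the template. The sketch below skips the plugin machinery and passes the references as ordinary template variables, which lands them in the same stash. The hand-filled %exports hash and the inline template are stand-ins for illustration, not the plugin's real code.

```perl
use strict;
use warnings;
use Template;

sub row
{
    return <<END_HTML;
<div class="row">
    @_
</div>
END_HTML
}

# Stand-in for the plugin's %exports hash, filled by hand here
my %exports = ( row => \&row );

my $tt       = Template->new;
my $template = q{[% row( '<p>Hello</p>' ) %]};
my $output   = '';

# Template variables live in the stash, so this has the same effect as
# calling $stash->set( $name, $ref ) for each exported function
$tt->process( \$template, { %exports }, \$output )
    or die $tt->error;

print $output;
```

The plugin version differs only in when the references arrive: [% USE Bootstrap %] triggers new(), which sets them on the stash of the template that is already running.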

In my first attempt (see Write the Wrong Code First for the justification), I'd write the function I needed, like row(), which creates a new Bootstrap row, or maincontent(), which creates the main content area of the page. Then I'd add that function to the %exports hash and everything would work.

After the sixth function, keeping that list up to date was tedious. Then I kept forgetting it. After all, any time you have to update the same data in two places, you're doing something wrong.

Now the code looks more like:

sub row :Export
{
    return <<END_HTML;
<div class="row">
    @_
</div>
END_HTML
}

... with a single code attribute marking those functions which I want to stuff into the template stash. I've used Attribute::Handlers before, but I always end up reading the manual and playing with things to get them to work correctly. (Something about the way you have to write another package and inherit from it to get your attributes to work correctly always confuses me.)

My second attempt lasted no longer than ten minutes. I switched to Attribute::Lexical. This is almost as trivial to use as to explain:

use Attribute::Lexical 'CODE:Export' => \&export_code;

Whenever any function has the :Export attribute, Perl will call my export_code() function:

my %exports;

sub export_code
{
    my $referent = shift;
    my $name     = Sub::Identify::sub_name( $referent );

    return unless $name;
    $exports{$name} = $referent;
}

The first argument to this function is a reference to the exported function. I use Sub::Identify to get the name of the function reference. (That wouldn't work for anonymous functions, but I can control that here.) Then I store the name of the function and the function reference in a hash.

It took as long to write as it does to explain.
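
For the curious, the name lookup can be sketched with the core B module, which is roughly what Sub::Identify does when its pure-Perl implementation is in use (as far as I understand it). The name_of() helper is hypothetical and exists only for this illustration.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: ask the interpreter which name a code reference
# was compiled under
sub name_of
{
    my $ref = shift;
    my $cv  = B::svref_2object( $ref );
    return unless $cv->isa( 'B::CV' );

    my $gv = $cv->GV;
    return if $gv->isa( 'B::SPECIAL' );

    return $gv->NAME;
}

sub row { return "<div class=\"row\">@_</div>" }

my $named = name_of( \&row );
my $anon  = name_of( sub { 1 } );

print "$named\n";    # row
print "$anon\n";     # __ANON__
```

An anonymous function reports the placeholder name __ANON__ rather than nothing, so a stricter export_code() might want to skip that name explicitly.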

A lot of people dislike the use of attributes. Used poorly, they create weird couplings and plenty of action at a distance. Attribute::Handlers can be confusing.

I like to think that I'm using attributes well here (even if I'm abusing TT2 more than a little), and that they've simplified my code so that I can avoid repeating myself and performing manual busywork that I'm likely to forget. Even better, the code to use them isn't magical at all: it's all hidden behind the pleasant interfaces of Attribute::Lexical and Sub::Identify.

Write the Wrong Code First


I rewrite code often.

If I were a better programmer, designer, or businessman, I would rewrite my code much less frequently—but I get things wrong about as often as I get them right. Even with years of practical experience, software's still too difficult to predict with any degree of accuracy.

As a case in point, I've been revising some financial software in the past week. In reviewing the calculations, I found a way to simplify them dramatically. Even better, these simplifications allow me to simplify the interface and user experience.

That means rewriting a lot of code. That means throwing out code and revising the storage model and making a lot of changes.

I'm fortunate to have a good test suite that runs in 15 to 20 seconds and lets me know that everything I most need to work continues to work. That's a lot of confidence. People who like to talk about test-driven development and refactoring tout this as one of the benefits of well-tested software: you can refactor with confidence.

I'm not refactoring. I'm throwing away parts of this application and adding others. I'm changing how it behaves. Even though my test suite helps, that's not refactoring.

As part of this project, I've added an SVG graph to a class of web pages. I started by creating the SVG in Inkscape. Then I exported it as plain SVG. Then I made a template for that SVG to include from the page template.

That was still the example SVG with sample data, still the proof of concept.

I then extracted one piece of hard-coded data and made it a templated value. One. Everything still worked. Then I extracted the second piece of data and so on.

It's one step at a time. It's one change at a time. I'm using Git, so I could even commit after every single change, no matter that it's a few characters or even merely changing the color of a bar in the graph. I can work in steps as small and discrete as possible, and then squash them into one big commit or rewrite them into functional units, or do whatever I want with them.

That's the same principle behind test-driven development (or test-driven design or even behavior-driven development, if you need to hang a new name on the same idea). Do one thing at a time. Make your code do a little more of what it needs to do. Prove that it all hangs together, that it all works, that it does what you intended.

Then clean up a little bit. That's refactoring, in your code and in your tests. That's rebasing in Git.

Sure, I wish I could know exactly what I needed to write from the start. I wish sometimes that programming were mere transcription of the voice of an ephemeral muse (though I find it difficult to imagine a muse dictating Perl or JavaScript or Haskell or J aloud). I wish I were the Beethoven of programming (without the mercurial temperament and the hearing loss).

Usually I don't get things right from the start. Fortunately, with a little discipline and the willingness to work in small steps, to erect and replace the scaffolding as I go, I usually get a lot closer to the right code than if I guessed.

Maybe that means I've thrown out more code than I've written. (It's satisfying to delete unused code, after all.) Maybe any project which starts as a proof of concept, then has to pivot in other directions to do what it's always needed to do always becomes a Ship of Theseus.

I'm okay with that. It's more important to me to create something useful and then make it right than to wait on getting it right before other people can find value in it. I may never write the right code from the start, but I believe I can make almost-right code much, much more right, with discipline and care and feedback.

One of my projects performs a lot of web scraping. Once every n units of time (where n can be days or weeks), a batch process fetches several web pages and extracts information from them. It's a problem solved very well.

I designed this system around the idea of a pipeline of related processes, where each component is as independent and idempotent as possible. This has positives and negatives; it's an abstraction like any other.

I initially wrote the "fetch remote web page" and "analyze data from that page" as a single step, because I thought "analyze" was the main goal and "fetch" was a dependent task. I separated them a couple of weeks ago to simplify the system: analysis now expects data to be there, while fetching can be parallel on a single or across multiple machines. (Testing the analysis step is also much easier because feeding in dummy data is now trivial.)

I use the filesystem as a cache for these fetched files. That's easy to manage. I modified the role I use to grab data for the analysis stage to look in the cache first, then fall back to a network request. That was easy too. The get_formatted_data_for_analysis() method looked something like:

sub get_formatted_data_for_analysis
{
    my ($self, $type, $key) = @_;

    my $cached_path         = $self->get_cached_path( $type, $key );
    if (-e $cached_path)
    {
        my $text = read_file( $cached_path );
        return $self->formatter->format_string( $text ) if $text;
    }

    return $self->formatter->format_string( $self->fetch_by_url( $type, $key ) );
}

I thought I was done. This trivial caching layer took five minutes to write and gave my project a lot of flexibility.
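
The post does not show get_cached_path(), so the following is only a guess at the shape of such a helper: a plain function (not a method on the role) that maps a type and a key to a file under a cache directory. The directory layout, the .html suffix, and the key sanitizing are all assumptions.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical cache layout: <cache_dir>/<type>/<sanitized key>.html
sub get_cached_path
{
    my ($cache_dir, $type, $key) = @_;

    # Replace anything unsafe in a filename with an underscore
    (my $safe_key = $key) =~ s/[^A-Za-z0-9_.-]/_/g;

    return File::Spec->catfile( $cache_dir, $type, "$safe_key.html" );
}

my $path = get_cached_path( 'cache', 'quotes', 'ACME/2012' );
print "$path\n";
```

Keeping the mapping deterministic is what lets the fetch stage and the analysis stage agree on where a page lives without talking to each other.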

I thought this would speed up the processing stage, because I was able to make the fetching stage embarrassingly parallel so that more than one fetch could block on network IO simultaneously. My rough benchmark didn't show any speed improvement, but it was fast enough, so I moved on.

On Friday I decided to profile the slowest stage of the application with Devel::NYTProf. The slowest stage was the processing stage. I isolated it so that it performed no network fetching. It was still slow.

One of the formatter modules used to extract data from web pages is HTML::FormatText::Lynx. It allows me to run lynx --dump to strip out all of the HTML and other formatting of a document. The formatter allows you to pass in the name of a file or the contents of a file as a string.

For some reason, most of the time in the processing stage in the profile was spent in file IO. That wasn't too surprising; these aren't all small files and there may be thousands of them. I dug deeper.

Most of the time in the processing stage in the profile was spent in reading the files in my method and reading files in the formatter—reading files, even though I was passing the contents of those files to the formatter as strings.

I poked around at a few other things, but came back to the source code of the formatter. A comment in HTML::FormatExternal says:

format_string() takes the easy approach of putting the string in a temp file and letting format_file() do the real work. The formatter programs can generally read stdin and write stdout, so could do that with select() to simultaneously write and read back.

In other words, all of the work I was doing to read in files was busy work, duplicating what the formatter was about to do anyway. (Okay, I stared at the code for a couple of minutes, thinking about various approaches of rewriting it and submitting a patch or monkey patching it. Then I turned lazier and wiser.) I rewrote my code:

sub get_formatted_data_for_analysis
{
    my ($self, $type, $key) = @_;

    my $cached_path         = $self->get_cached_path( $type, $key );
    return $self->formatter->format_file( $cached_path ) if -e $cached_path;

    return $self->formatter->format_string( $self->fetch_by_url( $type, $key ) );
}

The result was a 25% performance improvement.
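
To make the duplicated IO concrete, here is a toy model of the two approaches. The format_file() and format_string() functions below are stand-ins written for this sketch (crude tag stripping instead of lynx --dump); they only mimic the temp file behavior the comment in HTML::FormatExternal describes.

```perl
use strict;
use warnings;
use File::Temp ();

my $reads = 0;    # count how often a file's contents come off disk

# Read a whole file into a string, counting the read
sub slurp
{
    my $path = shift;
    open my $fh, '<', $path or die "Cannot read $path: $!";
    $reads++;
    local $/;
    return scalar <$fh>;
}

# Stand-in for format_file(): the formatter reads the file itself
sub format_file
{
    my $html = slurp( shift );
    $html =~ s/<[^>]+>//g;    # crude tag stripping in place of lynx --dump
    return $html;
}

# Stand-in for format_string(): write a temp file, then defer to format_file()
sub format_string
{
    my $html = shift;
    my $tmp  = File::Temp->new( SUFFIX => '.html' );
    print {$tmp} $html;
    $tmp->flush;
    return format_file( $tmp->filename );
}

# A fake cached page
my $cached = File::Temp->new( SUFFIX => '.html' );
print {$cached} '<p>Hello, cache</p>';
$cached->flush;

# Old approach: read the cache, then hand the string to the formatter
my $old       = format_string( slurp( $cached->filename ) );
my $old_reads = $reads;

# New approach: hand the formatter the path
$reads        = 0;
my $new       = format_file( $cached->filename );
my $new_reads = $reads;

print "old: $old_reads reads, new: $new_reads reads\n";
```

Both approaches produce the same text, but the old one reads file contents twice and writes a temp file in between. With thousands of files, that extra round trip is the kind of cost a profiler makes visible.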

Three things jumped out at me in this process. First, how nice is it to have a working tool like NYTProf and a community that distributes source code, so that I could examine the whole stack of my application to isolate performance problems? Second, how interesting that an assumption and an admitted shortcut in a dependency could have such an effect on my own code. Third, how much more I like my new code with all of the file handling gone; pushing that responsibility elsewhere is a nice simplification even without the performance improvement.

Perhaps the two tools I miss most from my C programming days are Valgrind/Callgrind and KCachegrind, but NYTProf goes a long way toward filling that gap. Besides, I'm at least 20 times more productive with a language like Perl.
