September 2013 Archives

Functions Shouldn't be Methods, Yet Another Reminder

By chromatic on September 24, 2013 6:00 AM

This just cost me a few minutes of debugging:

use Moose;
use File::Temp 'tempdir';

has 'temp_dir', is => 'ro', lazy => 1, builder => '_build_temp_dir';

sub _build_temp_dir {
    tempdir( CLEANUP => 1 );
}

sub do_something {
    my $self = shift;
    my $dir  = $self->tempdir;    # meant temp_dir; this resolves to the imported tempdir()

    ...
}

Yes, it's my fault for having two names in a single namespace which are too close together and for getting too clever with directory handling (I wish there were a chdir that were lexically scoped), but this really shouldn't be a problem in 2013.

Yes, it's my fault for not using something like namespace::autoclean, but this probably shouldn't be a problem in 2013.

I look forward to a world where Perl has a proper MOP and method lookup looks up only methods. In the meantime, I'm going to keep reminding myself that importing functions into classes might end in tears.
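The same collision can be reproduced with core modules only. This is a minimal sketch, not the original code: Demo::Worker is a hypothetical class, and plain subs stand in for the Moose attribute. It checks which names method lookup can see once tempdir has been imported.

```perl
use strict;
use warnings;

package Demo::Worker;    # hypothetical class, for illustration only

use File::Temp 'tempdir';    # imports tempdir() into Demo::Worker

sub new      { return bless {}, shift }
sub temp_dir { return 'the accessor the caller meant' }

package main;

# Method resolution searches the same symbol table that holds imported
# functions, so it cannot tell the accessor from the import.
for my $name (qw( temp_dir tempdir )) {
    my $found = Demo::Worker->can($name) ? 'yes' : 'no';
    print "$name resolvable as a method: $found\n";
}
```

Both names come back as resolvable, which is why the typo never produces a "can't locate object method" error.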

Aliasing Short Names for Constructors

By chromatic on September 14, 2013 7:15 PM

If you work on a large project, your code probably has a top-level namespace. If it's a large project, you probably have several secondary namespaces. I have seen this pattern time and again where you end up with:

MyProject
- ::Controller
- ::Model
- ::Report
- ::Role
- ::Template
- ::Utility

... and you get the idea. It's a little tedious typing out the full path to all of those modules when it's easy to assume what they are. Hold that thought for a moment.

One of the few things I like in Python better than Perl is that you create an object by calling a constructor function. In other words, if you have an IceCreamStore class, you get an object by writing store = IceCreamStore( ... );. I like it not because it's obvious (Perl wins there) but because it's shorter. (The other thing in Python I like better than Perl is built-in support for iterators.)

You can get a lot closer in Perl using aliased, but it doesn't give you constructor functions. Hold that thought.

One of the drawbacks of the Python "a class is just a hash table of attributes and ~~functions~~ methods" design that Perl borrowed is that any function imported into a class's namespace (to make your code shorter) might be available as a method on objects of that class. Fortunately, you can use something like namespace::autoclean in Perl to unimport imported symbols after they've done their job. This works because, once Perl's parser has seen a function, the optree it builds refers to the proper function's storage location and the function's name association in the namespace can go away without changing the optree at all. Hold that thought.
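The claim about the optree can be checked by hand. This sketch removes an imported name from a package's symbol table at runtime; it only illustrates the idea and is not how namespace::autoclean itself is implemented. Demo::Cleaned is a hypothetical class.

```perl
use strict;
use warnings;

package Demo::Cleaned;    # hypothetical class, for illustration only

use List::Util 'sum';     # imported function, used internally

sub new   { return bless {}, shift }
sub total { my $self = shift; return sum(@_) }    # call to sum() compiled here

package main;

my $obj = Demo::Cleaned->new;

# Delete the name from the symbol table. The already-compiled call in
# total() does not look the name up again, so it keeps working.
delete $Demo::Cleaned::{sum};

print 'sum is a method: ', ( Demo::Cleaned->can('sum') ? 'yes' : 'no' ), "\n";
print 'total still works: ', $obj->total( 1, 2, 3 ), "\n";
```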

While I was walking the dogs tonight, I figured out how to combine all of these things. This demo code works:

package MyDemo;

use classalias Class => qw( Foo Bar Baz );

sub get_all {
    return Foo(), Bar(), Baz();
}

1;

... and you can call it with:

#!/usr/bin/env perl

use Modern::Perl;

use MyDemo;

say $_->number for MyDemo::get_all;
say MyDemo::Foo();

Yes, the names are silly and bad, but look at what the code does. MyDemo uses three other modules, Class::Foo, Class::Bar, and Class::Baz. Within MyDemo it can construct new instances of each of those classes by calling functions named Foo(), Bar(), and Baz().

Outside of MyDemo, those functions do not exist.

The code is a little fragile because it's merely a proof of concept, but I'll post it in a few days.
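Until that code is posted, here is one way the idea could be sketched. This is not the classalias module: Demo::ClassAlias, its cleanup method, and the stand-in Class::Foo and Class::Bar classes are all assumptions made for illustration, and the cleanup is triggered by hand rather than automatically at the end of compilation. It needs Perl 5.14 or later for the package block syntax.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for Class::Foo and Class::Bar.
package Class::Foo { sub new { return bless {}, shift } sub number { 1 } }
package Class::Bar { sub new { return bless {}, shift } sub number { 2 } }

# Hypothetical helper; the name and interface are assumptions.
package Demo::ClassAlias {
    # Install a short constructor function for each class in the caller.
    sub import {
        my ( $class, $prefix, @names ) = @_;
        my $caller = caller;

        for my $name (@names) {
            my $target = "${prefix}::${name}";
            no strict 'refs';
            *{"${caller}::${name}"} = sub { return $target->new(@_) };
        }
    }

    # Remove the short names again; calls compiled earlier keep working.
    sub cleanup {
        my ( $class, $package, @names ) = @_;
        no strict 'refs';
        delete ${"${package}::"}{$_} for @names;
    }
}

package MyDemo {
    # BEGIN makes the names visible while the rest of the block compiles.
    BEGIN { Demo::ClassAlias->import( Class => qw( Foo Bar ) ) }

    sub get_all { return Foo(), Bar() }
}

# Done by hand here; a real module would hook the end of compilation.
Demo::ClassAlias->cleanup( MyDemo => qw( Foo Bar ) );

package main;

print $_->number, "\n" for MyDemo::get_all();
print 'MyDemo::Foo exists: ', ( MyDemo->can('Foo') ? 'yes' : 'no' ), "\n";
```

The short names exist only long enough for get_all to compile against them, which matches the behavior described above: inside MyDemo they construct objects, outside they are gone.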

Would I use this? Maybe. It needs more thought, but it only took a few minutes to write, which is a small investment for something that could save me lots of time in the future.

Why You Should Write Programs to Write Programs to Write Programs For You

By chromatic on September 3, 2013 6:00 AM

I don't know what your work environment looks like, but mine has a lot of spreadsheets. Spreadsheets seem to be the one user interface that everyone I need to deal with understands. These are business users, IT departments, investors, investigators, and a small army of researchers. If we need data from these people, we'll probably get it in a spreadsheet. If we need to give them data, they probably want it in a spreadsheet.

I've long been grateful for Spreadsheet::ParseExcel and Spreadsheet::WriteExcel. One of the most useful pieces of code I've written this year is a simple iterator which uses either a spreadsheet or CSV file as the data source and lets you read the data row by row, giving you a hash reference keyed on column names. If it weren't for ParseExcel and Text::CSV and friends, my job would be much more difficult.
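The shape of such an iterator is simple enough to sketch. This version is an assumption-laden illustration, not the code described above: make_row_iterator is a hypothetical name, the sample data is made up, and the naive split on commas stands in for a real parser such as Text::CSV, which handles quoting and embedded commas.

```perl
use strict;
use warnings;

# Return an iterator (a closure) that yields one hash reference per row,
# keyed on the column names found in the header line.
sub make_row_iterator {
    my ($fh) = @_;

    my $header = <$fh>;
    return sub { return } unless defined $header;
    chomp $header;
    my @columns = split /,/, $header;

    return sub {
        my $line = <$fh>;
        return unless defined $line;
        chomp $line;

        my %row;
        @row{@columns} = split /,/, $line, -1;    # -1 keeps trailing empty fields
        return \%row;
    };
}

# Made-up sample data, read through an in-memory filehandle.
my $data = "name,country\nAlice,Norway\nBob,Chile\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my $next_row = make_row_iterator($fh);
while ( my $row = $next_row->() ) {
    print "$row->{name} lives in $row->{country}\n";
}
```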

Then my corresponding business person asked me to create a new report from an existing template. I thought seriously about how to reproduce the form of the report with WriteExcel for about 30 seconds, and then was doubly grateful for the existence of Spreadsheet::ParseExcel::SaveParser (even if its API is slightly different from that of WriteExcel—I will happily deal with that for the sake of not having to write it myself).

Then I realized that I had to write a lot of code to populate each individual cell.

The report calculated various values grouped by country. In other words, for each country the business user cares about, I had to count records which matched multiple criteria. She also wanted the ability to change the countries or rearrange them.

Here's where choosing the right data structure is important. I wanted to write code like this:

B2: count_all_people
B3: count_all_places
B4: count_all_things

... where the first token is the address of the cell in the spreadsheet and the second token is a method to get the value for that cell. That was easy enough to make into a data structure:

my @updates = (
    [ B2 => 'count_all_people' ],
    [ B3 => 'count_all_places' ],
    [ B4 => 'count_all_things' ],
);

... which I could iterate through with:

for my $update (@updates) {
    my ($cell, $method, @args) = @$update;
    my ($col, $row)            = cell_to_pos( $cell );

    my $value                  = $self->$method( @args );
    my $format                 = $sheet->get_cell( $row, $col )->{FormatNo};
    $sheet->AddCell( $row, $col, $value, $format );
}

... which reads pretty well.
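The loop relies on a cell_to_pos helper which isn't shown. Here is a minimal sketch of what it might look like, assuming plain A1-style addresses and the zero-based row and column indexes that Spreadsheet::ParseExcel uses; the body is a guess at the behavior, not the original helper.

```perl
use strict;
use warnings;

# Convert an address such as 'B2' into zero-based ( column, row ) indexes.
sub cell_to_pos {
    my ($cell) = @_;

    my ( $letters, $digits ) = $cell =~ /\A([A-Za-z]+)([0-9]+)\z/
        or die "Invalid cell address '$cell'";

    # Column letters are a base-26 number where A is 1 and Z is 26.
    my $col = 0;
    $col = $col * 26 + ( ord( uc $_ ) - ord('A') + 1 ) for split //, $letters;

    return ( $col - 1, $digits - 1 );
}

for my $cell (qw( B2 G14 AA1 )) {
    my ( $col, $row ) = cell_to_pos($cell);
    print "$cell => column $col, row $row\n";
}
```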

Then I had to figure out how to look up countries by name:

[ G14 => 'count_people', { country => 'Angora' } ]

That's a little fragile, though; it hard codes both the cell where the value should go and the name of the associated country. The first time my colleague revised her spreadsheet to add a country, I was glad to find a better approach:

my $country = sub {
    my $cell        = shift;
    my ($col, $row) = cell_to_pos( $cell );

    return ( country => $sheet->get_cell( $row, $col )->value );
};

This function takes a cell's location, looks up the value of that cell, and returns a key/value pair of country and country name. All that I need to know now is the range of cells which contain country names (sources) and places to store calculated values (sinks):

map {
    [ "G$_" => 'count_people', { $country->( "E$_" ) } ],
    [ "H$_" => 'count_people', { $country->( "E$_" ), contacted => 1 } ],
} 14 .. 35

That map expression builds several entries in my data structure which refer to countries in the spreadsheet and look up the right values. As long as the range is correct, the spreadsheet will have the correct associations between countries and reported values.
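To see what the map expression produces, it helps to run it against stand-in data and print the result. In this sketch the %cell_value hash is a made-up replacement for column E of the spreadsheet, and the range is shortened to two rows.

```perl
use strict;
use warnings;

# Made-up stand-in for the country names in column E of the spreadsheet.
my %cell_value = ( E14 => 'Norway', E15 => 'Chile' );

# Same shape as the closure above, reading from the stand-in hash.
my $country = sub {
    my $cell = shift;
    return ( country => $cell_value{$cell} );
};

# Two entries per row: a sink in column G and a sink in column H.
my @updates = map {
    [ "G$_" => 'count_people', { $country->("E$_") } ],
    [ "H$_" => 'count_people', { $country->("E$_"), contacted => 1 } ],
} 14 .. 15;

for my $update (@updates) {
    my ( $cell, $method, $criteria ) = @$update;
    my $text = join ', ', map { "$_ => $criteria->{$_}" } sort keys %$criteria;
    print "$cell: $method( $text )\n";
}
```

Each row in the range yields two entries, so adding or reordering countries in the spreadsheet changes the generated entries without any change to the code.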

While I admit the map expression is a lot more difficult to read than the data it builds would be, it's much easier to maintain. This is a tradeoff I'd make any time.

Keep this in mind, however: this is effectively a little programming language. Yes, it's just a data structure, but it's a data structure that determines the control flow of that language. It uses a higher order function, $country, to generate some of the values in this data structure (writing a little program) as well as a builtin operator (map) to generate more of the program. The little runloop which processes this data structure uses dynamic dispatch to produce the necessary data. What you don't see is that the methods called use SQL::Abstract to build queries dynamically.
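The dynamic dispatch in that runloop can be shown without a spreadsheet or a database. In this sketch Demo::Report is a hypothetical stand-in for the real report class: its methods count in-memory records instead of building SQL, and a plain hash stands in for the worksheet.

```perl
use strict;
use warnings;

package Demo::Report;    # hypothetical stand-in for the real report class

sub new {
    my $class = shift;
    return bless {
        people => [
            { name => 'Alice', country => 'Norway', contacted => 1 },
            { name => 'Bob',   country => 'Norway' },
            { name => 'Carla', country => 'Chile',  contacted => 1 },
        ],
    }, $class;
}

sub count_all_people { my $self = shift; return scalar @{ $self->{people} } }

# Count the records which match every key/value pair in the criteria.
sub count_people {
    my ( $self, $criteria ) = @_;
    my $count = 0;

    PERSON: for my $person ( @{ $self->{people} } ) {
        for my $key ( keys %$criteria ) {
            next PERSON unless ( $person->{$key} // '' ) eq $criteria->{$key};
        }
        $count++;
    }

    return $count;
}

package main;

my $report  = Demo::Report->new;
my @updates = (
    [ B2  => 'count_all_people' ],
    [ G14 => 'count_people', { country => 'Norway' } ],
    [ H14 => 'count_people', { country => 'Norway', contacted => 1 } ],
);

# The method name is data; $report->$method looks it up at runtime.
my %sheet;
for my $update (@updates) {
    my ( $cell, $method, @args ) = @$update;
    $sheet{$cell} = $report->$method(@args);
}

print "$_ = $sheet{$_}\n" for sort keys %sheet;
```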

That's the reason you ought to study a higher order language like a Lisp or a Scheme, and the reason you need to know how compilers work. That's also the reason you deserve to understand various kinds of data structures, so that you can organize your programs in such a way that doing what you want to do is the natural process of traversing a sensible data structure.

When you reach this level of problem, sometimes the solution isn't just writing a brute force list of steps the computer needs to execute from the top to the bottom. Sometimes the better solution is to describe your problem in terms of the data you have and the data you need and let the computer figure out how to do it, even if that means writing a program to write a program to write yet another program for you.
