Don't Parse That String!


Defensive programmers anticipate what might go wrong. Robust code handles the unexpected, partly by minimizing the surface area of potential problems. The fewer things that can go wrong, the fewer things that will go wrong. (Things will still go wrong, but you can write safer code if you're clever.)

Yuval Kogman asked "Are we ready to ditch string errors?" I am, and there's a general principle of API design beyond his question.

One problem with die "Some error!" is identifying which error it represents. A programmer or user, who ostensibly speaks enough English and problem-domain jargon, may have some idea of what the message means; the rest of the program does not. How does your code catch this error and distinguish it from some other type of error? How can it determine which errors it can handle and which it must delegate?

Break out split or the regular expression engine and prepare to write heuristics that guess. Woe to you if someone someday internationalizes your error messages, or runs all of your exceptions through a logging mechanism that changes their formatting slightly, or....
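A minimal sketch of the alternative in plain core Perl: throw an object instead of a string, then identify the error by its class rather than by its wording. The class names My::Error and My::Error::NotFound and the functions load_config and load_config_or_default are hypothetical, made up for illustration; on CPAN, modules such as Exception::Class and Throwable provide this machinery ready-made.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical base exception class: the data lives in fields, not in message text.
package My::Error;
sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}
sub throw   { my $class = shift; die $class->new(@_) }
sub message { return $_[0]{message} }

# Hypothetical subclass carrying the extra data a handler needs.
package My::Error::NotFound;
use parent -norequire, 'My::Error';
sub file { return $_[0]{file} }

package main;

# Hypothetical loader; real parsing is elided from this sketch.
sub load_config {
    my ($file) = @_;
    My::Error::NotFound->throw(message => 'Config file missing', file => $file)
        unless -e $file;
    return {};
}

# Returns the config, or undef if the file is missing; rethrows anything else.
sub load_config_or_default {
    my ($file) = @_;
    my $config;
    my $ok = eval { $config = load_config($file); 1 };
    return $config if $ok;

    my $err = $@;
    # Decide by class, not by matching the wording of a message.
    if (blessed($err) && $err->isa('My::Error::NotFound')) {
        warn $err->message, ': ', $err->file, "\n";
        return;
    }
    die $err;    # not ours to handle: delegate to the caller
}
```

Rewording or translating the message text changes nothing here, because the handler never reads it to decide what happened; the file name travels as data rather than as a substring to extract.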

The problem is that you can't take advantage of the structure of the exception data because it's not present in the string. The same goes for DBI's connection strings:

my $dbh = DBI->connect( 'dbi:DriverName:database=database_name;host=hostname;port=port' );

As the documentation suggests in the very next sentence:

There is no standard for the text following the driver name. Each driver is free to use whatever syntax it wants.
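That sentence is exactly why parsing a DSN yourself is a gamble. Below is a minimal sketch of a deliberately naive parser (naive_parse_dsn is a hypothetical name, not part of DBI) that assumes every driver uses semicolon-separated key=value pairs, which DBI does not promise.

```perl
use strict;
use warnings;

# Hypothetical, deliberately naive DSN parser: it guesses that everything
# after "dbi:Driver:" is a list of key=value pairs separated by semicolons.
sub naive_parse_dsn {
    my ($dsn) = @_;
    my ($scheme, $driver, $rest) = split /:/, $dsn, 3;
    die "Not a DBI DSN: $dsn\n" unless defined $rest && lc $scheme eq 'dbi';

    my %args;
    for my $pair (split /;/, $rest) {
        my ($key, $value) = split /=/, $pair, 2;
        $args{$key} = $value;    # undef when the pair has no '='
    }
    return ($driver, \%args);
}
```

Given the DSN from the example above, it returns 'DriverName' and a hash with database, host, and port keys. Given a DSN in some other driver syntax (the string 'dbi:SomeDriver:mydb@somehost' is invented for illustration), it does not fail; it returns a hash with the single key 'mydb@somehost' whose value is undef. That silent wrong guess is the real hazard.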

Compare that connection string to a hypothetical keyword argument form:

my $dbh = DBI->connect(
    driver   => 'DriverName',
    database => 'database_name',
    host     => 'hostname',
    port     => 'port',
    extra    => 'arguments',
);

This has several advantages. The method doesn't have to guess at (or parse) the string. The layout and vertical alignment make the keyword form easier to read and to modify. DBDs can decorate and augment the argument list without parsing and reassembling a string. Verifying arguments and supplying defaults are much easier.
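A minimal sketch of that last point, assuming a hypothetical helper normalize_connect_args (DBI has no such function, and its real connect does not accept these keys). It rejects unknown keys, insists on required ones, checks the port, and fills in a default host. A second hypothetical wrapper, with_driver_defaults, shows how a driver could augment the list with its own default port (5432 is an arbitrary choice here) without touching a string.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical policy: which keys are allowed, and which defaults apply.
my %KNOWN    = map { $_ => 1 } qw(driver database host port extra);
my %DEFAULTS = (host => 'localhost');

sub normalize_connect_args {
    my (%args) = @_;
    for my $key (sort keys %args) {
        croak "Unknown connection argument '$key'" unless $KNOWN{$key};
    }
    for my $required (qw(driver database)) {
        croak "Missing required argument '$required'"
            unless defined $args{$required};
    }
    croak 'Port must be an integer'
        if defined $args{port} && $args{port} !~ /\A[0-9]+\z/;

    # Caller-supplied values override the defaults.
    my %merged = (%DEFAULTS, %args);
    return \%merged;
}

# Hypothetical driver wrapper: prepend a default port. A port supplied by the
# caller still wins, because later pairs in a hash assignment override earlier ones.
sub with_driver_defaults {
    my (%args) = @_;
    return normalize_connect_args(port => 5432, %args);
}
```

A typo such as colour => 'red' fails loudly with a message naming the bad key, where a misspelled field inside a DSN string might simply be ignored by whatever parses it.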

The same argument goes for using a module such as File::stat instead of parsing the output of `ls -l filename`.
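For comparison, a minimal sketch using the core File::stat module: stat returns an object with named accessors, so there are no columns to count and no date formats to guess. The helper name file_info is hypothetical.

```perl
use strict;
use warnings;
use File::stat;    # replaces the built-in stat() with one that returns an object

# Hypothetical helper: read size, modification time, and permission bits
# by name rather than by slicing columns out of `ls -l` output.
sub file_info {
    my ($path) = @_;
    my $st = stat($path) or die "Can't stat $path: $!\n";
    return {
        size  => $st->size,           # bytes
        mtime => $st->mtime,          # seconds since the epoch
        mode  => $st->mode & 07777,   # permission bits only
    };
}
```

The same code works regardless of locale, column widths, or how a particular ls formats recent versus old dates.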

The same argument applies to... well, you get the point. It's far too easy to unfold the regex widget of the Swiss Army chainsaw when a little care in decomposing data into structures makes your programs safer, easier to use, more flexible, and more robust.

(I sometimes consider how a language would look if it had only keyword arguments, and how you could optimize them with immutable, internable strings, cached call sites, and a zero-copy register allocation mechanism, but I made it only as far as writing a self-hosting garbage collector before I had real work to do.)

6 Comments

Surprisingly (or perhaps not so surprisingly), opinions vary…

http://perlmonks.org/?node_id=835894

Minor note: DSNs as strings have one simple advantage over hash refs or argument pairs; they are easy to specify on the command line.

I'm always amazed at how quickly many programmers will reach for "a string that I'll parse later" as a first-order solution to a problem. Insanity!

Dan Bernstein rightly lists avoiding parsing among his fundamental rules: http://cr.yp.to/qmail/guarantee.html

Agreed, but command line argument pairs help add structure to data.

Not that I particularly disagree with what you've said, and I'm sure you only picked DBI as an example, but as the documentation says:

There is no standard for the text following the driver name. Each driver is free to use whatever syntax it wants.

the string passed to DBI is not necessarily parsed as keyword=value pairs; there is nothing wrong with 'dbi:driver:only;an;example;and who said semicolon is a separator'. In fact, some DBDs pass the string after 'dbi:driver' directly to the underlying database engine or API without touching it.

That said, I do agree your replacement example is clearer and may not need to be parsed, which after all was the point of your post.

Right. I suspect that API has more to do with the connection strings of Oracle and Sybase than any deliberate design on the part of the DBI or DBD authors. Keyword arguments were always possible, but they weren't the most obvious API design.


About this Entry

This page contains a single entry by chromatic published on July 7, 2010 12:50 PM.
