October 2010 Archives

How not to Handle Exceptions

By chromatic on October 28, 2010 12:55 PM | 2 Comments

A client project processes a large volume of data. Freshness is more important than completeness, but avoiding data loss is still important. So is avoiding data corruption; it's better to have to perform a unit of work again than to save incomplete or inaccurate information in the database due to a race condition.

The backend storage mechanism is a relational database with transactions enabled for all write operations. In the (unlikely but possible) case that a transaction fails due to the inability to acquire the correct lock for a unit of work, the code retries the transaction.

Perl's DBI and the KiokuDB persistence layer reveal failed transactions by throwing exceptions. This is all well and good—a transaction error is an exceptional condition that should interrupt normal code flow—except that exceptions as unstructured string data are difficult to use.

It's difficult enough to determine whether an exception occurred from the database or another layer. I had one codepath in which Perl 5 threw a runtime exception due to a missing method (and fortunately testing caught this without data loss, but even so what a hassle).

To my knowledge, there's no easy way to set up an exception handler which catches only exceptions thrown from certain places (much less only exceptions of certain types). In this application, I do very much care about retrying transactions, but if the code has an error I want that exception to propagate to the top level and end the program.

The best I can do at the moment is to perform a regex match against the string of the exception text and hope that testing and careful thought will catch any changes to avoid false negatives and false positives. Granted, the proper place for these errors is most likely in the KiokuDB layer, as committing to use that backend system offers a single point of consistency and abstraction for such details—but that brings up a wider question worth considering in its own post.
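As a minimal sketch of that regex-based approach, using only core Perl: the retry_txn() name, the $max_tries parameter, and the patterns in $RETRYABLE are all hypothetical, and the real message text depends on the database and driver in use.

```perl
use strict;
use warnings;

# Hypothetical patterns; real messages depend on the database and driver.
my $RETRYABLE = qr/deadlock|lock wait timeout|could not obtain lock/i;

# Run $work, retrying only when the error text looks like lock contention.
sub retry_txn {
    my ($work, $max_tries) = @_;

    for my $attempt (1 .. $max_tries) {
        my @result;
        my $ok = eval { @result = $work->(); 1 };
        return @result if $ok;

        my $error = $@;

        # Anything that does not look like lock contention propagates.
        die $error unless $error =~ $RETRYABLE;

        # Out of attempts: give up and let the caller see the error.
        die $error if $attempt == $max_tries;
    }

    return;
}
```

A missing method or any other programming error fails the pattern match and reaches the top level, but only for as long as the pattern and the message text stay in sync.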

Reinventing the Axle

By chromatic on October 25, 2010 11:58 AM | 4 Comments

... or A Modest Proposal for Dynamic Language Bindings

I've worked on a few shared library bindings for various dynamic languages: several libraries for Parrot, a few for Perl 5, and one for Ruby. I've embedded Perl 5 and I've embedded Parrot. (I figured out how to get Perl 5's reference counting working correctly with Parrot's "true" GC and how to get Parrot's GC working when embedded in Ruby.)

I even wrote a silly proof-of-concept port of Parrot's foreign function interface to Perl 5 before the Python folks adopted the much better ctypes (and I can't wait to use ctypes for Perl 5).

All of this reveals to me that there's something rotten about writing simple bindings to shared libraries from dynamic languages. It's mostly tedious, uninteresting work with far too many chances of bugs and far too many repetitive details. You'd think computers would be good at solving both problems.

I have generalized from my psyche-scarring experiences two fundamental assumptions:

C (and specifically C headers) are a terrible layer of interoperability because they cannot express some of the most important details (Does this function acquire a shared resource? Whose responsibility is it to manage the lifespan of that resource?) and they obscure the clarity of intent through the use of abstractions such as C declarations and macros.
Requiring end users to install a full development environment along with the development headers for any library to which they want to install your bindings is a recipe for madness on the part of installers and soul-crushing despair on your part, as you try to figure out precisely which version of OpenGL is available on which version of Windows with which specific release of a given video card and oh goodness no, please do not tell me you just upgraded your Cygwin.

In other words, parsing headers at the configuration time of a CPAN module which binds to, for example, libcurl, is madness, and we should stop.

Assume that ctypes for Perl 5 exists very soon in a form in which you can rely on its presence on a modern Perl 5 installation. Assume that if you prefer Python or Lua or Ruby or Haskell or Factor or even some form of Common Lisp not tightly bound to the JVM or the CLR that you have a similar library which knows how to translate from your language's calling conventions to the C ABI to which the library's exported functions conform and that the type mapping problem is solved for 80% of the cases.

Now you need some mechanism to identify the symbols exported from the shared library to generate the appropriate thunks.

I've tried (and failed) to use Swig, and I blame myself more than anyone else for that—but Swig is the wrong answer. Parsing C headers is the wrong answer in 2010 and it was the wrong answer 20 years ago. C headers do not provide the right information in the right form. Effectively you have to have a bug-free C preprocessor to expand headers into literal C source code and then hope that your C parser will identify the correct information you need.

What's the right level of abstraction? That depends on the information a thunk library such as ctypes needs to know:

The name of an exported symbol
Its input and output types (specifically bit width, signedness, and any varargs)
Constness of pointers or expected modification of out parameters
Exceptional conditions such as control flow modifications through longjmp
Error handling, such as setting errno or special return values
Resource handling, such as a function which returns a malloced value but expects you to free it yourself (or some combination)

... and probably more.
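To make that list concrete, here is one way such a description might look as plain Perl data. The format and its field names (returns, args, owner, release, errors) are invented for illustration and are not an existing standard. The only facts taken from C are that strlen() returns a size_t and that strdup() returns memory the caller must free().

```perl
use strict;
use warnings;

# Hypothetical description format; the field names are invented here.
my %api = (
    strlen => {
        returns => { type => 'size_t' },
        args    => [ { name => 's', type => 'char *', const => 1 } ],
    },
    strdup => {
        returns => { type => 'char *', owner => 'caller', release => 'free' },
        args    => [ { name => 's', type => 'char *', const => 1 } ],
        errors  => { returns => 'NULL', sets_errno => 1 },
    },
);

# Join a type and a name the way a C declaration would.
sub declare {
    my ($type, $name) = @_;
    return $type =~ /\*\z/ ? "$type$name" : "$type $name";
}

# Render one entry as a prototype plus the ownership note a header omits.
sub summarize {
    my ($name) = @_;
    my $fn = $api{$name} or die "No description for '$name'\n";

    my @args = map {
        my $const = $_->{const} ? 'const ' : '';
        $const . declare( $_->{type}, $_->{name} );
    } @{ $fn->{args} };

    my $proto   = declare( $fn->{returns}{type}, $name ) . '(' . join( ', ', @args ) . ')';
    my $release = $fn->{returns}{release};

    return $release ? "$proto; caller releases the result with $release()" : $proto;
}
```

A thunk generator could walk the same structure without ever running a C preprocessor, and the ownership and error fields carry exactly the details a header leaves implicit.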

I've set aside the concept of opaque pointers versus raw structs, because that's another rathole full of platform-specific concerns (and besides that, any library which does not expose only opaque pointers to external users is in a state of denial of reality and deserves a very good refactoring), but you probably already get the idea.

Wouldn't it be nice if shared libraries could provide some sort of machine-parseable, semantics-preserving, declarative (that is, no cpp necessary!) file which all of us poor users could parse once with our thunk generators to produce bare-bones, no sugar added interfaces to these wonderful libraries, then get on with the interesting work such as building Pygame and SDL_perl in wonderfully Pythonic and Perlish ways instead of manually reading SDL_video.h and figuring out how to map all of that implicit information into XS ourselves?

I'm not asking for another section crammed into ELF files and I'm not suggesting that the fine people behind libxslt need to compile a manual file of machine-extractable information themselves—if we had a nice format all of our dynamic languages could understand, anyone could make this file once for any API/ABI version of the library and we could all share for a change. Wouldn't that be nice?

(and yes, I'm aware of CORBA and COM and their IDLs, but the existence of Monopoly money by no means renders a $20 useless at the grocery store)

Closures, Late Binding, and Abstractions

By chromatic on October 21, 2010 12:33 PM

One of my client projects has suffered from running code too quickly (go Perl!) and too much in parallel (go Unix processes!) and has kept me tuning the transactional model of the backend storage.

I'm not worried about inconsistencies, but rather detecting and avoiding lock contention where possible, rescheduling transactions where lock contention does occur, and above all, wrapping transactions in the smallest units possible.

The fantastic (but not for all projects) KiokuDB has been very useful for this project. If you use it with a transactional backend, it provides a DBI-inspired method to invoke a function reference in its own transaction:

$dir->txn_do( sub { $dir->delete( @args ) } );

I was glad of the ability to pass around closures in a lightweight manner; delaying computation until KiokuDB has set up the transaction is very useful. Even so, I was less pleased with all of the syntactic noise littering my code in several places.

That's because I wasn't taking full advantage of the abstraction possibilities of late binding.

Now I have instead:

use Try::Tiny;

sub do_txn
{
    my ($self, $method, @args) = @_;

    my $dir    = $self->dir;
    my $sub    = sub { $dir->$method( @args ) };

    while ( ... )
    {
        try   { $dir->txn_do( $sub ) }
        catch { ... }
    }

    ...
}

... and I can call it with:

$self->do_txn( add    => $new_obj  );
$self->do_txn( delete => @args     );
$self->do_txn( update => $invocant );

... to remove visual clutter from other parts of the code. Better yet, all of the retrying semantics are in one place, and I can add logging or tuning there.

I hoisted the creation of the closure passed to txn_do() out of the while loop for two reasons. Primarily, I believe doing so makes the code within the loop clearer. It's also slightly more efficient (slightly) to create the closure once than on every trip through the loop. (If efficiency were of greater concern—and lock contention is much more troublesome here—I could pre-resolve the method and bind to the function reference representing the candidate method first instead of the name of the method, but that would add at least one line of code and possibly a few more for error checking, and it's not worthwhile yet.)
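The pre-resolving variant mentioned in that parenthetical might look something like this sketch. The bind_method() helper and the My::FakeDir stand-in class are hypothetical and exist only to make the example self-contained; can() is the standard UNIVERSAL method.

```perl
use strict;
use warnings;

# Stand-in for the real directory object; hypothetical, illustration only.
{
    package My::FakeDir;

    sub new { my $class = shift; return bless { deleted => [] }, $class }

    sub delete {
        my $self = shift;
        push @{ $self->{deleted} }, @_;
        return scalar @_;
    }
}

# Build the closure once, binding to a code reference instead of a name.
sub bind_method {
    my ($dir, $method, @args) = @_;

    # Resolve the name up front so a typo fails here, not inside the loop.
    my $code = $dir->can( $method )
        or die "Cannot call '$method' on " . ref( $dir ) . "\n";

    return sub { $dir->$code( @args ) };
}
```

The extra lines buy an early failure for a misspelled method name, which is the error checking the paragraph above alludes to.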

Despite Perl 5's support for pervasive and relatively lightweight closures, sometimes they're not the best abstraction to use if your primary concern is code clarity. I believe the resulting code is much clearer (even if do_txn() isn't the right name).

When You Know Most about What You Need Most

By chromatic on October 18, 2010 11:50 AM

I had every intention of sending Modern Perl: the book to the printer last week, but a funny thing happened along the way when I proofread the index.

Perhaps I've been writing software too long (and in the technology world before that); sometimes I look at problems as if they were software problems. I've written several books before. I've edited several books too. I understand publishing (which is a good thing to have if you're a partner at a publishing company).

I knew that a good index makes a good book great just as a poor index drags down a good book. I planned from the start to sprinkle index tags liberally through the manuscript as I wrote it. What better time to indicate which topics are most important than when I mention them?

That worked—at least better than trying to add the relevant index tags after the fact.

Then I built the index for the first time and realized that consistency within the index is very, very important. By the time I'd proofread the index once, I knew far more about what I wanted to index (and, more important, how) than I knew when I finished writing the manuscript.

The second 80% of the work went more quickly than the first 80%.

Yet because I've spent so long writing software and exploring project management, I couldn't help but remind myself of that always-important project management mantra: you know the most about what you need only at the point when you most need it.

That's one reason we at Onyx Neon have invested so heavily in making it possible to produce a new book from a draft manuscript within a minute or two (including the index), and that's one reason I care so much about iterative development. I like what I see in Rakudo, with its feedback-driven prioritization for Rakudo Star development, and in projects such as Dist::Zilla, which has moved quickly and evolved into a very usable, very powerful system.

The more you know, the better decisions you make. It works for books as well as it works for software development.

(For a longer treatment of the idea of "the last responsible moment", see my colleague's A Tale of Two Vacations.)

Due CREDITs

By chromatic on October 15, 2010 5:06 PM

I'm performing the final proofreading of the Modern Perl book today. Its 275 pages of goodness represent a lot of work on my part, but I must give credit to several dozen people who've asked good questions, found typos, suggested rephrasings, told me that my prose reads more like a novel than a dry technical reference, and otherwise helped document one good way to write Perl 5 code in 2010.

I'll keep the Modern Perl Github repository up and open for the lifespan of the book (Perl 5.14 needs good external documentation; I'm looking forward to using the package {} syntax). You can dig around in the CREDITS file there to find the other people who've made the book what it is, but that's not nearly enough thanks.

I can highlight several people such as Yuval Kogman, Chas. Owens, and Alex Scott-Johns who've put far more effort into the book than I could have asked. I can mention the work of harleypig, John McNamara, and Jess Robinson who've made it almost trivial to produce epub files for all of you with electronic readers (and there will be an epub file very soon).

Perhaps the best way for me to thank all contributors sincerely is to say this: anyone named in this list has my most sincere and heartfelt thanks—and my recommendation to employers and clients alike: these people have demonstrated an interest in and an affinity for Perl 5. You would do well to work with them.

Thank you to:

John SJ Anderson, Peter Aronoff, Lee Aylward, Alex Balhatchet, Ævar Arnfjörð Bjarmason, Matthias Bloch, John Bokma, Vasily Chekalkin, Dmitry Chestnykh, E. Choroba, Paulo Custodio, Felipe, Shlomi Fish, Jeremiah Foster, Mark Fowler, John Gabriele, Andrew Grangaard, Bruce Gray, Ask Bjørn Hansen, Tim Heaney, Robert Hicks, Michael Hind, Mark Hindess, Yary Hluchan, Mike Huffman, Curtis Jewell, Mohammed Arafat Kamaal, James E Keenan, Yuval Kogman, Jan Krynicky, Jeff Lavallee, Moritz Lenz, Jean-Baptiste Mazon, Josh McAdams, Gareth McCaughan, John McNamara, Shawn M Moore, Alex Muntada, Carl Mäsak, Chris Niswander, Nelo Onyiah, Chas. Owens, ww from PerlMonks, Jess Robinson, Gabrielle Roth, Andrew Savige, Lorne Schachter, Dan Scott, Alexander Scott-Johns, Phillip Smith, Christopher E. Stith, Mark A. Stratman, Bryan Summersett, Audrey Tang, Scott Thomson, Sam Vilain, Larry Wall, Colin Wetherbee, Frank Wiegand, Doug Wilson, Sawyer X, David Yingling, Marko Zagozen, harleypig, hbm, and sunnavy.

Any remaining errors in the book are the fault of me and my publisher.

Certification or Delivery

By chromatic on October 13, 2010 9:45 AM

I can judge your efficacy as a software developer (or a software team) with a handful of questions:

Do you release working software?
... on a regular schedule?
... and does it improve over time?
... and delight your customers?

These simple questions assume a lot of planning and delivery. You should have fewer bugs, as time goes on. You should deliver features your customer wants. The quality of the software should improve, as should its maintainability.

I've said nothing about what language you should use, what paradigm you should use, how to organize your team, what libraries or development environments you should use, the one true coding standard, or even your development process. I judge your project by its results.

Does it do what you say it does? Is it reliable? Will it continue to do that next month? Will it do more (or better or faster or with less memory)?

I think of this today after reading Tobias Mayer's The Scrum Compliance, in which he suggests that Certification for Its Own Sake is a problem of the Scrum Alliance. I also think of flipping pages in the original XP book eleven years ago and realizing that there was a life beyond change requests and testing plans and lobbying to get a feature into the next twice-a-year-inevitably-slipping release.

I don't care if you have or don't have a piece of paper and can regurgitate specific facts about the memory model of a specific programming language, because I'll likely have to look up the answer myself or at least write a ten-line test program to satisfy my curiosity. I don't care if you don't spring for your $50 annual membership in a professional society intended to get you past the gatekeepers of HR. I don't care if you know Kent Beck or Ken Schwaber personally (though if Ward Cunningham vouches for you personally, you get bonus points).

I care that you can demonstrate the abilities necessary to be part of a team devoted to delivering great software on a regular, predictable schedule. You can learn Moose (if you're willing) and you can adapt to our coding style (if you're not a prima donna) and you can even discover that pair programming improves our ability to write great software (if you can get over any preconceived notions of "watching someone else type").

That's why I wanted so badly for Perl 5 to get a reliable monthly release cadence and that's why I don't worry about the present or future of Parrot and Perl 6. I'm sure it's possible to write great software without regular releases and iterative development, but I'm certain that only healthy projects can sustain that schedule over a long term.

People who can work in those environments, people who help make such projects possible, impress me far more than people who can only brandish a piece of paper certifying that they sat in a chair for three days.

Version Dependencies: Don't Guess!

By chromatic on October 7, 2010 3:05 PM | 3 Comments

I'm glad to see Ævar's The CPAN client version-less dependency problem, because it discusses a real problem. In the absence of specific information about dependencies, what should installers do?

Unfortunately, that's the wrong question.

Have you ever read code which performs user input validation deep in its guts, way down in code which has layers of insulation between user input? I have. I take no small pride in removing this validation code and putting it where it belongs: as close to the input as possible. This has at least two benefits. First, it allows for the possibility of reporting user errors in the view and dispatch logic, where it belongs. Second, it removes clutter from code which can document its expectations appropriately.

(I realize that high security applications may need extra paranoia and I submit that you should have effective testing of the interfaces between components to satisfy you that unrealistic data never enters the application, but it's still a general rule.)

If you're confident that you've dealt with all sources of errors before a certain point, don't worry about them after that point.

With that all said, perhaps the client side of the CPAN installation is the wrong place to handle these dependencies. After all, the developer of the code has presumably installed dependencies locally and has run tests against them. Why shouldn't Module::Build or ExtUtils::MakeMaker check the installed version when bundling the distribution and include that as a recommended minimum version in the META.yml file?

It's no worse than "Install whatever version you want" and it at least has the data point that someone had that version working. It also requires no changes to the installers with fragile or odd heuristics.
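As a rough sketch of what that bundling-time check could do (not how Module::Build or ExtUtils::MakeMaker actually behave), the hypothetical installed_versions() helper below loads each named dependency and records the version it finds on the author's machine.

```perl
use strict;
use warnings;

# Hypothetical helper: map each dependency to the locally installed version,
# for use as a recommended minimum when bundling a distribution.
sub installed_versions {
    my @modules = @_;
    my %minimum;

    for my $module (@modules) {
        # Loading the module is the simplest way to ask for its version.
        ( my $file = "$module.pm" ) =~ s{::}{/}g;
        next unless eval { require $file; 1 };

        # Modules without a $VERSION get 0, meaning "any version".
        $minimum{$module} = $module->VERSION // 0;
    }

    return \%minimum;
}
```

Calling installed_versions( 'List::Util', 'File::Spec' ) returns a hash reference keyed by module name, which a bundling tool could write into the distribution's metadata. Modules that fail to load are skipped; a real tool would probably want to report them instead.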

Structured Data and Knowing versus Guessing

By chromatic on October 4, 2010 12:20 PM

PerlMonks had a new question about the recommendation to use exception objects instead of die "STRING". I posted a link to The Stringceptional Difficulty of Changing Error Messages and started a heated debate over the value of using regular expressions in exception handling.

Consider this.

You have a larger application (because smaller applications rarely need this much discipline, and you're a grown-up and can decide for yourself how much safety and reliability your program needs). You have identified several types of exceptional conditions and you want to handle them reliably. Every exceptional condition can possibly include diagnostic information, such as a customer ID or the module name, function/method name, and even line number. You might include the version number of your code and any specific information about the specific host.

In other words, your exceptions have gone beyond "Something happened! Exit the program with a message!" to "Something happened, and here are the details you need to know to diagnose it." The exception may be a system error (Disk full! Network connection gone! Database full of cheese!) or a data error (No authorization for this operation! User record corrupt! Autotune detected!). The exception may be resumable (Lock not acquired!) or fatal (Fox in acquisition talks!).

You'd like to distinguish between those.

If you use exception objects, you can use the class of the exception to distinguish between resumable and fatal exceptions:

my $e;

if ( ( $e = Exception::Class->caught() ) && $e->does( 'Exception::Resumable' ) )
{
    ...
}
else
{
    $e->rethrow();
}

... and within the body of the handler you can rely on the presence of attributes in your exception objects:

my $e;

if ( ( $e = Exception::Class->caught() ) && $e->does( 'Exception::Resumable' ) )
{
    my $line     = $e->line();
    my $module   = $e->module();
    my $severity = $e->severity();
}

You can go as far as to make these attributes mandatory when creating exception objects, so if your test suite exercises the code paths which create exceptions (as it should), you can rely on correct use of exception data. That is to say, when you catch an exception, you know the interesting data is present.

You can also search your codebase for instances of the strings corresponding to the names of exception classes in both throwing and catching code.
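For a self-contained illustration of mandatory attributes and class-based dispatch, the sketch below uses plain blessed hashes from core Perl in place of Exception::Class. The My::Exception classes, their required attributes, and the handle() function are all hypothetical.

```perl
use strict;
use warnings;
use Scalar::Util 'blessed';

{
    package My::Exception;

    # Refuse to build an exception that lacks its diagnostic data.
    sub new {
        my ($class, %args) = @_;

        for my $required (qw( message module line )) {
            die "Missing required attribute '$required'\n"
                unless defined $args{$required};
        }

        my $self = { %args };
        return bless $self, $class;
    }

    sub throw   { my $class = shift; die $class->new( @_ ) }
    sub message { $_[0]{message} }
    sub module  { $_[0]{module} }
    sub line    { $_[0]{line} }

    package My::Exception::Resumable;
    our @ISA = ('My::Exception');
}

# Dispatch on the exception's class, never on its message text.
sub handle {
    my ($work) = @_;
    return 'ok' if eval { $work->(); 1 };

    my $e = $@;

    if ( blessed( $e ) && $e->isa( 'My::Exception::Resumable' ) ) {
        return sprintf 'retry after %s line %d', $e->module, $e->line;
    }

    # Strings and non-resumable exceptions propagate unchanged.
    die $e;
}
```

Rewording the message attribute changes nothing about which branch runs, because the handler never inspects it.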

Now consider how to do this with die "STRING". If you're fortunate, you might get line number and filename. Good luck searching for the right text to find exception handlers. Good luck extracting structured information from strings. Good luck verifying the proper formatting of strings in your test suite.

Again, you don't always need this level of control when throwing or catching exceptions, but to decry it as "Java-like" and "unnecessary" and to ignore its advantages in some cases over regular expression matches against text is shortsighted.

Update: Robert Sedlacek explains the point succinctly:

Improving your error messages should never be able to break your error handling.

Modern Perl: The Book: The (draft) PDF

By chromatic on October 1, 2010 1:13 PM | 1 Comment

Update: Modern Perl: the book is out! Skip the draft and download the real thing.

I've finished writing and editing Modern Perl: The Book, and it's gone into production, which means that Onyx Neon is preparing a print-ready PDF to give to the printers. The book should be available in print by the end of October, if not sooner.

I've just uploaded Modern Perl: The Almost-Ready-for-the-Printer PDF for your perusal. We have yet to do line and page breaking, and we'll probably fix a few typos and conversion artifacts, but I figured that so many people have contributed (oh, and I need to add a CREDITS page for everyone who's helped!) that a few more might welcome the chance to see how the book will look in print.

Please do not redistribute this PDF, as it'll keep changing as we find and fix more little problems. Feel free to pass on a link to this page. We'll make a very nice PDF after the book goes to print, and we'll have an epub version as well. I'll put those under a friendly license once the publisher gets a few things set up.

Feel free to contact me directly with comments, questions, or concerns. As usual, the best place to report a problem with the content of the book is the Modern Perl book Github repository, but you can also email me directly (chromatic@cpan.org).

Thanks again for all of your help.
