June 2010 Archives

Update: Modern Perl: the book is out! Skip the draft and download the real thing.

Last week at YAPC I finished editing the draft of Modern Perl: The Book.

I'm pleased with how it's turned out, but I'm not yet ready to hit the "Ship it to stores!" button. In particular, I could use your help:

  • Is the material accurate?
  • Is the material effective in its explanations?
  • Are any parts confusing?
  • Is the material comprehensive?
  • Are there typos or infelicities?
  • Are the cross references appropriate?
  • Is the organization and order sensible?

You can always render the latest version of the book from a Github checkout by running:

    perl build/tools/build_chapters.pl
    perl build/tools/build_html.pl

You must install Pod::PseudoPod to build the HTML. The easiest way to install Pod::PseudoPod is to install cpanminus, either with your packaging system or with commands like:

    cd ~/bin
    wget http://xrl.us/cpanm
    chmod +x cpanm

Then run cpanm Pod::PseudoPod.

I've also uploaded the file to the drafts section of this website.

I welcome any and all feedback. You're more than welcome to contact me directly, but the easiest mechanism for me to receive feedback is in the form of bug reports, patches, and pull requests on Github. Please feel free to add your contact information to the CREDITS file if you prepare a patch, even for the simplest typo. Gratitude is important and I have much to share.

We plan to send the book to the printer sometime in August, and we'll certainly make PDFs and ePub versions available as well. For now, please don't redistribute the book; please instead distribute links to the book.

The draft chapters are:


  1. The Perl Philosophy
  2. Perl and Its Community
  3. The Perl Language
  4. Operators
  5. Functions
  6. Regular Expressions
  7. Objects
  8. Stylish Perl
  9. Writing Real Programs
  10. Perl Beyond Syntax
  11. What to Avoid
  12. What's Missing

Overheard at YAPC last night:

Person one: I can't find Try::Tiny exciting.

Person two: Isn't promoting that admitting that Perl 5's exception handling is broken?

(Perl 5.13.1 introduced improvements to Perl 5 exception handling. They need more testing on existing code, but the Perl 5.14 release next year will likely include them. The existence of Try::Tiny has directly produced language improvements; in this case the cycle works.)

I like Try::Tiny; it simplifies exception handling in Perl 5. Handling exceptions correctly (even in 5.12) is difficult. That's primarily a syntax issue. You can handle exceptions correctly—Perl 5 provides all of the tools necessary to do so—but it's not simple to read or to write.
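Here's roughly what doing it correctly looks like in core Perl (the try_catch helper below is an illustrative sketch, not Try::Tiny's actual implementation). The robust idiom checks eval's return value rather than $@ alone, because $@ can be clobbered during stack unwinding in perls before the 5.13.1 fixes:

```perl
use strict;
use warnings;

# A simplified sketch of the try/catch pattern Try::Tiny provides.
# The key point: use eval's return value as the success flag, since
# checking $@ alone can miss errors when a destructor resets $@
# during stack unwinding (before the 5.13.1 improvements).
sub try_catch {
    my ($try, $catch) = @_;
    my $ok = eval { $try->(); 1 };
    if (! $ok) {
        local $_ = $@ || 'unknown failure';
        return $catch->();
    }
    return 1;
}

my $caught = '';
try_catch(
    sub { die "boom\n" },
    sub { $caught = $_ },
);
print $caught;    # boom
```

Try::Tiny hides this bookkeeping (and more, such as finally blocks) behind readable syntax.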

Similarly, runtime type reflection is difficult: ref is unreliable and the standard hacks for abusing UNIVERSAL::isa are far too often wrong.
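To make the contrast concrete (My::Class is an illustrative name), here's a sketch of why ref alone is unreliable and what the robust check looks like: ref conflates blessed and unblessed references and ignores inheritance, and calling UNIVERSAL::isa as a function bypasses any isa() override in the class. The reliable runtime check combines Scalar::Util's blessed() with a real method call:

```perl
use strict;
use warnings;
use Scalar::Util 'blessed';

package My::Class;                # illustrative class name
sub new { bless {}, shift }

package main;

my $object = My::Class->new;
my $plain  = { key => 'value' };

print ref $object, "\n";          # My::Class
print ref $plain,  "\n";          # HASH -- ref can't say "not an object"

# blessed() returns the class name for objects and undef otherwise,
# and the method call respects any isa() override in the class.
for my $thing ($object, $plain) {
    if (blessed($thing) && $thing->isa('My::Class')) {
        print "is a My::Class\n";
    }
    else {
        print "not an object of My::Class\n";
    }
}
```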

Try::Tiny is nice because it offers good syntax which hides all of the internal complexity Perl 5 exposes. A world with T::T is better than a world without it.

Even so, Try::Tiny is only useful to people who know that it exists, who can install it, and who make use of it pervasively. Try::Tiny isn't useful for people who don't know it exists. In other words, Try::Tiny isn't useful for novices. It could be and it should be, but novices aren't going to hear about it. (Some will, but how many novice Perl developers know about the CPAN? How many novices have joined the Perl community sufficiently that they can learn from the hard-won wisdom of Perl adepts? Vanishingly few, as they do not remain novices for long.)

Here's the virtuous dilemma: isn't it nice to have a language with a working and well maintained extension mechanism? Isn't it nice to be able to modify our own languages in our own lexical scopes? Yet when our language modifications improve basic language features—instead of adding new language features—should we praise them as victories instead of acknowledging them for the patches and bugfixes that they are?

Yes, changing how exception handling works in Perl 5 now, especially if it requires semantic changes, may cause some pain. (It may also fix undiscovered bugs in existing programs, which is a virtue of its own.) Yet isn't the ability to modify the language for your own purposes sometimes akin to running a patched version of the language itself?

Yes, it's safer and more composable, and less permanent than running a patched binary, but still....

When Assembly Leaks Through


Paraphrasing Piers Cawley, talking about why to give up Ruby for Perl:

This is assembly language for calling a function:

    push address of RETURN on stack
    push $argument on stack
    get address of FUNC
    goto FUNC
    pop return value off of stack

This is Perl 5 for calling a function:

    func( $argument );

That's all well and good. This is assembly language for entering a function:

    pop parameter from stack
    verify parameter type constraints

This is Perl 5 for entering a function:

    sub func {
        my $parameter = shift;
        die 'Wrong type' unless $parameter->isa( 'Whatever' );
    }

... but wouldn't it be nice if:

    sub func(Whatever $parameter) { ... }

(No new insight here, but directly prior to this talk, Damian Conway and I talked about how Perl 6 signatures simplify much, much more code than this simple example illustrates. Lots of code goes away.)



Memory is expensive, and every character we could possibly ever want to use fits in eight bits, so of course strings are sequences of eight-bit characters.

A tree is the most obvious representation of a programming language within a compiler or interpreter, so of course a programming language which should allow manipulation of the language by mere users should use the textual form of its tree representation.

Programmers can make mistakes with complex language features so of course removing those features will prevent mistakes.

Sometimes writing fast programs requires low-level access to memory so of course you can write this new language just like you wrote its predecessor.

Portable programs can be useful so of course it's important that you never leave the comfortable confines of the virtual machine and standard library or image.

Beating Microsoft is essential to our success so of course we don't have time to think through the implications of our language's design.

Bad programs often have poor indentation so of course enforced indentation will make people write good programs.

Perl 1, 2, 3, and 4 were great replacements for awk, sed, and shell programs, so of course the everything-goes, loosey-goosey approach to strictness is the right default for Perl 5.

From this I conclude that language designers are as good at predicting the future as anyone else: not very.

Sometimes a Little Too Dynamic


I've written before about difficulties in parsing introduced by indirect notation in Perl 5, as well as why barewords in Perl 5 make parsing difficult.

Nicholas Clark reminded me over the weekend of something I heard long ago. In particular:

... as well as action at a distance, there's a speed hit on every class method call because first the code does a stash lookup to see if the package name string is actually a filehandle ...

— Nicholas Clark, methods and bareword file handles, action at a distance, (un)speed

That is, when invoking a method on a bareword (such as the name of a class), Perl 5 has to check whether a filehandle of that name exists, then fetch the underlying filehandle object, then invoke the method on that object. This takes place even if you never use bareword filehandles in your program. It's likely too risky to change now (at least in Perl 5) lest millions of programs written since 1994 suddenly and mysteriously break with odd error messages.
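You can see the ambiguity with the standard handles: a method call on the bareword STDOUT resolves through the stash to the filehandle, not to a class named STDOUT, which is why every class-method call pays for that lookup:

```perl
use strict;
use warnings;
use IO::Handle;    # provides methods callable on filehandles

# STDOUT is a bareword, but Perl finds a filehandle of that name in
# the stash first, so this is a method call on the handle -- not a
# method call on a class named STDOUT. The same stash check happens
# for every bareword invocant, filehandle or not.
STDOUT->autoflush(1);
print "autoflush enabled\n";
```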

(The correct design solution is to forbid the use of ambiguous barewords at the syntax level in order to avoid costly and often unnecessary runtime checks. In other words, similar things should look similar and different things—filehandles and class names—should look different.)

The Importance of a Number


Today I'm editing the Modern Perl book and the Rakudo Perl 6 book. The human brain is wonderful at seeing patterns, even unintentional and nonexistent ones. I saw the word "Perl" in both book titles and wondered at the confusion.

Why is Modern Perl clearly about Perl 5? Why does Rakudo Perl 6 need a version number? As Patrick wrote a couple of weeks ago:

Perl 5 doesn't get to claim the entire mantle of the name "Perl" for itself forever.

Granted, Modern Perl is my book, published by my company, and I have the final say on anything on or in the book—but can anyone think of a good reason not to rename it to Modern Perl 5?

(Note that Andy Lester suggested saying "Perl 5" when you mean "Perl 5" almost a year ago, and he was right.)

(or, stop me if you've heard this one before)

A good friend of mine first wore glasses in college. He hadn't realized that the world had been blurry for 20 years until his doctor demonstrated exactly how clear the world can be.

If you visit the optometrist and get fitted for corrective lenses, the doctor will put a viewfinder over your face and fiddle with dials, asking "Better or worse?" Your task is to squint and squirm and try to read the optical equivalent of lorem ipsum until everything becomes clear and you no longer have to worry about your focal depth. If you get it wrong, you'll have slight headaches for a year until you correct things again. If you get it right—well, that's a matter of preference.

After all, no one can decide your own tolerance for clarity.

Some people believe that Perl has too many operators; they ask, for example, why there are separate comparison operators for strings and numbers.

Better or worse?

    $x eq $y
    $x.to_str == $y.to_str
    (String)$x == (String)$y

Granted, the underlying mechanism in a virtual machine might perform the same operations:

    get_string %SREG(0), $x
    get_string %SREG(1), $y
    compare %SREG(0), %SREG(1)

... but people like me write lots of code so you don't have to write code like that, or even know how that code works.
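The distinction earns its keep because a Perl 5 scalar carries no static type; the operator alone announces which comparison you intend:

```perl
use strict;
use warnings;

my $x = '10';
my $y = '9';

# Numeric comparison coerces both sides to numbers: 10 versus 9.
print $x == $y ? "numerically equal\n" : "numerically different\n";

# String comparison goes character by character: '1' sorts before '9'.
print $x lt $y ? "'$x' sorts before '$y'\n" : "'$x' sorts after '$y'\n";
```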

Suppose the optometrist knows all about Hindley-Milner, and over a second round of delicious beverages, he asks the programming language designer about typeclasses and inference:

    $x == $y

Better or worse than the previous examples?

Of course, your optometrist probably isn't even an amateur programming language designer, so it's easy to counter the argument:

  • Does this operator compare container equivalence or value equivalence?
  • If value equivalence, must the values be the same or equivalent?
  • If container equivalence, how deep is the container equivalence?
  • Must the inferred types be the same, or can they be contra/covariant?
  • If deep container equivalence, how does this comparison interact with laziness or infinite data structures?
  • If value equivalence, are coercions allowable between potentially comparable representations?
  • What happens if someone overloads this comparison operator for one type or another?

Better or worse?

    $x == $y
    $x.VALUE == $y.VALUE
    typeof $x == typeof $y
    <contravariant_type>$x == <contravariant_type>$y
    LAZY($x == $y)

... and so on.

Over a third round of drinks, the programming language designer might get the optometrist to admit that the reason some languages have so many operators is because there are so many possible operations. The question, as always, is "What's the most clear to read and to understand?"

Some programming languages have operators. I borrowed the description "an operator-oriented language" for Perl a long time ago; the origin of that phrase probably goes through Larry. Other languages focus almost exclusively on nouns. Some languages derive practicality in programming from a small set of carefully chosen axioms.

One recurring debate in programming language design is whether the mathematicians should have any say. (I phrase this deliberately.) Human factors such as convenience, discoverability, learnability, and efficiency all govern the ultimate design of any programming language intended for people to use. Yes, The Little Schemer implements multiplication in terms of addition and recursion in order to teach symbolic reasoning and recursion, but a practical Scheme compiler will ultimately take advantage of the fact that multiplication at the transistor level doesn't use recursion.

... but all of that gets into silly technical debates where people throw around insulting phrases such as "Turing Tarpits" and "Periodic tables of operators" and "Using every symbol on the keyboard" and that's how some people miss the really interesting point.

The interesting point is that the representation of a program in source code is, at every level, a data representation—and compression—problem.

Consider spoken language. I have a delicious baked pastry, cooked in a round tin, with strawberries and rhubarb and a bit of sugar and gelatin in the middle. It tastes delicious with a frozen dessert made of vanilla beans, cream, some sugar, and ice. I enjoy eating this combination very much. In other words, I like pie.

That's easy to read and that's easy to understand and it's easy to type, but it turns out I don't like mincemeat pie and I don't care much for shepherd's pie and I'm ambivalent about chocolate mousse pie but think that key lime pie is also delicious. You get the picture, though.

Granted, you can specify the type of a particular pie, or you can express a generic "Any pie will do" with the word "pie" (or perhaps you take advantage of allomorphism) and you get a good understanding of my interests in the abstract. Sometimes you need to be more specific, as when I'm baking in the kitchen or when I'm at a bakery grunting and pointing at something that looks delicious surrounded by two pies that have raisins. Yuck.

Even so, the three-word phrase "I like pie" communicates volumes of information... except that I can speak it sarcastically, or as a question, or with emphasis, and the tonality of spoken communication adds additional information that a series of twenty-six letters and the space character can't convey. I like pie (but she does not). I like pie (contrary to what you say). I like pie (but that's technically a torte, you Philistine)! I like pie? (Find a good way to mark a rising intonation at the end of a sentence and you'll convey irony online.) Perhaps I'm admitting an addiction to the dessert arts (I... like pie).

You gain some advantages in expressivity by adding additional ways to convey information. Even though I can explain the difference between a torte and a slice of pie with a couple of sentences, it's still easier to say "I like pie" and give the little cake a dubious expression, as if waiting for it to change form or you to assure me that it's equally delicious.

Even though you can reduce the set of primitives of the English language further (you probably need to keep a few punctuation symbols, but if you require everything to use declarative sentences or simple questions, you can get rid of exclamation points, apostrophes, colons, semicolons, and dashes) by using an encoding such as Morse code, you don't gain in expressivity. Try explaining my feelings about muffins versus cupcakes in telegraph style.

Then again, the mathematicians have a point. Is a pictorial language without a semi-phonetic writing system necessarily more expressive than English? By no means. That's because another axis of information compression and representation in natural language governs the paucity or abundance of available words. You see this when one language borrows words from another, or when a new word (especially a verb) enters the popular lexicon.

It's easy enough to describe how you performed a search online for details about a good bakery, but it's shorter to say that you Googled it. The latter might even be more expressive; you can take advantage of synecdoche and metaphor to gloss over specific details about how a search engine works and the use of a web browser. Sometimes a convenient fiction improves understanding.

This is not to say that such language techniques as metaphor and irony and sarcasm are always immediately apparent in written language. (The lack of tonality and body language presents difficulties, but it's *possible* to convey meaning effectively with a little cleverness.) You can argue that adding in features such as the comma or the dash—in this context they're infix operators used, in part, to separate independent clauses—gives you possibilities that you don't have without them. You must write with some degree of awkward precision and arrange your sentences and points with as much directness and left-to-right, straight-through readability as possible to convey the same meaning that a tiny bit of punctuation would render with much less ceremony.

Natural language does have its ambiguities, and it's not clear that adding features to a language (whether words, tonality, punctuation, or idioms) always clarifies by providing more opportunities for abstraction and encapsulation of thought. Certainly jargons exist for the stated purpose of clarity but the effective purpose of obfuscation, to discourage non-professional practitioners (law is particularly bad about this, but so are programming and even math, which has its own history of adding often conflicting notations).

Yet the question isn't "Is adding primitives bad?" but "What's the right balance between expressivity and the possibility of extendible expressivity?" You see this in discussions about the value of operator overloading (and synonyms and homophones). It's not even a matter of mathematical purity (tell a math programmer that he needs to use the method form for performing the cross product of two matrices he's defined with a nice declarative syntax). It's a matter of human factors and usability and convenience for people to communicate with each other.

Yes, humans destroy rigorous formalisms by being so unpredictable. Yes, people can make messes. Yes, people can be confusing. Yet somehow we manage to communicate effectively. Why not explore some of those principles when designing mechanisms of communication?

How can you tell the difference between good code and bad code?

Hold that thought. Can you tell the difference between David Foster Wallace and Ernest Hemingway? Here's something in the DFW style:

And but so, walking beneath the faux-Greek[1] architecture crumbling and rotting away in the murk of the autumn, an autumn which had not so much descended as creeped up upon the city from the bay, rolling up from beneath flat-bottomed ships and barges where dockworkers laughed and swore and jabbed at each other, questioning parenthood and manliness, there a soiled plastic bag crinkled and cried beneath my feet, squawking its millennia-long half life, and there the acrid sky scowled down on me like an angry god.

(The footnote itself has a footnote.)

The Hemingway style is also distinctive:

See the man. The man is old. See the sea. The sea has fish. Fish, old man, fish. Die, old man, die. War is hell.

Programming languages can be as expressive and idiomatic as natural languages, and they govern how you write in them. (So do coding guidelines and dialects and standards and idioms: if you're Theodor Geisel or Ernest Hemingway, you get sentences of four to seven words, no complex clauses, and, by gum, you don't get footnotes. If you're DFW or another pomo hero, you cram and glue separate sentences together with whatever punctuation is at hand, and if you tell a coherent story, cut it up and paste it together and pretend you didn't steal the technique from William S. Burroughs.)

One important difference between source code and literature is that aesthetic qualities are secondary concerns for source code.

Even so, you can't discount them. Obviously bad code can have warning markers, and that is itself a useful characteristic: you can tell it's bad code just by looking at it.

(The word bad here is ambiguous. Is it bad because it has subtle bugs of implementation? Is the algorithm wrong? Does it have obvious or non-obvious failure conditions? Does it meet the specification? Is it maintainable? Does it have security problems?)

Here's the interesting thesis: languages which allow you to write ugly code let you skim programs to find bad code. Languages which force you to write uniform code take away your ability to skim programs to find bad code.

In other words, the superficial visual differences between good Lisp code and bad Lisp code or between good Python code and bad Python code or good assembly code and bad assembly code or good Java code and bad Java code (or good Befunge and bad Befunge code, if you haven't had enough DFW yet) are smaller than the superficial visual differences between good Perl code and bad Perl code or good C code and bad C code or good C++ code and bad C++ code or good PHP code and bad PHP code.

(Statistics lie slightly; the difference is likely one of standard deviations rather than absolute values.)

This characteristic may hide a small irony: Hemingway may be easier to skim than David Foster Wallace, but skimming Hemingway may lead you to skip over important small subtleties which a detailed reading can reveal. (It's an imperfect metaphor: that would require Hemingway to have had the ability to include subtlety in his writing.)

Put another way, if the natural tendency for writing code in a language produces code uniform within a small standard deviation, it's difficult to write code that's obviously right and code that's obviously wrong — and isn't that a characteristic of good APIs? The right code should be obviously right and incorrect code should give you the howling fantods because it's so obviously wrong.

Eric Wilhelm: the fundamental unit of reuse is the loop.

Me: the fundamental unit of encapsulation is the lexical scope.

Perl 5 had a lot of goals. It was most effective in encouraging the development of custom extensions (see also CPAN, The). It has been less effective in its greater goal: encouraging the development of local language modifications.

You can trace the primary goal of Perl 6 (and the primary goal of Perl 6 isn't merely cleaning up inconsistencies of Perl 1 - 5 and Unix) to this design goal of Perl 5. Perl 6 adds a lot of great features, like multiple dispatch and function signatures and a powerful object system built on roles and hyperoperators. Perl 6's biggest and most important feature is grammars and rules.

If you look at Perl 5 a particular way, pragmas such as strict make a sublanguage (or a pidgin), in this case by restricting the possibilities of the code you could write to a smaller set of valid code. Of course, you can make the same argument for anything which exports symbols into another namespace, but then you annoy the people who think that writing APIs makes them elite DSL creators.

The most interesting feature of the strict pragma is that its effect is lexical. A lexical scope encapsulates language modifications from strict. That pidgin does not escape into outer scopes.
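A short demonstration of that lexical encapsulation: relaxing strict 'vars' inside a block affects only that block, and the enclosing scope keeps the restriction.

```perl
use strict;
use warnings;

{
    # The modification is lexical: it applies only inside this block.
    no strict 'vars';
    $undeclared = 'visible only where strict is relaxed';
    print "$undeclared\n";
}

# Back in the outer scope, strict 'vars' is in force again: the
# next line would be a compile-time error if uncommented.
# $another_undeclared = 1;
```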

Extrapolate to Perl 6.

(If you've worked with macros in C or C++ or source filters in Perl 5, note that they have file scope, unless you have a flash of brilliance or obsessive attention to detail to restrict their scope. That's a pale shadow of Perl 6 grammars, and that's one reason Perl 6 implementation takes a while: you have to invent a grammar engine which allows arbitrary customization of tokens and precedence and productions in a composable and lexical fashion.)

... and it all builds on the notion of lexical scoping.


About this Archive

This page is an archive of entries from June 2010 listed from newest to oldest.
