February 2009 Archives

Due Credit to the Unsung

Warning: introspection and metadiscussion follows for several paragraphs.

Piers Cawley is right about Modern Perl, this is a manifesto. Like all good manifestoes, it ignores some of the details of reality in order to bring about a more pleasing reality.

I've written voluminously here and elsewhere about many technical problems with Perl 5 and its culture. I hope I've not given the impression that I believe these problems are insurmountable, nor that the community and the language and the ecosystem are irredeemable. I believe the opposite. I write and argue and debate and think and exhort because I believe that we can address these problems and invent a better future.

It's difficult to throw a revolution when no one shows up, however. Hence the manifesto nature of these writings. I've deliberately chosen black-and-white imagery and simple, surgical discussions of various ideas to present my arguments with sharp precision. Sometimes it may sound like I'm a knight on a white horse tilting against a big windmill labeled "The Perl 5 Porters". That's not precisely true. Not only is p5p an amorphous group of hundreds of individuals with thousands of opinions, but all I really want to fight is inertia.

With that said, there are several people actively working on improving problems I've identified. I won't take credit for inspiring them; I can't. They work often behind the scenes, doing wonderful things that help hundreds of thousands of people directly and hundreds of millions of people indirectly. Their work is hugely important. Though I can't name every one of them here right now, they all deserve credit for helping Perl and its ecosystem to continue to evolve.

  • Nicholas Clark has released more stable versions of Perl in the past decade than anyone else. He was the initial guinea pig for the experiment of releasing a new maintenance release every three months. His practical experiences will help us return to a regular release cycle.

    As well, one of his current projects is revising portions of the Perl core (especially its library loading paths in @INC) to promote freshly installed modules over core versions. This is an important step toward moving more modules out of the core.

  • Andreas König, the creator and maintainer of the CPAN module, has built wonderful tools for testing any CPAN distribution against any revision of bleadperl. This work has led to finding and fixing many bugs in both the core and CPAN modules.
  • Rafaël Garcia-Suarez is a core committer and was the pumpking for Perl 5.10. He approved several important changes such as deprecating pseudo-static lexical variables, making the topic variable lexicalizable, and adding the name of an undefined variable to the undefined variable concatenation warning. These deprecations and backwards-incompatible changes make modern Perl easier to write and to maintain.
  • Dave Mitchell is a core committer and pumpking for Perl 5.10.1. He's fixed countless bugs -- countless difficult and thankless bugs -- and is the man to convince to continue to improve Perl.
  • Yves Orton refactored the regular expression engine for Perl 5.10, making it more correct, often faster, and much more featureful. It's impossible to overestimate how much work this was.
  • Paul Fenwick marshalled a group of volunteers to maintain a list of changes between Perl 5.8 and Perl 5.10. He's also written a replacement for the difficult-to-use Fatal core module. Its replacement is autodie.
  • Michael Schwern maintains the core module ExtUtils::MakeMaker. No one wants to see this code die more than he does. (Not even I want to see it die that much.) It's thankless, difficult work -- but Schwern has kept up with necessary changes so that its eventual replacement can take over silently and effortlessly. He also recently effectively removed any reason to use Perl 5.005, by declining to support it in newer versions of MakeMaker.

I can only mention several other contributors, such as Renée Bäcker, David Landgren, Sam Vilain (who helped migrate Perl 5 to git, making releases and collaboration much easier), Leon Brocard (another migration worker and pumpking in his own right), David Cantrell, Steffen Mueller, Craig Berry, and John Malmberg.

I could list dozens more.

Perl 5 is not perfect. Its development process still has its flaws. We can improve it many times over. Yet credit must go to all contributors who have brought it this far -- and who continue to help it grow and thrive. Thank you all.

Pick One (and also, you should win stuff)

Project management is the craft of applying limited resources of time, money, and knowledge to produce a desired result.

We talk too infrequently about limited resources in the free software world. The theories are nice: there's a near-infinite army of programmers each willing to run one test, or add one feature, or write one line of documentation, or fix one bug. It's a lovely theory. It's not entirely wrong, either. When someone new posts a patch or a bug report or even says "Hey, I used your software and it saved me time," my motivation level improves. I've made a difference in the world, and the collaborative, open development strategy I've chosen lets other people help make a difference in the world.

That doesn't mean every project always has a wealth of resources. I believe that we have to manage our existing resources wisely.

For Users

Do you want a project that you can install and use and never manage and never upgrade? Do you want to configure it once, and leave it alone, and forget it's there?

Do you want a project that fixes bugs and adds new features as it comes to understand the problem domain better? Do you want your life to get easier, as polish removes some of the rough spots and the software does more of the heavy lifting for you?

Pick one. You can't have both.

For Developers

Do you want your source code to get easier to work with over time? Do you want your bug count to trend toward zero? Do you want predictability in your release schedules, an active user community helping you iterate toward the ideal software for the problem domain, and the freedom to evolve?

Do you want your users to never have to worry about changes? Do you want them to be able to step away from the project for several years and suddenly on a wild whim install a new version and have nothing changed, nothing to worry about? Do you want to be so predictable that the best way to use your software in 1994 is still the best way to use it in 2009?

Pick one. You can't have both.

You Can't Have Both

Nothing says it better than The Itchy & Scratchy & Poochie Show:

     Man: How many of you kids would like Itchy & Scratchy to deal with
          real-life problems, like the ones you face every day?
    Kids: [clamoring] Oh, yeah!  I would!  Great idea!  Yeah, that's it!
     Man: And who would like to see them do just the opposite -- getting
          into far-out situations involving robots and magic powers?
    Kids: [clamoring] Me!  Yeah!  Oh, cool!  Yeah, that's what I want!
     Man: So, you want a realistic, down-to-earth show... that's
          completely off-the-wall and swarming with magic robots?
    Kids: [all agreeing, quieter this time] That's right.  Oh yeah,
          good.
Milhouse: And also, you should win things by watching.

As with most false dilemmas deployed on the Internet as rhetorical devices, the truth lies somewhere in the middle -- but for goodness' sake, is it possible to recognize that software under active development cannot guarantee perfect stability, and that if your goal is perfect stability, there's no point in continuing to maintain software?

Spending limited resources to ensure that change never happens is a great way to ensure that improvement will never happen — and that you won't have the problem of limited resources in the near future.

The Opposite of Modern


Sometimes I have the odd dream that someday there will be programming language descriptivists in the same way that there are descriptive linguists -- but more so. They'll be the Unitarian Universalists of software development. Anything goes. It's all okay. Just get your job done and be happy.

That's not entirely wrong. The term "best practices" has a severe problem in that it implies that for any situation you can find a single best practice. The best practice often depends highly on your situation, and you often lack sufficient information to choose between several equally good options.

This is why we have so many programming languages and toolkits and libraries and design possibilities. This is why we refactor, and we iterate, and we try to learn from our experiences to continue building better software for our own purposes according to strong visions. We don't have to get it perfect, and there are many ways to accomplish the same thing. We just want to get it a little bit righter every time we make a change.

It's a false dilemma to say that all choices are equally good, however. If there's a secondary theme to all of my writing here, it's that some choices are bad.

Let me give you an example of what I want to call "the polar opposite of modern Perl". It's bad Perl. You can find plenty of examples of it on the Internet, and too many of them rank well in search engines, where well-meaning novices find them and paste them into their own programs, because it's just too hard to tell who's authoritative and who's right. See for yourself: 2008 Winter Scripting Games Solution to Advanced Perl Event 1: Could I Get Your Phone Number?

This code won't even compile if you use the Modern::Perl module -- and that's a feature of the module.

I particularly like some of the explanations of the code. It goes wrong on the first line:

Our script for Event 1 kicks off by creating a hash table (similar to a VBScript Dictionary object) named %dictionary; that's what this line of code is for:

%dictionary = ();

In reality, that line of code does nothing. It doesn't declare a variable. It doesn't assign anything to the variable. It's effectively a no-op. It's useless, vestigial code that exists because... well, almost certainly it was copied and pasted from another program somewhere. It'll get copied and pasted further now, despite doing absolutely nothing, and more novice Perl programmers will get the wrong idea that this is how you work with variables in Perl because someone said so authoritatively -- never mind that the explanation is completely wrong.

(There's a naming quibble too. Almost no one calls hashes in Perl dictionaries. We have a perfectly good name for them: "hash". Of course, the hash sigil and the keyed lookup syntax already make it obvious that this is a hash, so the name is like calling a variable "variable" in another language. As the rest of the code uses this variable as a mapping from numbers on a telephone pad to letters, a better name would be %nums_to_letters, which at least hints at the purpose of the variable rather than its internal implementation.)
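Under Modern::Perl (or plain strict), a modern version of that first line both declares the variable and gives it a meaningful name. A minimal sketch -- the keypad data here is illustrative, not copied from the original solution:

use Modern::Perl;

# declare the mapping from keypad digits to letters
my %nums_to_letters = (
    2 => 'abc',
    3 => 'def',
    4 => 'ghi',
    # ... and so on through 9 => 'wxyz'
);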

I can't afford to comment on every line of code here, but a few more groaners popped out at me even as I skimmed the code:

Once we have the hash table set up we use this chunk of code to open the text file C:\Scripts\WordList.txt, read the contents into an array named @arrWordList, and then close the file:

open (WordList, "C:\\Scripts\\WordList.txt");
@arrWordList = <WordList>;
close (WordList);

This is another example of code that won't compile when used with Modern::Perl. (It's difficult to blame the author of this code for that, however. Perl 5 arguably has the wrong default behavior... but the lack of error checking is disturbing.) Even that is nothing compared to the sentence after the explanation of what this code does:

As you can see, it's easy enough to read the contents of a text file and store those contents in an array; however, as far as we know it's nowhere near that easy to search an array for a specified value (which we'll have to do to determine whether we've created a real word out of our phone number).

After demonstrating the use of a hash to look up values by key... yeah, I can't finish this sentence either. The example code instead joins all of the lines in the file -- one per word -- into a string and uses a regular expression to search the entire string for a word made from a phone number. This is not an esoteric problem. It's a frequently asked question about Perl, present in perlfaq4, the core Perl documentation, installed with every installation of Perl: How can I tell whether a certain element is contained in a list or array?
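Here's a rough sketch of that technique -- not the Scripting Games solution, just the approach perlfaq4 describes, assuming a word list with one word per line:

use Modern::Perl;

# read the word list into a hash so that checking whether a candidate
# is a real word becomes a single keyed lookup
open my $wordlist, '<', 'C:/Scripts/WordList.txt'
    or die "Cannot open word list: $!";

my %is_word;
while (my $word = <$wordlist>) {
    chomp $word;
    $is_word{ lc $word } = 1;
}

close $wordlist or die "Cannot close word list: $!";

say 'found a word!' if $is_word{'perl'};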

My gripe isn't that the regular expression search is inefficient (it is, but not as bad as it could be), nor that the code is clunky (it is), but this is a bad algorithm in any language. A better example is trivial to write in Perl, even for someone who's not an expert. Jan Dubois is a Perl expert, and the expert Perl solution to 2008 Winter Scripting Games Advanced Perl Event 1 is not only shorter and more correct, it also includes error checking, compile-time checking, and other robustness features. It's a little bit more complex and definitely more idiomatic, but it's correct and the explanation is also correct.

My gripe is this: someone deliberately published the novice version in 2008. It's bad code. It'd be bad code in any language. It's particularly bad Perl. Worse yet, it's bad code explained incorrectly.

That doesn't help anyone.

In My Own Way, I Am Core

Is the DBI any less essential because Tim Bunce maintains it outside of the Perl core? It's the official database interface for Perl 5. It's well-maintained. It's comprehensive. It's stable. It has copious documentation, including at least one book. It's code no one's crazy enough to compete with.

It's not a core module, and no one cares that it's not a core module. It's still the way to access databases from Perl, regardless of any official or blessed or core status. It's just great code -- stable code -- that's been around for a while and does what it does very, very well.

It's still not a core module. If you use it, you've installed it yourself, your operating system vendor has included it, or the Perl distribution you've installed has included it.

Good. Bad. I'm the Guy with the Core.

A persistent difficulty of language designers is that we can't predict what users will want to do. We want to give them power and flexibility, but we also want to make languages that will fit in their heads and get out of the way when they don't need all of the features we've invented. The balancing act between "small is beautiful" and "complexity has to go somewhere" is difficult. So is finding the right balance between "Let's make it easy for the users!" and "Someone has to implement this beast!"

One rule I've tried to express here recently is "Someone has to maintain this beast!" For "beast", read "the core language", "the core libraries", or "the dependencies of my application." Here trouble begins.

If nothing ever changed in your application besides the changes you had to make, life would be peachy keen. Bacon would fall from the sky, and you'd get thinner and your cholesterol would improve when you ate it. You would not need to worry about operating system upgrades, or security patches, or new versions of dependencies, or what users had installed versus what you have installed. You would work a very comfortable 35 hour week and go home and sleep the sleep of the just.

You'd ride a magic pony to and from work, too. They eat bacon, but their breath is sweet. The world doesn't work that way.

Change is not the enemy though. Poorly managed change is the enemy.

The Book Awoke Something Dark in the Core. Something Evil.

Larry Wall and the rest of the Perl 6 design team like to talk about the Waterbed Theory of Complexity. You can take the Lisp or Forth approach of defining a very simple, easily understood core and building everything else in terms of that. "It's simple!" people say, until they start building real applications in it. One of the deepest lessons of computer science in The Little Schemer is that you can implement multiplication recursively with guard clauses and subtraction. One of the best lessons of software development is don't do that. Scheme is simple. Lisp is simple. Forth is simple. The SK calculus is simple. All of these are simple in theory, but writing practical programs in them may not be simple. There is an essential complexity in certain tasks that you cannot sweep under a rug. It will come out messy.
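For illustration only, here's that exercise transplanted into Perl: multiplication built from nothing but a guard clause, addition, and subtraction. It works, and it's a terrible way to multiply anything but toy numbers.

sub multiply {
    my ($x, $y) = @_;
    return 0 if $y == 0;                  # guard clause
    return $x + multiply( $x, $y - 1 );   # recur on a smaller problem
}

print multiply( 6, 7 ), "\n";    # 42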

Put another way, Larry says that the Perl language can be messy because the problems it tries to solve are messy.

I suspect there's a similar theory of change. You can try to hide change, but mostly you just move it around. That's certainly true in the Perl core.

I'm Afraid I'm Gonna Have to Ask You to Leave the Core

The discussion I started with Sacrificing the Future on the Past's Golden Altar spread to p5p the other day. The question of replacing or enhancing File::Find came up, and someone raised the idea of adding an alternative to File::Find to the core.

I replied that that would make the problem worse, not better.

If I had my way, the Perl 5 core would contain the language itself and only those libraries absolutely necessary to download and install other libraries from the CPAN. That's it. There is no more. There's no Switch, or Class::Struct, or Search::Dict, or File::DosGlob, or File::Find in that core.

There'd be no ExtUtils::MakeMaker either, but that's aesthetics, not function or flourish.

I'll Swallow Your Core!

Clearly the core zeitgeist has gone in another direction. You don't swallow the core. It swallows you. There's no better explanation than Nicholas Clark's description of why people want modules to enter the core:

  1. The Perl core is already installed. But they can't get approval to install other modules from CPAN.

    [Bad programmer. You're trying to burden someone else with a long term technical problem because you've failed to address your local political problems]

  2. They perceive modules in core as being "blessed" - if it's there it must be better than all the competitors on CPAN.

    [Bad programmer. Historically things have only ever been added to the core. Reasons for its addition at the time may not be as clear cut as you infer, and there may now be a better solution. You're trying to burden someone else with a long term technical problem because you're falsely lazy, excessively impatient and insufficiently hubristic to devise your own criteria for selecting the right module for the job]

  3. They perceive modules in core as being "supported" - if it's there, it will be looked after for ever.

    [Bad programmer. You appear to think that the mere mortals volunteering to maintain the core are of a different species than the mere mortals volunteering their code to CPAN]

In other words, "please maintain my dependencies for me!"

I agree that change can be painful, and that arbitrary changes are unpleasant and not useful. No one's arguing for those (at least no one who isn't a violent time-traveling sociopath with a chainsaw for a hand).

However.

We Are But Sixty Men

This is not your rug, and you are not welcome to sweep your change management problems under it.

The core developers can provide you with a great language -- a modern language -- designed to help you solve problems that no one can foresee now. They can provide you with mechanisms to extend the language in reusable and shareable ways. They can encourage experimentation with new features and ideas with the intent of pulling those ideas into the core in future versions if they prove useful. They can even maintain well-defined backwards compatibility and keep promises not to break things arbitrarily.

They cannot do so and take on an ever-increasing maintenance load. Coordinating the release cycles of the core language and dual-lived modules and version numbers and multiple authors with their own time schedules and multiple queues and venues for bug reports and feature requests and patches... well, that's madness.

The DarkPAN is and shall remain impenetrable. That's fine -- but that means the DarkPAN has to take responsibility for its own code. It's just not possible for the core to do so any longer.

Hanging the Core Out to DRY

The CPAN has improved Perl in many ways, but it's exposed other problems.

If you know how to use the CPAN, you can install and upgrade modules and distributions. This is great for users. You get new features and new capabilities, and sometimes you can even change how Perl works internally, without having to upgrade Perl itself. That's great for the core developers, because people can experiment with new features and ideas and even syntaxes and dialects without changing the core.

Everything is great... except for the unforeseen repercussions.

Someone discovered a bug in a core library, and wanted to release a new version without forcing people to upgrade all of Perl and all of their libraries at the same time. In general, that's a good policy.

Someone decided that a CPAN module was so widely-used that it belonged in the core. I can see how that argument works. It's certainly worked on lots of modules in the past.

Someone said "That's great that this is in the core, but someone's paying me to use it with an older version of Perl, so I refuse to tie its syntax or feature set or capabilities to only those which the current development version of Perl provides." I can understand that desire.

None of these are bad ideas in and of themselves, but the consequences are poisonous together. The chosen solution to one problem was to maintain a core module somewhere outside of the core, where the Perl 5 porters weren't the people responsible for day-to-day maintenance; to release new versions to the CPAN as they were ready; and to juggle that release schedule against the release schedule for new versions of the Perl core, which should always contain the newest versions of core modules available, even if those modules live elsewhere and have already had releases elsewhere.

Imagine trying to coordinate the release of the core language and half a dozen dual-lived modules. Now imagine trying to coordinate the release of the core language and several dozen dual-lived modules.

Now there's a big snarled shoelaces problem and we don't know anybody named Alexander.

Did I mention that some of these modules have to run on various versions of Perl, unmodified? (I spent a summer and fall writing tests for some of these modules, hearing from some maintainers that I couldn't use core modules in those tests because those core modules hadn't been in core long enough that they were available on older versions of Perl long enough for people to have them installed, nevermind that if they were capable of installing newer versions of these modules, they were fully capable of installing dependencies, and suddenly I wonder if I've had a persistent pounding in the back of my skull since about June of 2001, because sometimes it sure feels that way.)

Several modules are in this situation, and we don't have good answers to several questions:

  • Who maintains this code?
  • Which version of this code is authoritative?
  • Which release of this code should supersede other releases?
  • When will the next release take place?
  • What are the dependencies of this code?
  • What is the support and deprecation policy for this code?
  • What are the requirements for dependencies in language or supporting libraries for this code?

Good software developers and project managers ask and answer these questions regularly. Your answers are critical to the long-term success and maintainability of your project.

I'll talk about p5p's answers in my next entry.

(If you hate the pun in the title, trust me: all of the other candidates were worse.)

The Perl world often speaks of the DarkPAN. This is the CPAN's big brother. Where the CPAN contains millions of lines of freely-available, reusable, and tested Perl code, no one knows how big the DarkPAN is, what it contains, or if it's tested. Where there are CPAN testers, CPAN metrics, CPAN searching, comprehensive CPAN history, and CPAN code search and cross referencing, none of those features are available for the DarkPAN.

In short, we don't know where the DarkPAN is. We don't know what it contains. We don't know which versions of Perl it uses, which CPAN modules it uses, which XS functions it uses, which Perl idioms it uses, and which bugs it relies on going unfixed. It's a big wad of unknown that we suspect exists but can't measure.

The Perl 5 Porters have a fabulous resource named Andreas J. König, not coincidentally the maintainer of the CPAN module. Andreas often posts Bleadperl Breaks CPAN reports, where he identifies specific changes to Perl under development which have caused test failures in CPAN modules. At that point, the question is whether the change to bleadperl is wrong, or whether the module did something wrong. One or the other gets fixed.

That doesn't work for the DarkPAN. CPAN testing is automatable (proof: it's automated, and has been for years). DarkPAN testing is not. Every module uploaded to the CPAN and added to the index gets tested and analyzed and cross-referenced and, look at that, there's another piece of data added to the giant feedback loop which is community-driven software development. Every piece of DarkPAN code added stays in the dark, where none of this works...

... unless someone deliberately extracts relevant code from the DarkPAN and puts it on the CPAN (which happens), or tracks bleadperl snapshots and testing releases and reports changes (which rarely happens), or files bugs after a release (which happens).

That's how community-driven software development works. If you're willing to work with the community in the open, we all benefit many times over.

There's one small problem, however. No one wants to break code in the DarkPAN. Note please that "break code" in the Perl 5 world is often a euphemism for establishing and following a deprecation policy. No one's advocating removing major features such as AUTOLOAD, as tempting as that might be. Yet the specter of the DarkPAN often arises in discussions where it's the final nuclear deterrent against adding a simple piece of syntax to simplify a task which is more laborious than it ought to be in a language under development in 2009. "We don't know what it would break," they say, and that ends the argument.

When I write software which matters, I create a test suite. I choose dependencies carefully. I never upgrade it in production without running the tests. (Sometimes I run the test suite just for the fun of it.) The function of that code matters, so I'm cautious.

If I upgrade a dependency, I run the test suite. If I find a bug, I add it to the test suite. If I upgrade my version of Perl, I run the test suite. In short, I take responsibility for identifying and diagnosing all potentially functional changes to the behavior of my code even if they come from external resources. It's my responsibility as maintainer of that code to keep it running, and I know of no better way than to maintain a comprehensive, sane test suite to verify that it maintains the behavior I intended.

I expect the people to whom I distribute the software to do the same, so I include the test suite there. It's common sense software management. It's common sense operations.

Not so with the DarkPAN. Somehow, the Perl world has decided that this unmeasurable, unknowable conglomeration of code which may or may not exist is incapable of performing software and operations management to a minimal degree of competence, despite the fact that, for example, this is what operating system vendors do. This is what professional system administrators do.

Thus the burden of integration and deployment falls not to the people who know where and what DarkPAN software exists and how it works and what it uses, but to the least appropriate group of people: people who cannot possibly see it and can only guess at what it does, what it needs, how it does it, and if any of that is appropriate or supported.

I can imagine that if I called technical support for a product, I wouldn't get support if I refused to disclose any information about my problem -- or even if I actually had a problem.

Why should anyone expect differently from volunteers?

Toward a Sane Deprecation Policy


If your software really gets better over time, and if you've accepted the idea that backwards compatibility is technical debt, the important question is "How do I pay down that debt?"

The only approach I've seen work is to establish and maintain a sane deprecation policy. Specific details depend on your project and your users (and, likely, your availability to perform the work -- community-driven projects performed mostly by volunteers have different characteristics from paid projects), but the principles are the same.

Start with a regular release cycle. I prefer monthly releases for free software projects, but several projects do well with quarterly or twice-a-year releases. There are other tremendous benefits, but if you can make and meet your commitment to releasing new versions of your software on a regular cycle, you add predictability to the process. Not everyone will upgrade with every new release, especially if you have a short cycle period, but you demonstrate frequent progress -- and you have to keep your software stable to have a hope of maintaining this release schedule.

Mark deprecated features as deprecated and give a timeline for their removal. In general, that's one release cycle. It can be more, but it can't be fewer. The severity of deprecation depends on the scope of the change. Perhaps adding a warning to a deprecated feature will help people know to upgrade. Perhaps you can provide a backwards-compatibility feature as an optional extension. You must notify people that there will be change, so that they can prepare for it.
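As a sketch of the gentlest form of that notification -- a hypothetical My::Library whose replacement interface already exists -- a deprecation warning can be as simple as:

package My::Library;

use strict;
use warnings;
use Carp 'carp';

# deprecated now, scheduled for removal one release cycle from now;
# warn so that callers hear about the change before it happens
sub old_interface {
    carp 'old_interface() is deprecated; use new_interface() instead';
    return new_interface(@_);
}

sub new_interface {
    my (@args) = @_;
    # the improved implementation lives here
    return scalar @args;
}

1;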

Note that aggressive release schedules, such as monthly releases, may be too short a period for larger deprecations. Appropriate periods depend highly on the nature of your work, but public projects probably should have a three month warning period for significant deprecations.

Remove the deprecated feature when its time has come. Get rid of the code. Stop carrying that baggage around.

That sounds easy, doesn't it? It is! All it takes is discipline, commitment, and -- okay, you have to document your deprecation policy and refer to it prominently. Some people will complain that they want to upgrade to new versions of your software which behave the same way as old versions without changes. Ignore them. Desire doesn't make paradoxes logically consistent, even if you really really want it.

Before you post a comment, please note what I didn't say.

I didn't say "Break old features for the sake of breaking them."

I didn't say "Deprecate features that people are using for the sake of expermenting with new ideas."

I didn't say "Never undeprecate a feature" or "Never extend the deprecation or warning period if the change is difficult or widespread."

You still have to use your best judgment -- but once you've achieved the discipline of regular releases and have written your support policy, you have the benefit of being able to discard old code and shave off the rough spots of misfeatures until they're right. It's not always easy to reach that point, but it's always valuable to do so.

First, some caveats:

  • I use the word "maintenance" to refer to software under current development, even if that development is only for bugfixing purposes, and not adding new features. Thus almost all software undergoes maintenance.
  • The risk guidelines apply only to code undergoing maintenance. If you are not and will not modify software, the risks are immaterial.
  • Guidelines are guidelines, not universal precepts. They're rules of thumb. They may not apply in all cases. They match my experience, but I know that a few projects have had different experiences. Some people who smoke never get lung cancer, but it's foolish to ignore the connection. That's why I call them risks and guidelines.
  • I believe that software can and should get better over time.

Suppose you write a successful library. You had a great idea, and you implemented it, and now people use it. That experience has given you further ideas for enhancements. You'd like to make your library more powerful, or easier to use, or generally better.

Thus you experiment. You play with different API ideas. You look at solutions to similar problems. You ask some of your best users for feedback, and you release a new version of the library.

Repeat this a few times, and you'll discover that you don't always get everything right the first time. You face the awkward question of how to make improvements while not stranding your existing users.

Consider the risks. First, you risk getting an API or a design wrong by making any changes or adding any features. You can ameliorate this risk by being very cautious, but you can't eliminate it. You're probably not an expert on the problem area unless you've already solved the problem with code multiple times, in which case why are you starting over? Only feedback will tell you if you've done it right.

The second risk is that no one will care. You can't mitigate this. Release it anyway.

The third risk is that you did it wrong, and you'll look like a fool. Research can fix this. Not caring what random people on the Internet will think helps. Adding disclaimers helps (but only because you can tell people to read the disclaimers before arguing with you, and then they look silly).

The fourth risk is that you get it kind of right and kind of wrong, and to get it more right, you have to make changes to the wrong parts, and that will change how your users interact with the code.

That's the scariest risk. With all of these wonderful users, how can you tell them that you didn't get it all right, and they may have to suffer through an incompatible change? (I think the argument is easy to make; you need a sober assessment of their risks and responsibilities, but that's a different facet to explore another day.)

Consider the risk of not making improvements when you see them. Ahh, now you understand all the talk of risk.

Technical Debt (see also Design Debt) is a measurement of how easy it is to work with the code. An Approximate Measure of Technical Debt argues that every line of code is a good approximation of technical debt. In general, the more code, the more technical debt a system is likely to have.

You see technical debt every time you go to add a feature or fix a bug and it's more difficult than it should be. Perhaps the name of a variable or function is wrong. Perhaps a comment is misleading -- imagine that. Perhaps there's duplication, or near duplication. There may be good reasons why the code is in that state. Sometimes those shortcuts are necessary, or sometimes the right approach isn't obvious without more experience, or sometimes changes elsewhere give you the opportunity to improve abstractions and coalesce near-duplication into duplication to remove it. If software really can get better over time, we should expect this to happen.

Consider, however -- every feature you don't need represents technical debt. Code you might use in the future represents technical debt, even if the only cost is scrolling past it in your editor, or that extra second spent compiling, or a function name you have to skip over in the API documentation.

Take that argument one step further. Code that exists solely for backwards compatibility -- to allow people to continue to use broken, or old, or clunky, or wrong code -- is very nasty, very expensive technical debt.

You can't always avoid taking on technical debt, but if you're maintaining software, you'll always pay interest on that debt until you pay off the debt. Like duplication, backwards compatibility leading to huge amounts of technical debt can eventually crush a project. You need a plan to get rid of it. Refactoring helps remove duplication and improve designs. Only deprecation -- and removal -- can remove backwards compatibility debt.

Easier/Better Over Time


In Stop Preventing the Future!, I promised to talk about sane deprecation policies. I want to digress for this entry to give one more explanation of why keeping up with the future is important.

I believe that software under active maintenance should get easier to work with over time.

I realize that that statement contradicts the direct experiences from many, if not most, software projects. We use the term "legacy software" to imply something old, crufty, broken, and difficult to maintain. (When I use the term "legacy software", I mean "software without a future" or "software not receiving maintenance". The difference is subtle. Perhaps I should post about that, too.)

One of the persistent problems of project management is predicting the costs of change. One of the persistent temptations of Big Planning Up Front design methods comes from the idea that change is expensive, and it gets more expensive over the lifetime of a project.

While I agree that this rule is true for many projects, I believe it's a symptom of other problems, and not the source of those problems. (I also have trouble taking seriously any project without a systemic automated testing plan and regular refactoring. Do you people even care about your source code?)

I can't count how many bugs there are in Perl 5 programs because the return value of system() is a C-style return value, not a Perlish return value. Should someone have designed that API correctly from the start? Probably -- but that didn't happen. If someone had changed it in Perl 5.6 in 2000, it could have prevented a decade of those bugs. Would that change have been painful? Probably.
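For illustration, the classic trap (the command name here is a placeholder):

# system() returns the command's exit status, and zero means success,
# so this Perlish-looking test fires on failure rather than success
if ( system( 'some_command' ) ) {
    warn "this branch runs when the command fails\n";
}

# the correct test inverts the sense (or examines $? for the details)
if ( system( 'some_command' ) == 0 ) {
    print "the command succeeded\n";
}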

Is the pain of a single, well-informed change greater than the pain of uncountable multitudes of bugs? I doubt it.

Consider a more positive example. Perl 5.10 changed a diagnostic message such that when you use an undefined value in a concatenation or interpolation, Perl reports the name of the variable containing undef. This is a tremendous benefit to debugging -- but it changes the format of a warning on which existing code may have relied.

In this case, Perl 5.10 is easier to work with, because a common warning is much, much easier to debug. It's a small change, but it's the kind of small change you quickly grow to rely on, similar to the strict pragma telling you that you've made a typo in a variable name. Sure, it only helps prevent silly little bugs, but the less time I spend chasing silly little bugs, the more time I have to solve real problems.
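To illustrate the difference (the filename and line number here are invented, and the warning text is quoted from memory):

use strict;
use warnings;

my $name;
print "Hello, $name\n";

# Perl 5.8:  Use of uninitialized value in concatenation (.) or string at hello.pl line 5.
# Perl 5.10: Use of uninitialized value $name in concatenation (.) or string at hello.pl line 5.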

No one knows how much DarkPAN code parsed the text of the old warning. Maybe none. Maybe thousands of programs. Changing all of them may be a daunting task. Maybe it's worth it. Maybe it's not. Saving a few seconds of debugging time for a million Perl programmers is definitely an improvement.

That's what confuses me about the reticence to make other, larger improvements. The design choices of Perl 5.000 (released on October 17, 1994) are sunk costs. We can't go back in time and fix them. We can only fix them in modern versions of Perl, such as 5.10.1 and newer. The question is whether we should.

In my mind, that's not even a question. If Perl isn't getting easier to use or generally better over time, why bother to release new versions? Why maintain it?

Yes, change can be painful... but keeping up with modest changes on a predictable schedule makes it tolerable. The amount of changes you can make to a project you're maintaining is limited by the amount of work you can do in a day anyway. Why not keep up with the present, and stop preventing the future?

Stop Preventing the Future!


One of my goals with Modern Perl is to improve the entire Perl ecosystem for both Perl 5 and Perl 6 such that everyone can take advantage of all of the wonderful improvements already provided and yet to come. First, we have to convince people that that's possible.

In Sacrificing the Future on the Past's Golden Altar I mentioned that Perl 5's deprecation policy has harmed Perl 5 over the past decade, if not longer. Several people asked me for a better alternative.

It's no coincidence that I've worked on Parrot for the past several years. At the most recent Parrot Developer Summit last December, we discussed our support policy for Parrot as we near the Parrot 1.0 release. I've just finished writing the initial version of Parrot's release, support, and deprecation policies. (I apologize that it's in raw POD form; we'll add it to the website soon.)

I don't want to get into too many details about deprecation and support, nor how aggressive the Parrot schedule is for the foreseeable future, but I do want to explain some of the reasoning. It's important for all projects, not just large and, we hope, successful community-developed projects.

I believe strongly that the best way to invent the future is to iterate on a theme. That's part of the reason I write these posts -- I'm trying out new ideas on a growing audience of smart, dedicated, and committed readers who rarely hesitate to challenge my underthought assumptions or ask for clarity when I've been obtuse. The same principle goes for software.

If you know exactly how to solve a problem before you've written any code, it's worthless to solve the problem yourself. Re-use existing code, then spend your time and resources on something that matters more.

If you don't know exactly how to solve a problem, you're unlikely to find the best solution on your first attempt. That may be fine. Your first attempt may be good enough. If so, great!

The problem starts in the many cases where the first attempt isn't perfect and needs further work. We call this debugging. Usually it's also a design problem.

Two complementary schools of thought address this problem from different approaches. The agile movement suggests that working in very small steps and solving small pieces of larger problems in isolation helps you avoid thrashing and rework and all of the organizational problems you have when you're trying to solve very large and very complex problems. The refactoring school suggests that very focused and reversible changes to the organization of code and entities within the code make it easier to write good code in the future.

It's possible to have one without the other, but they build on each other.

The allure of both approaches is that they promise to free you from the golden chains of "I Must Get This Completely Right The First Time." You don't. You do have a minimum standard of quality and efficacy, and it's important to meet those goals, but they make change less risky and even cheap. I didn't say that practicing either one is easy or simple, just that I know of no better way to reduce the risk of mistakes. If they're small and easy to detect and easy to fix, you don't have to worry about making them.

Of course, this only matters if you're going to change your software in the future. If you write a program and run it and you don't need it in ten minutes, none of this matters. If you write a program and install it on a machine and it can run for the next year or ten years untouched, none of this matters. The cost of change is irrelevant for software that never changes.

Most of us rarely have the luxury of writing software that never changes.

Perhaps there's a common illusion that people who write software for other coders to reuse in their projects -- whether languages, libraries, platforms, or tools -- should meet a standard higher than most other projects. To some degree it's true. Many projects which get widely used attract better developers and development strategies. Many don't.

Yet I don't believe there is a general solution to the problem that we don't get code and design right on our first try. We make mistakes designing languages and libraries. We make mistakes implementing platforms and tools. Sometimes the best we can do to make things righter is to make an incompatible change. As long as our code gets easier to use and maintain over time, I can live with that.

The question isn't "Should a project make backwards-incompatible changes?" (The question very much isn't "Should a project do so gratuitiously?", so if you want to argue that point, please do so elsewhere. The real question is "How do you do make incompatible changes when necessary without hurting your users?"

I'll discuss some ideas next time.

I believe that programming languages, libraries, and tools should get easier to use over time -- if we're careful about creating good abstractions and providing usable, simple APIs. We don't always get those right the first time. Sometimes we make mistakes.

I also believe that we should identify and fix mistakes as soon as possible. This goes against the prevailing Perl 5 philosophy which spends an inordinate amount of effort trying never to break existing code, even if it relied on huge mistakes and interfaces broken-as-designed.

Contrary to some reports, I am sympathetic to the idea of keeping existing code working. It's infeasible to change thousands of lines of working code just to keep up with fashions and fads. Yet it's important to weigh the benefit of simplicity and correctness for all new code written in the future against the potential of breaking existing code which may never run on an updated version.

This is especially true for some of the core libraries which have been part of Perl 5 since the beginning. Consider the working-but-much-maligned File::Find, which features one of the worst interfaces imaginable. File::Find has been part of Perl 5 since the start:

use Modern::Perl;
use Module::CoreList;

say Module::CoreList->first_release( 'File::Find' );

# output: 5

File::Find traverses a directory tree, looking for files or directories which match arbitrary criteria. You pass the find() function a subroutine reference as a callback and a list of directories in which to look. For every file or directory found, the module calls your callback. A typical use looks like:

use Cwd;
use File::Find;

# find() calls wanted() once for every file and directory it encounters
sub wanted { ... }

find( \&wanted, cwd() );

My favorite part of the documentation for File::Find is:

The wanted function takes no arguments but rather does its work through a collection of variables.

$File::Find::dir is the current directory name,
$_ is the current filename within that directory
$File::Find::name is the complete pathname to the file.
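In practice, a wanted() callback reads those package globals directly. A minimal sketch that reports every .pm file beneath the current directory:

use File::Find;

sub wanted {
    return unless -f $_;                       # skip directories
    print "$File::Find::name\n" if /\.pm$/;    # the full path to each match
}

find( \&wanted, '.' );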

In the fourteen-and-a-half years since the release of File::Find, many people have questioned the design of an API which uses global variables to send data to a callback function. (Perl has allowed passing values to functions for over twenty-one years now -- since Perl 1.0 in 1987.) Global variables have been a bad idea in programming languages since just about the invention of structured programming.

Yet this wasn't fixed in 2000, when I noticed it. This won't be fixed in 2009. Why not? In 2000, the answer was "too much code depends on the existing behavior". That was the wrong answer then, and it's worse now. Nine years later, even more code depends on the existing behavior. Entrenched mistakes dig the hole ever deeper. Over time, it gets more difficult to correct problems -- not easier. By refusing to make a necessary but backwards-incompatible change, the Perl 5 developers penalized current and future users -- and continue to penalize them -- all for the sake of not penalizing an unknown number of existing users no one had surveyed or counted.

Prioritizing the past over the future is a great way to ruin your language's future.

A lot of people pin their hopes on Perl 6 for a cleanup of the Perl language and its libraries. I believe the design of Perl 6 improves every part of Perl that it touches. Yet for all of the care that has gone into Perl 6 and will continue to go into Perl 6 through its twenty-year lifespan, the same subtle temptation will plague every contributor. Unless its designers, developers, maintainers, and contributors practice the difficult discipline of relentlessly removing misfeatures in favor of cleaner, saner, smarter, and easier to use and to understand replacements, Perl 6 will eventually go the way of Perl 5: a shining gem of a language buried under accreted muck.

I hope to help wipe away some of the muck of Perl 5 -- but I want to prevent that muck from accumulating in Perl 6.

A quote attributed to Ward Cunningham suggests that Smalltalk failed because its tools were too good — they let people manage small messes in Smalltalk until they grew into big messes.

I can believe it. Think of the C programming language. C programs can be incredibly simple and straightforward; the language itself is reasonably easy to understand, with only a few crazy ratholes related to function pointer declarations. Yet the language provides only the building blocks for programs, and it's easy to believe that the C language is simple while acknowledging that C programs can be exceedingly complex. Raw pointers and pointer offset math may be simple concepts if you understand how CPUs and memory work, but that simplicity does not make C programs simple. (It also doesn't work portably across platforms unless you understand how all of the relevant CPUs work. Hooray for struct padding and alignment concerns.)

Perl occupies a sweet spot in language design and use. Larry describes it as a combination of manipulexity and whipuptitude. Perl has the power and flexibility to let experts twist and turn it until it does exactly what they want. It's also simple and expressive enough that novices can get their work done without worrying about theoretical underpinnings, big O notation, or even how memory allocation and hashing and vectors work.

Of course, it's only a rhetorical device to anthropomorphize a language. Perl doesn't allow all of these things. Perl is merely a tool in the hands of people who make these decisions. Like Smalltalk, Perl will let you go a long way even if what you're doing isn't maintainable. (Try ignoring memory allocation and proper pointer usage in a C program. It won't run well for long.) As with Smalltalk, you can end up with a big mess very quickly.

My colleague Allison once said "The last four hundred lines of this loop are kind of messy."

I have trouble understanding what kind of mindset it requires to write a loop with four hundred lines — and I recall that she had already refactored several hundred lines out of that loop.

Put another way, you'd have real trouble reading this essay — as short as it is — without paragraph breaks. Take out the punctuation and it gets more impenetrable. Remove the spaces. You won't even bother.

I won't pretend that people are great writers. I've read too much fanfic on the Internet and I wrote too much bad poetry as a teenager to suffer that delusion, but people generally understand how to arrange their thoughts into sentences and paragraphs. They know how to hit the Enter key every now and then to separate unrelated thoughts.

Find and replace, automated refactoring tools, cross-referencing, and compilers which treat all vertical whitespace as more or less the same and a function of five lines as equivalent to a function of five thousand lines: these are tools which allow us to manage messes in the same sense that Smalltalk's lovely browser lets Smalltalk programmers manage ravioli code. Our tools let us work around nascent problems until they grow too large for our tools. Then we're in real trouble.

I don't want to manage little messes. I want to eliminate them. Only the relentless desire to eliminate non-essential complexity will work. Will teaching people to think of code — not in terms of lists of statements to execute or methods to send or mathematical formulas to solve — as words, sentences, and paragraphs help?
