From Novice to Adept: The Act of Naming


I write a lot of little conversion programs. They take a command-line argument or two, loop over a series of files, read them, convert them, manipulate them, mangle them, then write them out elsewhere. It's the Unix filter pattern (and, one might argue, the functional programming pattern).

These programs tend to be, at most, 100 lines of code, including plenty of whitespace.

I often start by writing a few helper functions: one to find the names of all of the interesting files, one to read a file, one to turn the input names into output names, and one to write a file. I should abstract this whole process into a reusable framework, but I haven't figured out the appropriate genericity yet.

The important point is that I start with names:

my $scenes = get_scene_list();

for my $chapter (get_chapter_list())
{
    my $text = process_chapter( $chapter, $scenes );
    write_chapter( $chapter, $text );
}

die( "Scenes missing from chapters:", join "\n\t", '', keys %$scenes )
    if keys %$scenes;


sub get_chapter_list { ... }

sub get_scene_list { ... }

sub process_chapter { ... }

sub read_scene { ... }

sub write_chapter { ... }

This particular program is 88 lines of code, with copious whitespace and BSD-style brace placement. There's no functional reason to write it with functions instead of straight-line code that operates on global variables. There's only aesthetic practicality.

Names matter.

I don't have to show you the contents of any of these functions because their names describe what they do. They don't tell you how they do what they do, but you can get a sense of the organization and intent of the code by reading the simple control flow here.

Very little novice code I've seen attempts an organized, named structure. I can appreciate that design and abstraction and factoring are all skills learned through practice, just as much as programming effectively in a language is. Even so, these are important skills to learn.

I've heard a lot of people try to explain subroutines as "Pieces of reusable code". That's wrong; I think the Forth and Lisp and domain-driven design people have it right here. A subroutine is a way to name a set of behavior. It's an abstraction. Being able to identify and name individual sets of behavior is essential to being able to solve problems well.

Thinking in terms of sets of behavior -- individual units of behavior -- is essential to programming well.
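To make that concrete, here's a minimal sketch of subroutines used purely as names. Everything in it is hypothetical (the data, `is_unused_scene`, `unused_scene_names`, and `missing_scene_report` are not from the program above). Each function is called from exactly one place, so reuse is not the point; the names are.

```perl
use strict;
use warnings;

# Hypothetical data: scene name => chapter that used it (undef if unused).
my %scenes = (
    opening  => 'chapter_01',
    chase    => 'chapter_02',
    epilogue => undef,
);

# Names one small decision: has any chapter claimed this scene?
sub is_unused_scene
{
    my ($scenes, $name) = @_;
    return !defined $scenes->{$name};
}

# Names the filtering step; returns unused scene names in sorted order.
sub unused_scene_names
{
    my ($scenes) = @_;
    my @unused   = grep { is_unused_scene( $scenes, $_ ) } sort keys %$scenes;
    return @unused;
}

# Names the formatting step; returns an empty string when nothing is missing.
sub missing_scene_report
{
    my (@names) = @_;
    return '' unless @names;
    return join "\n\t", 'Scenes missing from chapters:', @names;
}

my @unused = unused_scene_names( \%scenes );
print missing_scene_report(@unused), "\n" if @unused;
```

The last two lines read almost like a sentence, which is the whole benefit: the control flow states intent, and the details stay behind the names.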


I think this point goes further.

Naming things is one of the most important parts of OO design for me as a way of validating my conceptual understanding of the problem space.

If I can't find a sensible name for a concept (what it does or what it is), or if that name doesn't describe *everything* it does, chances are something is not yet clear.

If I feel I'm being repetitive or redundant, the concepts tend to still be too tightly coupled, and I need to split something up (possibly merging several semi-related concepts first).

This is where cardboard programmers really help (mine happens to be a real human who just knows to scroll past lots of IM backlogs ;-). By explaining the problem space using my made-up vocabulary I can validate every part of it, ensure that it's coherent, and get a feel for its completeness, simplicity, etc.

By the time I'm done I have a much stronger grasp on the problem space and how I'm going to solve it, and this usually only takes several minutes of talking to myself ^_^

Also, I think part of the problem with people giving things shitty names is that in C removing vowels from symbol names makes your code run faster.

"Also, I think part of the problem with people giving things shitty names is that in C removing vowels from symbol names makes your code run faster."

What? How come? The only place where compiled and linked C code is aware of names is in external symbols (functions, global variables, etc.) in shared libraries (DLLs), and the dynamic linker resolves those quickly at the program's start-up time. Giving short names to C functions is an ultra-micro-optimisation, which I think is not worth it. So [citation needed].

I should note that Perl 5, by virtue of being a symbolic language, has a constant run-time cost for using long identifier names, because every global identifier has to be looked up every time it is used (and string-eval is aware of those names). It's probably a worse performance hit than making C identifiers excessively long can ever be, actually, but again it's a micro-optimisation.
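As a small illustration of the "symbolic" part only, here is a sketch showing that Perl 5 keeps subroutine names around at run time, so a sub can be found from a string. The `write_chapter` body is a hypothetical stand-in, and this says nothing about how much such lookups cost.

```perl
use strict;
use warnings;

# Hypothetical stand-in body, just so there is something to call.
sub write_chapter { return "wrote $_[0]" }

# The name lives in an ordinary string and is resolved at run time.
my $name = 'write_chapter';

# UNIVERSAL::can returns a code reference if package main has such a sub,
# or a false value if it does not.
my $code = main->can($name);
print $code->('chapter_01'), "\n" if $code;

# The same lookup as a symbolic reference through the symbol table.
my $via_stash = do { no strict 'refs'; \&{"main::$name"} };
print $via_stash->('chapter_02'), "\n";
```

Compiled C has no equivalent for local or static names once linking is done, which is the contrast being drawn above.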

@chromatic: yes, agreed. Though I think that naturally, if you have two portions of nearly duplicate code, you should extract a subroutine from them and call it from both places. So a subroutine is not just a way to label stuff.

In C99 and C++ one can use inline functions to label portions of the code and avoid the overhead of function calls. Recently, I converted many functions and C macros (not Lisp macros, naturally) in Freecell Solver to inline functions (with a preprocessor fallback and some CMake magic for compilers that don't support them), and this resulted in more modular, easier to understand, and slightly more performant code. So you can enjoy better modularity and conveying of intention without suffering from duplicate code.

I'm not sure how much sense inline functions would make in Perl 5. Perl 5 probably has so much run-time overhead that making functions inline may be a huge micro-optimisation.
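A minimal sketch of that kind of extraction, assuming a hypothetical `normalize_title` helper (neither the name nor the behavior comes from the original program): two call sites that would otherwise each carry the same pair of substitutions now share one named sub.

```perl
use strict;
use warnings;

# Hypothetical helper: trims surrounding whitespace and collapses internal
# runs of whitespace into single spaces.
sub normalize_title
{
    my ($title) = @_;
    $title =~ s/^\s+|\s+$//g;
    $title =~ s/\s+/ /g;
    return $title;
}

# Two call sites, one definition.
my $chapter_title = normalize_title("  The   Act of Naming ");
my $scene_title   = normalize_title("Opening\tScene  ");

print "$chapter_title / $scene_title\n";    # prints "The Act of Naming / Opening Scene"
```

The extracted sub removes the duplication and, as a side effect, gives the behavior a name.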




About this Entry

This page contains a single entry by chromatic published on November 2, 2009 2:15 PM.
