What Testing DSLs Get Wrong


A conversation between the owners of ClubCompy about language design, syntax errors, and testing led to an interesting exchange (lightly edited for coherence):

How do you go about testing order of operations and languages?

You need a minimal test driver that takes an expression or series of expressions and verifies that it either parses or produces a syntax error. The test matches part or all of that error.

Any given input either parses correctly or produces an error.

Our current test framework cannot "see" when there is a syntax error. We set a flag right before the end of our test programs and test that that flag has the right value.

The most robust strategy I've seen is to add a parse-only stage to the compiler such that you feed it code and catch an exception or series of errors or get back a note that everything's okay.
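A parse-only stage might look something like the sketch below. The grammar is a toy invented for illustration (it is not the real Tasty grammar), and `parse_only` is a hypothetical name; the point is only the shape of the interface: source text in, list of errors out, empty list when everything's okay.

```perl
use strict;
use warnings;

# Hypothetical parse-only entry point for a toy grammar (an assumption):
#   expr := term ( '+' term )*
#   term := 'TOCODE' term | NUMBER | IDENT | '(' expr ')'
# Returns a list of error strings; an empty list means the input parsed.
sub parse_only {
    my ($source) = @_;
    my @tokens = $source =~ /\s*(\d+|[A-Za-z_]\w*|\S)/g;
    my $pos    = 0;

    my ($expr, $term);
    $term = sub {
        my $tok = $tokens[$pos++];
        die "unexpected end of input\n" unless defined $tok;
        return $term->() if $tok eq 'TOCODE';                 # prefix keyword
        return           if $tok =~ /^(?:\d+|[A-Za-z_]\w*)$/; # number or name
        if ($tok eq '(') {
            $expr->();
            my $close = $tokens[$pos++];
            die "expected ')'\n" unless defined $close && $close eq ')';
            return;
        }
        die "unexpected token '$tok'\n";
    };
    $expr = sub {
        $term->();
        while ($pos < @tokens && $tokens[$pos] eq '+') {
            $pos++;
            $term->();
        }
    };

    # Run the parser and turn any exception into a returned error.
    my $ok = eval {
        $expr->();
        die "unexpected token '$tokens[$pos]'\n" if $pos < @tokens;
        1;
    };
    return if $ok;
    chomp(my $error = $@);
    return ($error);
}

for my $source ('TOCODE i + 65', 'TOCODE + 65') {
    my @errors = parse_only($source);
    printf "%-15s %s\n", $source, @errors ? "error: $errors[0]" : 'parses';
}
```

Nothing here exposes a tree, so a test written against this interface knows only whether the code parsed and what the error said.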

You can inspect a tree structure of some kind to verify that it has all of the leaves and branches you expect, but that's fragile and couples the internals of the parser and compiler and optimizer to the internals of your tests.

Is having a huge battery of little code snippets that run or fail with errors the goal?

Ideally there's as little distance between "Here's some language code" and "Here's my expected results" as possible. The less test scaffolding the better.

I've never been a fan of Behavior Driven Development. I think Ruby's Cucumber is a tower of silly faddishness in software development. (Any time your example walks you through writing regular expressions to parse a subset of English to test the addition of two numbers, close the browser window and ask yourself if slinging coffee for a living is really such a bad idea after all.)

I neither want to maintain nor debug a big wad of cutesy code that exists to force my test suite into "reading like English"—as if the important feature of my test assertions were that they looked like index cards transcribed into code.

Nor do I want to spend my time tweaking a lot of hairy procedural scaffolding to juggle launching a compiler and poking around in its guts for a magic flag so that, a couple of dozen lines of code later, I can finally say yes or no that the 30 characters of line noise I sent to the compiler produced the error message I expected.

I want to write simple test code with minimal scaffolding to highlight the two important attributes of every test assertion:

  • Here's what I did
  • Here's what I expected to happen

That means I want to write something like:

parses_ok 'TOCODE i + 65',
    'precedence of + should be lower than that of TOCODE';

Instead of:

Feature: Precedence of keywords and arithmetic operators
  In order to avoid parse errors between keywords and arithmetic operators
  As an expert on parse errors
  I want to demonstrate that keywords bind more tightly to their operands than do operators

  Scenario: TOCODE versus +
    Given code of "TOCODE i + 65"
    When I parse it
    Then the result should parse correctly without error

Which would you rather read, run, and debug?

All of these "DSLs for $foo" jump too far over the line and try to produce the end goal their users need to make for themselves. I don't want a project that attempts to allow me to write my tests in a pidgin form of English (and I get to parse that mess myself, oh joy, because I'm already testing a parser and the best way to test a parser is to write a custom fragile parser for natural language, because debugging that is clearly contributing to real business value).

Ideally, I want to use a library someone else has written that can launch my little compiler and check its results. I want to use this library in my own test suite and have it integrate with everything else in the test suite flawlessly. It should express no opinion about how I manage and arrange and design the entire test suite. It should neither own the world, nor interfere with other tests.

In short, if it has an opinion, it limits that opinion to just a couple of test assertions I can choose to use or not.

In other words, I still want Test::Builder because T::B lets me decide the abstractions I want or don't want and reuse them as I see fit. After all, good software development means building up the vocabulary and metaphors and abstractions appropriate to the problem you're solving, not adopting a hastily-generalized and overextended pidgin and trying to force your code into the shapes demanded.

If I'm going to have to write code to manage my tests anyway, I'll make the input and expected output prominent—not a boilerplate pattern of repetition I have to parse away anyhow.
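As a rough sketch of what such assertions could look like on top of Test::Builder: `parses_ok` is the name used above, while `syntax_error_like` and `parse_only` are hypothetical names made up for this example. The `parse_only` here is a deliberately trivial stand-in (it only checks that parentheses balance); a real suite would call the compiler's parse-only stage instead.

```perl
use strict;
use warnings;
use Test::Builder;

my $Test = Test::Builder->new;

# Stand-in for a real parse-only compiler stage (an assumption for this
# sketch): it only checks that parentheses balance and returns a list of
# error strings, empty on success.
sub parse_only {
    my ($source) = @_;
    my $depth = 0;
    for my $char (split //, $source) {
        $depth++ if $char eq '(';
        $depth-- if $char eq ')';
        return ("unexpected ')'") if $depth < 0;
    }
    return $depth ? ("expected ')'") : ();
}

# Passes when the code parses; on failure, reports the parser's errors.
sub parses_ok {
    my ($code, $description) = @_;
    my @errors = parse_only($code);
    my $ok     = $Test->ok(!@errors, $description);
    $Test->diag("  code:  $code\n  error: $_") for @errors;
    return $ok;
}

# Passes when the code fails to parse with an error matching $pattern.
sub syntax_error_like {
    my ($code, $pattern, $description) = @_;
    my @errors = parse_only($code);
    return $Test->like(join("\n", @errors), $pattern, $description);
}

parses_ok 'TOCODE i + 65',
    'precedence of + should be lower than that of TOCODE';
syntax_error_like 'TOCODE (i + 65', qr/expected '\)'/,
    'unclosed parenthesis should be a syntax error';

$Test->done_testing;
```

Because both functions report through the shared Test::Builder object, they can sit in the same file as Test::More assertions and share one test count.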


Ok, I see the utility of a simple "parses_ok" test, but I think we also need an "evaluates_same" to really validate the example you gave.

TOCHAR i + 65 could parse as TOCHAR (i + 65) or as (TOCHAR i) + 65. In the Tasty language, both are actually valid expressions. If i were 0, TOCHAR (i + 65) would evaluate to "A" and (TOCHAR i) + 65 would evaluate to "{big x}65" because the 65 is coerced to a string and concatenated to TOCHAR i's string. Obviously, I want TOCHAR i + 65 == TOCHAR (i + 65).

So, do you think a ...

evaluates_same 'TOCHAR i + 65', 'TOCHAR (i + 65)', 'precedence of + should be lower than that of TOCHAR';

test is needed, or does that kind of test create the fragility you are concerned about?

Parsing and evaluation are different concerns; I'd have two tests in this case, if the parsing test were worth writing. (I believe it is--I want this kind of precedence tested appropriately.)

Then again, if we trust that parenthesizing expressions works appropriately, we don't have to test this at any level other than the parser parenthesizing test.
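For what it's worth, an evaluation assertion can stay just as small as the parsing one. The sketch below is an illustration only: `evaluate` is a hypothetical stand-in that handles plain integer arithmetic by handing a whitelisted string to Perl, because no Tasty evaluator is available here, so the example compares `*` and `+` rather than TOCHAR and `+`.

```perl
use strict;
use warnings;
use Test::Builder;

my $Test = Test::Builder->new;

# Stand-in evaluator (an assumption for this sketch): accepts only digits,
# whitespace, + * and parentheses, then lets Perl compute the value.
sub evaluate {
    my ($source) = @_;
    die "unsupported expression: $source\n"
        unless $source =~ /\A[\d\s+*()]+\z/;
    my $value = eval $source;
    die "syntax error in: $source\n" unless defined $value;
    return $value;
}

# Passes when both snippets evaluate without error to the same value.
sub evaluates_same {
    my ($got_code, $expected_code, $description) = @_;
    my @values;
    for my $code ($got_code, $expected_code) {
        my $value = eval { evaluate($code) };
        unless (defined $value) {
            $Test->ok(0, $description);
            $Test->diag("could not evaluate '$code': $@");
            return 0;
        }
        push @values, $value;
    }
    return $Test->is_eq(@values, $description);
}

evaluates_same '1 + 2 * 3', '1 + (2 * 3)',
    'precedence of + should be lower than that of *';

$Test->done_testing;
```

This compares results only, so it depends on parenthesized expressions already being trusted, as noted above.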

I've never used Cucumber in anger, but I thought it was for creating testcases that could be understood by non-technical clients, so you can concretely discuss features. If you're writing a compiler then all your clients will be programmers, so there's no need for such a thing.

Our clients are the parents, guardians, and teachers of children between the ages of eight and twelve inclusive.

The intent of Cucumber is to make readable testcases, just as the intent of COBOL and AppleScript and visual component programming is to enable non-programmers to create software without having to learn how to program.

Yes, I agree. It seems that in trying to make itself suitable for non-programmers, Cucumber complicates its syntax and parsing and becomes much more error-prone. So I'm not a fan of it either. That aside, Ruby's RSpec, on which I believe Cucumber is based, is not as bad as Cucumber syntax-wise, and also claims to fall under the Behaviour-Driven Development (BDD) umbrella.

In addition to all that, I've heard different views of how BDD differs from traditional Test-Driven Development (TDD). Someone once said that there isn't a real difference, and someone on IRC told me that from what they understood, TDD was more about unit tests, while BDD was more about integration tests and system tests. Personally, I have used Test::More and related modules to write some system tests and integration tests for my modules, and I don't see why they are not suitable for that (though I admit something like FIT or INGY's Test::Base may be somewhat better suited if you're doing a lot of data-driven testing). Part of this may be an issue I have: when I write a test, I usually don't consciously think about whether it is a unit test or a system test or whatever, but just try to write a meaningful test that reproduces and tests the offending behaviour.

About this Entry

This page contains a single entry by chromatic published on April 2, 2012 11:27 AM.
