What Testing DSLs Get Wrong


A conversation between the owners of ClubCompy about language design, syntax errors, and testing led to an interesting exchange (lightly edited for coherence):

How do you go about testing order of operations in languages?

You need a minimal test driver that takes an expression or series of expressions and verifies that it either parses or produces a syntax error. The test matches part or all of that error.

Any given input either parses correctly or produces an error.

Our current test framework cannot "see" when there is a syntax error. We set a flag right before the end of our test programs and test that that flag has the right value.

The most robust strategy I've seen is to add a parse-only stage to the compiler such that you feed it code and catch an exception or series of errors or get back a note that everything's okay.
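A minimal sketch of that parse-only driver, with a toy parser standing in for the real compiler (the actual ClubCompy interface is an assumption here):

```perl
use strict;
use warnings;

# Toy stand-in for a compiler's parse-only stage; the real interface
# is an assumption. It dies with a syntax error on bad input or
# returns true when everything parses.
package ToyParser;

sub parse_only {
    my ($class, $source) = @_;

    # Pretend grammar: parentheses must balance.
    my $depth = 0;
    for my $char (split //, $source) {
        $depth++ if $char eq '(';
        $depth-- if $char eq ')';
        die "unbalanced ')' in '$source'\n" if $depth < 0;
    }
    die "unclosed '(' in '$source'\n" if $depth;
    return 1;
}

package main;

# The driver: feed it code; get back undef ("everything's okay")
# or the syntax error the parse raised.
sub parse_result {
    my ($source) = @_;
    eval { ToyParser->parse_only( $source ) };
    return $@ ? $@ : undef;
}
```

The point is the shape of the interface, not the toy grammar: one call in, one success-or-error value out, nothing for the test to poke around in.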

You can inspect a tree structure of some kind to verify that it has all of the leaves and branches you expect, but that's fragile and couples the internals of the parser and compiler and optimizer to the internals of your tests.

Is having a huge battery of little code snippets that run or fail with errors the goal?

Ideally there's as little distance between "Here's some language code" and "Here's my expected results" as possible. The less test scaffolding the better.

I've never been a fan of Behavior Driven Development. I think Ruby's Cucumber is a tower of silly faddishness in software development. (Any time your example walks you through writing regular expressions to parse a subset of English to test the addition of two numbers, close the browser window and ask yourself if slinging coffee for a living is really such a bad idea after all.)

I neither want to maintain nor debug a big wad of cutesy code that exists to force my test suite into "reading like English"—as if the important feature of my test assertions were that they looked like index cards transcribed into code.

Nor do I want to spend my time tweaking a lot of hairy procedural scaffolding to juggle launching a compiler and poking around in its guts for a magic flag so that, a couple of dozen lines of code later, I can finally say yes or no that the 30 characters of line noise I sent to the compiler produced the error message I expected.

I want to write simple test code with minimal scaffolding to highlight the two important attributes of every test assertion:

  • Here's what I did
  • Here's what I expected to happen

That means I want to write something like:

parses_ok 'TOCODE i + 65',
    'precedence of + should be lower than that of TOCODE';

Instead of:

Feature: Precedence of keywords and arithmetic operators
  In order to avoid parse errors between keywords and arithmetic operators
  As an expert on parse errors
  I want to demonstrate that keywords bind more tightly to their operands than do operators

  Scenario: TOCODE versus +
    Given code of "TOCODE i + 65"
    When I parse it
    Then the result should parse correctly without error

Which would you rather read, run, and debug?

All of these "DSLs for $foo" jump too far over the line and try to produce the end goal their users need to make for themselves. I don't want a project that attempts to allow me to write my tests in a pidgin form of English (and I get to parse that mess myself, oh joy, because I'm already testing a parser and the best way to test a parser is to write a custom fragile parser for natural language, because debugging that is clearly contributing to real business value).

Ideally, I want to use a library someone else has written that can launch my little compiler and check its results. I want to use this library in my own test suite and have it integrate with everything else in the test suite flawlessly. It should express no opinion about how I manage and arrange and design the entire test suite. It should neither own the world, nor interfere with other tests.

In short, if it has an opinion, it limits that opinion to just a couple of test assertions I can choose to use or not.

In other words, I still want Test::Builder because T::B lets me decide the abstractions I want or don't want and reuse them as I see fit. After all, good software development means building up the vocabulary and metaphors and abstractions appropriate to the problem you're solving, not adopting a hastily-generalized and overextended pidgin and trying to force your code into the shapes it demands.

If I'm going to have to write code to manage my tests anyway, I'll make the input and expected output prominent—not a boilerplate pattern of repetition I have to parse away anyhow.
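Built on Test::Builder, that assertion needs almost no scaffolding. A sketch, where parse_source() is a toy stand-in for launching the real compiler's parse-only stage:

```perl
use strict;
use warnings;
use Test::Builder;

my $tb = Test::Builder->new;

# Toy stand-in for the compiler's parse-only stage; it returns a
# syntax error string, or undef when the source parses cleanly.
sub parse_source {
    my ($source) = @_;
    return $source =~ /;;/ ? "doubled semicolon in '$source'" : undef;
}

sub parses_ok {
    my ($source, $description) = @_;
    my $error = parse_source( $source );
    $tb->ok( !defined $error, $description )
        or $tb->diag( "unexpected parse error: $error" );
}

sub parse_fails_like {
    my ($source, $error_pattern, $description) = @_;
    my $error = parse_source( $source ) // '';
    $tb->like( $error, $error_pattern, $description );
}

$tb->plan( tests => 2 );

parses_ok 'TOCODE i + 65',
    'precedence of + should be lower than that of TOCODE';

parse_fails_like 'x ;; y', qr/doubled semicolon/,
    'doubled semicolons should produce a syntax error';
```

Because these assertions go through Test::Builder, they produce ordinary TAP and coexist with every other Test::Builder-based assertion in the suite — no opinion about how the rest of the suite is arranged.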


OK, I see the utility of a simple "parses_ok" test, but I think we also need an "evaluates_same" test to really validate the example you gave.

TOCHAR i + 65 could parse as TOCHAR (i + 65) or as (TOCHAR i) + 65. In the Tasty language, both are actually valid expressions. If i were 0, TOCHAR (i + 65) would evaluate to "A", while (TOCHAR i) + 65 would evaluate to "{big x}65", because the 65 is coerced to a string and concatenated to TOCHAR i's string. Obviously, I want TOCHAR i + 65 == TOCHAR (i + 65).

So, do you think a ...

evaluates_same 'TOCHAR i + 65', 'TOCHAR (i + 65)',
    'precedence of + should be lower than that of TOCHAR';

test is needed, or does that kind of test create the fragility you are concerned about?

Parsing and evaluation are different concerns; I'd have two tests in this case, if the parsing test were worth writing. (I believe it is; I want this kind of precedence tested appropriately.)

Then again, if we trust that parenthesizing expressions works appropriately, we don't have to test this at any level other than the parser parenthesizing test.
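Split that way, the evaluation half is just as short. A sketch, where evaluate() fakes the real Tasty evaluator with a toy that sums the numbers in an expression:

```perl
use strict;
use warnings;
use Test::More;

# Toy stand-in for running an expression through the real Tasty
# evaluator; here it just sums every number it finds, so both
# parenthesizations of the example below yield 65.
sub evaluate {
    my ($source) = @_;
    my $total = 0;
    $total += $_ for $source =~ /(\d+)/g;
    return $total;
}

# The evaluation-side assertion: two expressions, one expected value.
sub evaluates_same {
    my ($left, $right, $description) = @_;
    is( evaluate( $left ), evaluate( $right ), $description );
}

evaluates_same 'TOCHAR i + 65', 'TOCHAR (i + 65)',
    'precedence of + should be lower than that of TOCHAR';

done_testing();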

I've never used Cucumber in anger, but I thought it was for creating testcases that could be understood by non-technical clients, so you can concretely discuss features. If you're writing a compiler then all your clients will be programmers, so there's no need for such a thing.

Our clients are the parents, guardians, and teachers of children between the ages of eight and twelve inclusive.

The intent of Cucumber is to make readable testcases, just as the intent of COBOL and AppleScript and visual component programming is to enable non-programmers to create software without having to learn how to program.

Yes, I agree. It seems that, in an attempt to make itself suitable for non-programmers, Cucumber complicates the syntax and its parsing and becomes much more error-prone. So I'm not a fan of it either. That aside, Ruby's RSpec, on which I believe Cucumber is based, is not as bad as Cucumber syntax-wise, and it also claims to fall under the Behaviour-Driven Development (BDD) umbrella.

In addition to all that, I've heard different views of how BDD differs from traditional Test-Driven Development (TDD). Someone once said that there isn't a real difference, and someone on IRC told me that, from what they understood, TDD was more about unit tests, while BDD was more about integration tests and system tests. Personally, I have used Test::More and related modules to write some system tests and integration tests for my modules, and I don't see why they are not suitable for that (though I admit something like FIT or Ingy's Test::Base may be somewhat better suited if you're doing a lot of data-driven testing). Part of this may be an issue I have: when I write a test, I usually don't consciously think about whether it is a unit test or a system test or whatever, but just try to write a meaningful test that reproduces and tests the offending behaviour.

About this Entry

This page contains a single entry by chromatic published on April 2, 2012 11:27 AM.
