Parallelism and Test Suites

The relentless pursuit of user efficiency exposes drawbacks now and then. I added HARNESS_OPTIONS=j9 to my .bashrc a while ago, and then noticed that my regular CPAN updates (cpan-outdated -p | cpanm) had a lot more failures than usual.

Test::Harness (and TAP::Harness, which it uses internally) reads the environment variable HARNESS_OPTIONS to customize some of its behavior. This is very useful when running Perl tests through make test or ./Build test or any other mechanism where you don't launch the harness directly.

The j flag asks the harness to run multiple test files in parallel. If you have multiple .t files and multiple cores in your computer, chances are that parallelism will speed up the test run. (I notice that a lot of my tests are I/O bound, not CPU bound, so I can run more tests than I have cores.) My use of j9 works well on my four-core machine; your numbers will vary based on your workloads and hardware.
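If you do launch the harness yourself, the rough equivalent of HARNESS_OPTIONS=j9 is the jobs argument to TAP::Harness. This is only a minimal sketch: the t/*.t layout and the choice of nine jobs are assumptions carried over from the discussion above, not a recommendation.

```perl
use strict;
use warnings;
use TAP::Harness;

# Ask for up to nine test files at once, much as HARNESS_OPTIONS=j9 does.
my $harness = TAP::Harness->new({
    jobs      => 9,
    verbosity => 0,
});

# Assumption: test files live in the conventional t/ directory.
my @test_files = glob 't/*.t';

if (@test_files) {
    my $aggregator = $harness->runtests(@test_files);
    print "All tests passed\n" if $aggregator->all_passed;
}
```

Running the same files once with jobs set to 1 and once with a higher number is a quick way to see whether a failure is tied to parallelism.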

Unfortunately, it's easy to write simple tests which just don't work in a parallel world. Consider the TestServer.pm module used to test Test::WWW::Mechanize. (I chose this as an example because Andy's a good sport, and because I've already opened a pull request for it.) This module starts a server for each test to control the responses returned to the Mech object. That's all well and good; it tests network communication in a mostly real way (yes, the loopback interface isn't exactly the same as a remote server, but it's real enough for most testing uses).

The TestServer constructor in 1.38 is:

our $pid;

sub new {
    my $class = shift;

    die 'An instance of TestServer has already been started.' if $pid;

    # XXX This should really be a random port.
    return $class->SUPER::new(13432, @_);
}

You can probably see the problem already from the comment. If multiple .t files use this module (and they do), and if these files each run in separate processes (and they do), then if these files run simultaneously (as they do in a parallel testing environment), only one file will be able to bind to this port and the others will all abort and cause test failures.

In fact, this is what happens.

I submitted a silly little patch which changes the port to:

    return $class->SUPER::new(13432 + $$, @_);

... which should reduce the likelihood of collisions. (For more safety, the code should check that the given port number is available, but then you have to deal with race conditions and so forth, and there's a point at which adding more complexity to your test just isn't worth it. Also, $$ can be greater than 65535, as Pete Krawczyk points out, so there ought to be a sane modulus in there.)
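Both ideas from that parenthetical can be sketched in a few lines. Neither helper below is part of TestServer.pm or of the submitted patch; port_from_pid and free_port are hypothetical names used only for illustration. The first folds the PID back into the valid port range, and the second asks the operating system for an unused port by binding to port 0. The second still has the race condition mentioned above, because another process could grab the port between the close and the server's own bind.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical helper: keep the original base port, but wrap the PID
# so that the result never exceeds 65535.
sub port_from_pid {
    my ($pid) = @_;
    my $base  = 13432;
    return $base + $pid % (65536 - $base);
}

# Hypothetical helper: bind to port 0 so the OS picks a free port,
# read back the number, then release the socket.
sub free_port {
    my $sock = IO::Socket::INET->new(
        LocalAddr => '127.0.0.1',
        LocalPort => 0,
        Proto     => 'tcp',
        Listen    => 1,
    ) or die "Cannot bind to a free port: $!";

    my $port = $sock->sockport;
    close $sock;
    return $port;
}

my $port = port_from_pid($$);
print "Would start the test server on port $port\n";
```

The modulus version is deterministic and cheap, but two PIDs can still map to the same port. The port 0 version avoids picking a port that is already in use, at the cost of the small race window.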

The principle is this:

Manipulating external state in a test file reduces the possible parallelism of your test suite.

You can see the same thing when you write to hard-coded directories in certain tests. (Use File::Temp to create temporary directories—which can clean themselves up!). You can also see the problem when you use a single database for testing (use something like DBICx::TestDatabase to create and populate a database in memory).
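As a minimal sketch of the File::Temp suggestion, each test file can create its own scratch directory instead of writing to a fixed path. The template name and the file name are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Each process gets its own uniquely named directory under the system
# temp dir; CLEANUP => 1 removes it when the program exits.
my $dir = tempdir('mytest-XXXXXX', TMPDIR => 1, CLEANUP => 1);

# Hypothetical output file that a test might otherwise hard-code.
my $file = File::Spec->catfile($dir, 'output.txt');

open my $fh, '>', $file or die "Cannot write $file: $!";
print {$fh} "scratch data\n";
close $fh or die "Cannot close $file: $!";
```

Because the directory name is unique per process, two test files running at the same time never see each other's files.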

Anti-parallelism bugs in test suites are unnecessary and in most cases are easy to fix, once you know what to look for. As the CPAN continues to grow and as our applications rely on more and more great dependencies, the mechanisms we use to manage our code become ever more important. It's easy to avoid these problems—and it's even easier to understand why parallel testing is valuable when you can cut your test run wallclock time in half.

10 Comments

Aristotle Pagaltzis | November 21, 2011 5:46 PM

Probably we need a few CPAN Tester smokers with parallel tests enabled, in order to get a grip on this problem over the long term. Otherwise it’ll just be spotty one-off corrections without coherent progress.

chromatic replied to comment from Aristotle Pagaltzis | November 21, 2011 11:19 PM

Someone has to be able to diagnose parallel testing failures as parallel testing failures and not weird one-off failures from weird smoker configurations. Maybe that's as simple as trying to run parallel tests first, then running serial tests and seeing if things change.

martinjevans.myopenid.com | November 22, 2011 5:35 AM

In the test suites I have maintained for various non CPAN projects speed has rarely been an issue. However, the test suite I am working with now takes a long time to run and parallelism would speed parts of it up (but only parts of it as sometimes a test needs to wait for something external to happen and often those things only happen some time in the future and are out of control of the test suite itself).

The most notable long test suite is for Perl itself - I wouldn't mind that one running more quickly.

As for CPAN modules, few I install take much more than a minute (many far less). Take one of my modules, DBD::ODBC. Obviously the test speed is dependent on comms with your database but for me, here, it takes 21s to run - not long. The DBD::ODBC test suite goes out of its way to ensure all test tables/procedures/functions/views created are removed after the test but many of the tests use the same names in different tests. Also, running with -j9 might not be possible due to limits on simultaneous connections to the database. Are you really suggesting I should recode this test suite just so someone can run -j9 and speed the test up?

chromatic replied to comment from martinjevans.myopenid.com | November 22, 2011 9:09 AM

I believe it's good CPAN citizenship not to preclude people from running your tests in parallel, when possible. Sure, 20 seconds isn't too bad for you running your full test suite just before you commit a big change, but 20 seconds per CPAN distribution used as a dependency for some projects means minutes and hours.

If it's truly impossible to run certain tests in parallel due to serial access to a shared resource, some sort of locking strategy might help.
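One possible shape for such a locking strategy is an advisory lock with flock around the section that touches the shared resource. This is a sketch under assumptions: with_lock and the lock name are hypothetical, and it only helps if every test that touches the resource goes through the same lock file.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Spec;

# Hypothetical helper: run $code while holding an exclusive lock on a
# named lock file in the system temp directory.
sub with_lock {
    my ($name, $code) = @_;

    my $path = File::Spec->catfile(File::Spec->tmpdir, "$name.lock");
    open my $fh, '>>', $path or die "Cannot open $path: $!";

    # Blocks until any other process holding the lock releases it.
    flock($fh, LOCK_EX) or die "Cannot lock $path: $!";

    my $result = $code->();

    # Closing the handle releases the lock.
    close $fh;
    return $result;
}

my $answer = with_lock('shared-test-db', sub {
    # Work that needs exclusive access to the shared resource goes here.
    return 42;
});
```

Tests that use the helper still run under a parallel harness; only the locked sections are serialized.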

Leon Timmermans | November 22, 2011 9:37 AM

The most notable long test suite is for Perl itself - I wouldn't mind that one running more quickly.

«make test_harness TEST_JOBS=$NUM» works like a bliss. It finishes in under two minutes on my 8 core workstation ;-).

martinjevans.myopenid.com replied to comment from chromatic | November 22, 2011 10:03 AM

OK, it may be good CPAN citizenship not to preclude people from running tests in parallel, but it is also potentially a lot of work to change an existing suite to work in parallel. In all the years I've been looking after modules I've never once had anyone report that they don't work in parallel. Plus:

1. DBD::ODBC is a dependency of only 3 other modules (according to http://deps.cpantesters.org/depended-on-by.pl?module=DBD%3A%3AODBC)

2. very few people (and almost certainly no smokers) actually run DBD test suites with an existing database and a valid login; without those, all tests are skipped.

3. as 2, anyone installing with cpan shell etc almost always just does "install DBD::whatever" and doesn't have DBI_DSN, DBI_USER, DBI_PASS set so here again, all tests are skipped.

So all I'm saying is that the effort is probably not worth the gain for some modules.

martinjevans.myopenid.com replied to comment from Leon Timmermans | November 22, 2011 10:05 AM

Thanks for that, now all I need to do is work out how to make perlbrew do that.

chromatic replied to comment from martinjevans.myopenid.com | November 22, 2011 12:02 PM

Only within the past couple of years has TAP::Harness been able to run tests in parallel reliably, been available on enough systems that it matters, and seen people start to take advantage of it. I don't take its relative unpopularity until now as anything more than the relative obscurity of its existence.

Agreed that scouring existing test suites for parallelism blockers doesn't always make sense in terms of effort, but I believe it's worth at least considering.

Aristotle Pagaltzis | November 22, 2011 6:43 PM

Making a test suite not break under parallelism doesn’t necessitate making it run in parallel.

However I believe something like say 95% of tests on CPAN will already run fine in parallel with no further ado, and of the rest, easily the majority will be very simple to fix.

In the quasi-infinitesimal remainder of cases, sure, if the effort is not worth it, just forcibly serialise the tests and move on.

I expect a push to test parallelism to require little housekeeping effort all told. There just needs to be a reliable pressure that steers the CPAN towards it.

Yes, chromatic, you are right that it is a little more involved to detect failures as errors in parallelism.

szabgab.com | November 27, 2011 9:03 AM

Some of my tests also suffer from exactly that port number issue. Just one thought I had now: use the number of the test file, if you have numbered them, in addition to the fixed port number. Then those should be unique.
