The Tyranny of Memory Part II (Reifying COWs)

If your language has a split header/buffer system to represent strings, and you support mutable strings, you probably have a copy-on-write system. Copy-on-write (or COW) strings let you avoid copying a buffer until something actually writes to it.

Given a 90 KB file containing the entire source code of a program, the compiler, parser, runtime, and everything else likely have many, many strings pointing into various parts of that program. If nothing ever writes to any of these strings, they can all share the same buffer. You need a separate string header for each substring, but you can get away with a single buffer.
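To make the header/buffer split concrete, here is a minimal C sketch. It is not Parrot's actual string implementation; the `StrHeader` struct and the `str_substr` and `str_equals` functions are hypothetical names chosen for illustration. Each header records a pointer to the shared buffer plus an offset and a length, so taking a substring allocates a new header but copies no characters.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string header: it does not own its characters,
 * it only points into a (possibly shared) buffer. */
typedef struct {
    const char *buffer; /* shared storage, e.g. the whole source file */
    size_t      offset; /* where this string starts within the buffer */
    size_t      length; /* how many bytes this string spans */
} StrHeader;

/* Make a new header for a substring of src without copying characters.
 * offset is relative to the start of src. Returns NULL on allocation failure. */
StrHeader *str_substr(const StrHeader *src, size_t offset, size_t length) {
    assert(offset + length <= src->length);
    StrHeader *sub = malloc(sizeof *sub);
    if (sub == NULL)
        return NULL;
    sub->buffer = src->buffer;          /* same buffer pointer: no copy */
    sub->offset = src->offset + offset; /* translate to a buffer offset */
    sub->length = length;
    return sub;
}

/* Compare a header's contents against a NUL-terminated C string. */
int str_equals(const StrHeader *s, const char *cstr) {
    return strlen(cstr) == s->length &&
           memcmp(s->buffer + s->offset, cstr, s->length) == 0;
}
```

Two tokens taken from `"my $x = 42;"` this way end up with identical `buffer` pointers; only their offsets and lengths differ.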

Parrot (and, by extension, Rakudo Perl 6) does this.

When you copy a string, whether as part of a substring operation or for some other reason, you allocate a new string header but copy the buffer pointer directly. Then you set a flag in the new string header indicating that any modification to that string must first make its own copy of the buffer rather than modifying it in place. This prevents you from modifying a buffer to which other string headers point.
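A minimal C sketch of the copy-on-write step, under a few assumptions: `CowStr`, `cow_copy`, `cow_unshare`, and `cow_set_char` are hypothetical names, a single boolean flag stands in for whatever bookkeeping a real implementation uses, and the encoding is one byte per character. This sketch also flags the source header, so that writing through the original cannot silently change the copy. Freeing buffers is left to the caller (or a garbage collector) and is out of scope.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string header with a copy-on-write flag. */
typedef struct {
    char  *buffer; /* storage, possibly shared with other headers */
    size_t offset; /* start of this string within buffer */
    size_t length; /* length in bytes (assumes one byte per character) */
    bool   is_cow; /* true: buffer may be shared, so copy before writing */
} CowStr;

/* Copy a string: new header, same buffer pointer, no characters copied.
 * Both headers are flagged, since either one may be written later. */
CowStr cow_copy(CowStr *src) {
    src->is_cow = true;
    CowStr dst = *src; /* copies buffer pointer, offset, length, and flag */
    return dst;
}

/* Give s a private copy of just its own bytes if its buffer may be shared.
 * Returns false if allocation fails. */
static bool cow_unshare(CowStr *s) {
    if (!s->is_cow)
        return true; /* already private: write in place */
    char *fresh = malloc(s->length + 1);
    if (fresh == NULL)
        return false;
    memcpy(fresh, s->buffer + s->offset, s->length);
    fresh[s->length] = '\0';
    s->buffer = fresh;
    s->offset = 0;
    s->is_cow = false; /* s is now the sole user of fresh */
    return true;
}

/* Write one character, copying the buffer first if necessary. */
bool cow_set_char(CowStr *s, size_t index, char c) {
    assert(index < s->length);
    if (!cow_unshare(s))
        return false;
    s->buffer[s->offset + index] = c;
    return true;
}
```

Copying `"hello"` and then writing `'j'` into the copy leaves the original reading `"hello"` while the copy reads `"jello"` from its own private buffer. A single flag is conservative: the original keeps its flag and would make one unnecessary copy on its next write, which is one reason implementations often track reference counts instead.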

This is all well and good. Unfortunately, there was a bug in Parrot: not just a typo, but a deliberate bug.

The code which performed the actual copy portion of COW in Parrot checked for the COW flag, looked at the contents of the string header, and then copied the entire buffer into a new buffer. Suppose you have a 90 KB buffer representing the entire source code of your program and several dozen strings, each representing a token (in the parser sense) within it. If you modified those tokens, Parrot would allocate another 90 KB buffer for each string.
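The cost of that deliberate bug is easy to work out. The following C sketch contrasts copying the whole shared buffer with copying only a token's own bytes, and tallies what each allocates. The `Token` struct, both `unshare_*` functions, and the `bytes_allocated` counter are hypothetical; the token-only version assumes a one-byte-per-character encoding, an assumption that does not hold for every encoding.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical running total of bytes allocated by unshare operations. */
static size_t bytes_allocated = 0;

typedef struct {
    const char *buffer;      /* shared buffer, e.g. the whole source file */
    size_t      buffer_size; /* size of the entire shared buffer */
    size_t      offset;      /* start of this token within the buffer */
    size_t      length;      /* length of this token in bytes */
} Token;

/* The deliberate bug: copy the entire shared buffer for every token. */
char *unshare_whole_buffer(const Token *t) {
    char *copy = malloc(t->buffer_size);
    if (copy == NULL)
        return NULL;
    memcpy(copy, t->buffer, t->buffer_size);
    bytes_allocated += t->buffer_size;
    return copy;
}

/* The fix, for a one-byte-per-character encoding: copy only the token. */
char *unshare_token_only(const Token *t) {
    char *copy = malloc(t->length + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, t->buffer + t->offset, t->length);
    copy[t->length] = '\0'; /* NUL-terminate for convenience */
    bytes_allocated += t->length + 1;
    return copy;
}
```

With 36 five-byte tokens (an illustrative stand-in for "several dozen") taken from a 90 KB (92,160-byte) buffer, the whole-buffer version allocates 36 × 92,160 = 3,317,760 bytes, roughly 3.2 MiB, while the token-only version allocates 36 × 6 = 216 bytes, counting each copy's terminating NUL.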

Worse, a comment in the code said "Let's copy the entire buffer."

That's obviously the wrong behavior, but the right behavior isn't as simple as it seems. Clearly, you should copy only the relevant substring of the buffer before making modifications. Yet when the buffer's encoding isn't the simple one-character-per-byte scheme you might expect if you've never worked with anything more complex than Latin-1, you have to be careful about blindly copying memory around, because a character offset is no longer the same as a byte offset. Sometimes bugs, even deliberate ones like this, paper over other problems elsewhere.
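Here is why a blind `memcpy` is dangerous once the encoding is variable-width. This C sketch assumes UTF-8, picked purely as an illustrative variable-width encoding, and uses hypothetical helpers `utf8_byte_offset` and `utf8_copy_chars`. It translates character positions into byte positions by skipping UTF-8 continuation bytes (those of the form `10xxxxxx`) before copying, and it assumes the input is valid UTF-8.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Byte offset of the char_index-th code point in valid UTF-8 data of
 * nbytes bytes; returns nbytes if char_index runs past the end. */
static size_t utf8_byte_offset(const unsigned char *s, size_t nbytes,
                               size_t char_index) {
    size_t pos = 0;
    while (pos < nbytes && char_index > 0) {
        pos++; /* step past the lead byte */
        while (pos < nbytes && (s[pos] & 0xC0) == 0x80)
            pos++; /* skip continuation bytes of the same code point */
        char_index--;
    }
    return pos;
}

/* Copy nchars code points starting at code point start_char into a new
 * NUL-terminated buffer and store its byte length in *out_len.
 * Returns NULL if allocation fails. */
char *utf8_copy_chars(const char *buf, size_t nbytes, size_t start_char,
                      size_t nchars, size_t *out_len) {
    assert(out_len != NULL);
    const unsigned char *s = (const unsigned char *)buf;
    size_t begin = utf8_byte_offset(s, nbytes, start_char);
    size_t end = begin + utf8_byte_offset(s + begin, nbytes - begin, nchars);
    char *copy = malloc(end - begin + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, buf + begin, end - begin);
    copy[end - begin] = '\0';
    *out_len = end - begin;
    return copy;
}
```

For the string `naïve café` encoded in UTF-8 (12 bytes, 10 characters), copying four characters starting at character 6 yields the five bytes of `café`. Treating 6 as a byte offset instead would start at the space before `café`, and a four-byte copy would cut `é` in half.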

When Vasily and I fixed the encoding problem, memory use when bootstrapping Rakudo dropped by two thirds. Unfortunately, performance suffered dramatically. Still, because it was now possible to build Rakudo again on machines with less than 2 GB of memory, we decided it was better to build slowly than not at all, at least until we found the performance culprit.

That's a story for next time. In the meantime, very clever readers will have deciphered the subtext in these entries and the title and have probably already figured out what went wrong and why.

This page contains a single entry by chromatic published on April 5, 2010 1:01 PM.

