The Tyranny of Memory Part I (Shared Buffers)

At last month's Portland Perl Mongers, the performance discussion came up. "Isn't it faster to write the bottlenecks of your application in C/XS?" someone asked.

Therein lies a pervasive myth of dynamic languages. It's not always faster to write in C. In my experience contributing to Parrot, the more data you pass back and forth between your high-level language and C, the slower things get. Put another way, how much memory you use and move around matters as much to performance as anything else. (This assumes you've already chosen the right data structures and intelligent algorithms.)

This came up recently, when Vasily Chekalkin and I committed two large performance improvements for Parrot visible in Rakudo Perl 6. Compiling the bootstrapped portion of Rakudo used steadily more and more memory. On my laptop, it topped out at 1.5 GB. Clearly this was too much.

After we fixed that problem, the bootstrap compiled in 250 MB, but it took four or five times longer. We fixed that problem too, and in so doing demonstrated that the effective use of memory is as important to performance as almost anything else.

First, you have to understand how strings work in Parrot.

Shared Buffers

A string in Parrot is two data structures. One of them is the string header, which contains information about the string's character set, its encoding, its length, and some flags which track constantness and copy-on-write information. The other data structure is a buffer, which represents a contiguous chunk of data. A string header points to a buffer, or more precisely to a location within a buffer: the header records where the string starts within the buffer and how long the string is.
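To make the two-structure layout concrete, here is a minimal Python sketch. It is not Parrot's actual C code: the names `Buffer`, `StringHeader`, and every field name are my own illustrative assumptions, and the length is simplified to a byte count.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    """A contiguous chunk of bytes that string headers point into."""
    data: bytearray


@dataclass
class StringHeader:
    """Metadata describing one string; the bytes themselves live in the buffer.

    All field names here are hypothetical, chosen to mirror the prose.
    """
    buffer: Buffer
    start: int                  # offset of the string's first byte in the buffer
    length: int                 # length in bytes (a simplification)
    charset: str = "unicode"
    encoding: str = "utf-8"
    is_constant: bool = False   # flag: constantness
    is_cow: bool = False        # flag: copy-on-write bookkeeping

    def value(self) -> str:
        raw = self.buffer.data[self.start:self.start + self.length]
        return bytes(raw).decode(self.encoding)


buf = Buffer(bytearray(b"say 'hello'"))
whole = StringHeader(buf, 0, len(buf.data))   # covers the entire buffer
word = StringHeader(buf, 5, 5)                # covers only the bytes of "hello"
```

Both `whole` and `word` describe different strings, yet there is only one copy of the bytes.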

You've probably already figured out that multiple string headers can share the same buffer. Buffers have reference counts so that garbage collection works properly. (You don't have to use a reference counting scheme, but it's much easier to manage this appropriately for a small system like this, where only a few places need to update reference counts and where precise destruction is useful.)

Sharing buffers tends to mean using less memory overall. It makes taking substrings cheap, which is very useful when parsing large documents, such as the Perl 6 bootstrapping source code.
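A rough sketch of why substrings are cheap under this scheme, again in Python rather than Parrot's C. The class and method names (`Buffer`, `StringHeader`, `substr`, `release`) are assumptions for illustration, and the reference count is maintained by hand to mirror the description above.

```python
class Buffer:
    """Contiguous bytes plus a count of the headers that point here."""

    def __init__(self, data: bytes):
        self.data = bytearray(data)
        self.refcount = 0


class StringHeader:
    """A (buffer, start, length) view; creating one bumps the refcount."""

    def __init__(self, buffer: Buffer, start: int, length: int):
        self.buffer = buffer
        self.start = start
        self.length = length
        buffer.refcount += 1

    def substr(self, offset: int, length: int) -> "StringHeader":
        """Return a new header over the same buffer; no bytes are copied."""
        if offset < 0 or length < 0 or offset + length > self.length:
            raise ValueError("substring out of range")
        return StringHeader(self.buffer, self.start + offset, length)

    def release(self) -> bool:
        """Drop this header's reference; True means the buffer is now unused."""
        self.buffer.refcount -= 1
        return self.buffer.refcount == 0

    def value(self) -> str:
        end = self.start + self.length
        return bytes(self.buffer.data[self.start:end]).decode("utf-8")


source = StringHeader(Buffer(b"my $x = 42;"), 0, 11)
name = source.substr(3, 2)     # the header covers "$x"
number = source.substr(8, 2)   # the header covers "42"
```

Taking a substring allocates only a small header, whatever the size of the buffer. A parser that slices a large source file into thousands of tokens never duplicates the source text.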

If your system also supports mutable strings, you can also perform copy on write (COW), where multiple string headers can point to the same buffer and the appropriate contents of the buffer get copied to a new buffer only when you modify a string in place.
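Here is a minimal sketch of copy on write as that paragraph describes it. This is explicitly not what Parrot did. The names `CowString`, `clone`, `set_byte`, and `_make_writable` are hypothetical, and the sketch assumes single-byte edits to keep things short.

```python
class Buffer:
    """Contiguous bytes plus a count of the headers that point here."""

    def __init__(self, data):
        self.data = bytearray(data)   # always takes its own copy of the bytes
        self.refcount = 0


class CowString:
    """A mutable string header that copies its bytes only when it must."""

    def __init__(self, buffer: Buffer, start: int, length: int):
        self.buffer, self.start, self.length = buffer, start, length
        buffer.refcount += 1

    def clone(self) -> "CowString":
        """Cheap copy: a second header sharing the same buffer."""
        return CowString(self.buffer, self.start, self.length)

    def _make_writable(self) -> None:
        if self.buffer.refcount == 1:
            return                    # sole owner: safe to modify in place
        # Shared: copy only the bytes this header covers into a new buffer.
        mine = self.buffer.data[self.start:self.start + self.length]
        self.buffer.refcount -= 1
        self.buffer = Buffer(mine)
        self.buffer.refcount = 1
        self.start = 0

    def set_byte(self, index: int, byte: int) -> None:
        if not 0 <= index < self.length:
            raise IndexError("index out of range")
        self._make_writable()
        self.buffer.data[self.start + index] = byte

    def value(self) -> str:
        end = self.start + self.length
        return bytes(self.buffer.data[self.start:end]).decode("utf-8")


original = CowString(Buffer(b"parrot"), 0, 6)
copy = original.clone()
shared_before_write = copy.buffer is original.buffer   # still one buffer
copy.set_byte(0, ord("c"))                             # triggers the copy
```

The key property is that the copy happens at the moment of modification, and only for the header doing the modifying; every other header keeps pointing at the untouched original buffer.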

That paragraph describes copy on write done correctly. It's not what Parrot did, which is why building Rakudo used so much memory. That difference was the source of the first bug that Vasily and I fixed, and the first bug had inadvertently hidden a second one, which fixing the first exposed.

Several Parrot hackers besides myself have concluded that Parrot should consider using immutable strings instead of mutable strings. Immutable strings would solve other problems as well.

I'll write more about the two bugs we fixed next week, as well as what we hope to gain with immutable strings. In the meantime, very careful readers can amuse themselves by speculating about what Parrot did, why it was wrong, and why the second bug was so annoying.

About this Entry

This page contains a single entry by chromatic published on April 2, 2010 2:09 PM.

