Why Perl 5 Needs an AST


When I wrote Features Perl 5 Needs in 2012, I promised to explain my thinking. I've explained Why Perl 5 Needs a Metaobject Protocol and previously Why Perl 5 Needs a Compiler-Free Extension Mechanism.

(If you don't follow the latter link, ask yourself this: would you use Perl 5 on another VM if you didn't have access to the CPAN? As long as any CPAN stack you need includes an XS dependency, you're stuck with the C implementation of Perl 5. Break that dependency and... you see where this is going.)

If you read a good book on compilers such as SICP or the Dragon book or even a good book on Lisp, which is half of the same thing, you'll eventually run into the idea that you can represent any program in a tree-like structure.

That is to say, any type of computation you wish to perform can be represented with the right data structure, and that data structure happens to be a tree. Even the venerable "Hello, world!" program from K&R is a tree:

    / \
print  exit
  |      |
"Hello,  0

(You can represent this tree in a lot of ways, but you get the idea.) As SICP explains, the execution model governs how you traverse this tree. Obviously in this tree, you start at the topmost node, then evaluate leftmost depthfirst until you end.

Even though your processor likely doesn't execute programs by traversing trees and your language's favorite runtime doesn't either, tree structures are still very useful in compilers because they're simple data structures that are easy to traverse and produce and manipulate.

This is one reason that fans of Lisp think that Lisp is the ultimate programming language: you write it by writing this tree directly as source code and you can manipulate that tree with source code as if it were a tree, because it is.

Many of the rest of us don't want to spend the rest of our lives writing trees by hand, but sometimes we do want to do things with source code without having to write our own compilers, or at least our own parsers.

Do you know know people say "Parsing Perl is Hard?" That's because it is. It's not impossible, but the Perl 5 parser is a complicated program that mixes up the traditional roles of lexing and parsing such that replicating that parse completely is difficult, at best. This is why writing a syntax highlighter for Perl 5 is more difficult than writing one for Lisp.

Do you know how something like Devel::Declare works? I do, and I don't want to explain it to you. It's not particularly yucky magic, but it's not particularly lovely magic either.

Do you know how Perl 5's optimizer works? It's very limited.

Do you know how Perl 5 source filters work? Not that well! That's why the Switch module was deprecated in the commit after it became a core module.

Do you know how PPI works? Most of the time, very well, but with lots of magic and a few very well understood edge cases and not a lot of speed.

All of these problems have the same root cause: it's exceedingly difficult to manipulate a Perl 5 program as anything other than text, because the one thing that unambiguously understands that program won't share its understanding.

(That's not entirely true; a few years ago, Larry added a compile-time option to Perl 5 to produce an annotated tree from the parser/lexer/compiler, but no one's done much with it.)

A traditional compiler parses a document into a tree structure, then manipulates that tree into at least one and probably more trees and finally emits code in another language. It's very patriotic (from tree to shining tree). This pattern is no accident; it's the basis of many programs (everything is a compiler).

This process is so well understood that patterns exist for treating this process as a pipeline of tree transformations. As long as you know what kind of tree you're going to get and what kind of tree you need to produce, you can add your own transformation step. (If that sounds like Plack middleware, there's no coincidence there either.)

A formalized AST in Perl 5, representing an official separation between the lexing/parsing and execution phases of Perl 5, made available to any and all programs would let people write better syntax highlighters, sure, but also better IDEs (finding function declarations and associating them with source code would be easier) or debuggers (see again "finding things) or optimizers (a pipeline of tree transformations) or transliterations to other VMs or languages (still not entirely easy but much more possible) or serialization mechanisms (avoid parsing; dump an AST to the execution engine) or even little languages atop Perl with their own parsers which compile to the Perl 5 AST and run as if they had been Perl all along.

Think about that last point for a moment.

(Think about that last point in the context of syntax weirding mechanisms like MooseX::Declare, or anything which uses string eval.)

Are you sold yet?

Here's the bad news: this is a lot of work. It needs expertise and it needs research and planning. It needs at least one champion, and it probably needs funding. The first approach will probably fail. It will take longer than anyone wants. The only way it will happen is if someone stubborn appears and says "I'll do that!" or "I'll fund that!" and gets just enough support to keep going past the difficult parts.

Imagine how amazing it will be, though.


A convincing argument! I look forward to this.

As you will know, perl5 already has an AST. The optree. It can even be fully transformed back to equivalent source code via B::Deparse.

The problem is more that the AST is not documented at all, and changes are left over to CPAN, whilst the modules which did those analysis and optimizations and transformations were not updated with the big 5.10 optree rewrite. op_seq was thrown away, optree changes are now actively blocked at run-time, even without threads, but most signatures parsers e.g. do only work at run-time.

Any attempt to document the optree was met with silence.

Any attempt to advance compile-time optree optimizations with normal optimization techniques, like function analysis for our slow parts (exceptions? locals, lexicals?, return context and types?), function inlining, method resolution, method inlining, tailcalls, types, constant folding are either ridiculed, blocked or ignored. Or left to CPAN which would slow it down instead of doing optimizations.

The language is also by far not advanced enough to allow this, and discussions are not possible.

There are no proper constants yet (besides the syntax only CONSTSUB), lexicals const, const classes, const @ISA, everything is allowed at run-time, and no attempts are made to improve the situation.
Signatures and a new class syntax would solve a lot, but the current proposals ignore those possibilities.

I'd like to implement all that, but I'm waiting for unmade decisions. And I see no way that decisions are made amongst the uninformed. The deciders don't even listen to perl6, who solved these problems already some time ago.

The difference between an abstract syntax tree and the optree is twofold: the optree is neither abstract nor concerned with syntax. Another problem is that it's not really possible to create something to hand to the Perl 5 runtime without copying and pasting and modifying big chunks of the Perl 5 parser. While you might argue in a very technically precise sense that it exists and is of a nature with an AST, it's not usable in the sort of way someone might want to use an AST.

(I believe any decent AST for Perl 5 would give you much more stability than the optree does.)

As for Perl 6 solving these problems and more, I'm reserving judgment on the quality of its implementation design decisions until an implementation is usable for my purposes.

I would really, really love to have this, but I'm well aware of how difficult it is. Of your list of 5 things, this is easily the most difficult. I'm not seeing this happening any time soon, sadly.

Agreed. Structured exceptions are far easier, and probably more broadly useful at the moment.

flavio's perlito already creates a perl5 AST, and larry's P5Grammar is in work. Personally I prefer P5Grammar over the optree over perlito's AST, but perlito has nice emitters and is easy perl5.

The perl5 optree is very problematic to optimize because it is not in standard AST format :) And a hack. But it is very performant.
And I doubt that P5Grammar and perlito can be made faster than the optree. There are very expressive and high-level.

A new low-level parser and ast optimizers would be definitely a good idea. I was toying with the idea of a new perl5 vm or nqp backend (called p2), based on vmkit (which is based on LLVM and MMTk). Not parrot, as it is not performant enough. But any fresh start seems to be better. Or nqp.

At OSCON 2012 I got to hear Rob Pike talk about the go fix utility they wrote for the Go language. It takes their AST and examines it for old, deprecated or bug-fixed core code and *automatically* replaces it with up-to-date, fixed NEW code.

I was sitting in that session with my mind completely blown. *kaboom*

If we ever had something like that in Perl, people's heads would spin around. Obviously, Go doesn't have 20+ years of legacy to deal with, but an AST would be really amazing.

Java's had something like that for years, which isn't a surprise because so many of the great Smalltalk implementers and maintainers switched to writing Java tools.

Perl::Critic is a great tool, but it could be so much more.

Any Parrot or Perl 5 rewrite that has more than a thousand lines of C won't perform to your or my expectations. (Maybe two thousand.)

Modern Perl: The Book

cover image for Modern Perl: the book

The best Perl Programmers read Modern Perl: The Book.

sponsored by the How to Make a Smoothie guide



About this Entry

This page contains a single entry by chromatic published on September 14, 2012 8:22 AM.

Why Perl 5 Needs a Metaobject Protocol was the previous entry in this blog.

Why Perl 5 Needs Compact, Natively Typed Data is the next entry in this blog.

Find recent content on the main index or look in the archives to find all content.

Powered by the Perl programming language

what is programming?