Reinventing the Axle


... or A Modest Proposal for Dynamic Language Bindings

I've worked on a few shared library bindings for various dynamic languages: several libraries for Parrot, a few for Perl 5, and one for Ruby. I've embedded Perl 5 and I've embedded Parrot. (I figured out how to get Perl 5's reference counting working correctly with Parrot's "true" GC and how to get Parrot's GC working when embedded in Ruby.)

I even wrote a silly proof-of-concept port of Parrot's foreign function interface to Perl 5 before the Python folks adopted the much better ctypes (and I can't wait to use ctypes for Perl 5).

All of this reveals to me that there's something rotten about writing simple bindings to shared libraries from dynamic languages. It's mostly tedious, uninteresting work with far too many chances of bugs and far too many repetitive details. You'd think computers would be good at solving both problems.

From my psyche-scarring experiences I have generalized two fundamental assumptions:

  • C (and specifically C headers) is a terrible layer of interoperability because headers cannot express some of the most important details (Does this function acquire a shared resource? Whose responsibility is it to manage the lifespan of that resource?) and they obscure the clarity of intent through abstractions such as C declarations and macros.
  • Requiring end users to install a full development environment, along with the development headers for every library your bindings wrap, is a recipe for madness on the part of installers and soul-crushing despair on yours, as you try to figure out precisely which version of OpenGL is available on which version of Windows with which specific release of a given video card and oh goodness no, please do not tell me you just upgraded your Cygwin.

In other words, parsing headers at the configuration time of a CPAN module which binds to, for example, libcurl, is madness, and we should stop.

Assume that ctypes for Perl 5 will exist very soon, in a form whose presence you can rely on in a modern Perl 5 installation. Assume also that, if you prefer Python or Lua or Ruby or Haskell or Factor or even some form of Common Lisp not tightly bound to the JVM or the CLR, you have a similar library which knows how to translate from your language's calling conventions to the C ABI to which the library's exported functions conform, and that the type mapping problem is solved for 80% of the cases.

Now you need some mechanism to identify the symbols exported from the shared library to generate the appropriate thunks.
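For concreteness, here is roughly what one hand-written thunk looks like with Python's ctypes today. It's a minimal sketch that assumes a Unix-like system whose C library exports `strlen`. ctypes can find the symbol by name, but a human had to copy every type in the prototype from the header or man page, because the shared library itself records none of it.

```python
import ctypes
import ctypes.util

# Load the C library. find_library() may return None on some systems;
# CDLL(None) then opens the running process itself, whose symbols
# include libc on Linux.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Looking up the exported symbol by name is the easy part...
strlen = libc.strlen

# ...but nothing in the shared library says strlen takes a const char *
# and returns a size_t. This prototype was copied by hand.
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t

length = strlen(b"hello, world")
```

Multiply that by every function in libcurl or SDL and you have the tedium described above.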

I've tried (and failed) to use Swig, and I blame myself more than anyone else for that—but Swig is the wrong answer. Parsing C headers is the wrong answer in 2010 and it was the wrong answer 20 years ago. C headers do not provide the right information in the right form. Effectively you have to have a bug-free C preprocessor to expand headers into literal C source code and then hope that your C parser will identify the correct information you need.

What's the right level of abstraction? That depends on the information a thunk library such as ctypes needs to know:

  • The name of an exported symbol
  • Its input and output types (specifically bit width, signedness, and any varargs)
  • Constness of pointers or expected modification of out parameters
  • Exceptional conditions such as control flow modifications through longjmp
  • Error handling, such as setting errno or special return values
  • Resource handling, such as a function which returns a malloced value but expects you to free it yourself (or some combination)

... and probably more.
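As a rough illustration, the items above could become explicit fields in a record like the one below. `Param`, `FunctionSpec`, and `check` are hypothetical names, not any existing format. The sketch only shows that each question a header leaves implicit (constness, errno, longjmp, who frees what) can become a field a generator checks, rather than a comment a human reads.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Param:
    name: str
    ctype: str            # width and signedness spelled out, e.g. "uint32"
    const: bool = False   # callee promises not to modify the pointee
    out: bool = False     # callee writes a result through this pointer


@dataclass(frozen=True)
class FunctionSpec:
    symbol: str
    returns: str
    params: Tuple[Param, ...] = ()
    varargs: bool = False
    may_longjmp: bool = False            # control may leave via longjmp
    sets_errno: bool = False             # errno is meaningful after failure
    error_return: Optional[str] = None   # sentinel return value meaning failure
    returns_owned: bool = False          # caller owns the returned resource...
    release_with: Optional[str] = None   # ...and must pass it to this symbol


def check(spec):
    """Return inconsistencies a binding generator should refuse to accept."""
    problems = []
    if spec.returns_owned and spec.release_with is None:
        problems.append(f"{spec.symbol}: caller owns the result but no release function is named")
    for p in spec.params:
        if p.const and p.out:
            problems.append(f"{spec.symbol}: parameter {p.name} cannot be both const and out")
    return problems


# strdup(3): copies a string into malloc()ed memory the caller must free().
STRDUP = FunctionSpec(
    symbol="strdup",
    returns="char*",
    params=(Param("s", "char*", const=True),),
    sets_errno=True,
    error_return="NULL",
    returns_owned=True,
    release_with="free",
)
```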

I've set aside the concept of opaque pointers versus raw structs, because that's another rathole full of platform-specific concerns (and besides that, any library which exposes anything other than opaque pointers to external users is in a state of denial of reality and deserves a very good refactoring), but you probably already get the idea.

Wouldn't it be nice if shared libraries could provide some sort of machine-parseable, semantics-preserving, declarative (that is, no cpp necessary!) file which all of us poor users could parse once with our thunk generators to produce bare-bones, no sugar added interfaces to these wonderful libraries, then get on with the interesting work such as building Pygame and SDL_perl in wonderfully Pythonic and Perlish ways instead of manually reading SDL_video.h and figuring out how to map all of that implicit information into XS ourselves?
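Here's a minimal sketch of the "parse once" half in Python, assuming a hypothetical JSON description (no such standard exists) and using ctypes as the thunk library. It reads the description, looks up each symbol, and attaches the declared prototype. No C compiler, preprocessor, or development headers are involved at install time.

```python
import ctypes
import ctypes.util
import json

# Hypothetical declarative description of two libc functions.
SPEC = """
{
  "library": "c",
  "functions": [
    {"symbol": "strlen", "returns": "size_t", "params": ["const char*"]},
    {"symbol": "abs",    "returns": "int",    "params": ["int"]}
  ]
}
"""

# Map the description's type names onto ctypes types.
TYPE_MAP = {
    "int": ctypes.c_int,
    "long": ctypes.c_long,
    "size_t": ctypes.c_size_t,
    "double": ctypes.c_double,
    "const char*": ctypes.c_char_p,
    "void": None,
}


def generate_bindings(spec_text):
    """Parse the description once and return ready-to-call thunks by name."""
    spec = json.loads(spec_text)
    lib = ctypes.CDLL(ctypes.util.find_library(spec["library"]))
    bindings = {}
    for fn in spec["functions"]:
        thunk = getattr(lib, fn["symbol"])
        thunk.argtypes = [TYPE_MAP[t] for t in fn["params"]]
        thunk.restype = TYPE_MAP[fn["returns"]]
        bindings[fn["symbol"]] = thunk
    return bindings


api = generate_bindings(SPEC)
```

Because the description is plain data, the same file could just as easily feed a Perl or Ruby generator. The Pythonic or Perlish sugar then goes on top of these bare thunks.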

I'm not asking for another section crammed into ELF files and I'm not suggesting that the fine people behind libxslt need to compile a manual file of machine-extractable information themselves—if we had a nice format all of our dynamic languages could understand, anyone could make this file once for any API/ABI version of the library and we could all share for a change. Wouldn't that be nice?

(and yes, I'm aware of CORBA and COM and their IDLs, but the existence of Monopoly money by no means renders a $20 useless at the grocery store)

4 Comments

This isn't just an issue for dynamic languages. The problem still exists in more traditional, static languages. Sure, it's been mitigated, but only because most people use C as a binding language. And even that is not assured if your compilers don't agree on the fundamentals. It's fragile and only works because everyone involved trusts that everyone else will follow the same set of rules. If one cog is misaligned, it all comes undone.

Yes, this issue has bugged me for years because I prefer languages other than C and C++, and I wish there were a solution in sight.

I've done some experimenting with this as well, and the closest thing that I've found to what I needed was gcc-xml. But that's even further from a solution.

I'd very, very much ++ an abstract standard that would allow things like inflating Class::MOP/Moose hierarchies from a C++ library spec. It would also go a long way towards supporting multiple library versions with one distribution.

Trying to extract this information from an existing compiler's intermediate representation is better than parsing C headers, but it's still a mess. My own epiphany came from realizing that the most important information--resource lifetimes and management responsibilities--is never present in any structured form in the headers, whereas an IDL could state explicitly: "This function returns malloced memory you need to free by calling that other function."
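As a sketch of what such an annotation buys you, the Python below turns an entry in a hypothetical `OWNERSHIP` table into a wrapper that always releases the result with the named function. It assumes a Unix-like libc that exports `strdup` and `free`. The table and `make_owned` are illustrative inventions. The ownership fact itself comes from the strdup documentation, not from anything ctypes can discover.

```python
import ctypes
import ctypes.util
from contextlib import contextmanager

libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Hypothetical IDL-style annotation: "strdup returns malloced memory
# you must release by calling free".
OWNERSHIP = {"strdup": "free"}


def make_owned(lib, symbol, argtypes):
    """Wrap an allocating function so its result is always released."""
    func = getattr(lib, symbol)
    func.argtypes = argtypes
    # Keep the raw address: c_char_p would copy to bytes and lose the
    # pointer we are obliged to free.
    func.restype = ctypes.c_void_p
    release = getattr(lib, OWNERSHIP[symbol])
    release.argtypes = [ctypes.c_void_p]
    release.restype = None

    @contextmanager
    def call(*args):
        ptr = func(*args)
        if not ptr:
            raise MemoryError(f"{symbol} returned NULL")
        try:
            yield ptr
        finally:
            release(ptr)  # the annotation, not the caller, decides this

    return call


strdup = make_owned(libc, "strdup", [ctypes.c_char_p])

with strdup(b"resource lifetimes") as p:
    copy = ctypes.string_at(p)
```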

Thanks chromatic.

This is an issue I've often complained about, though usually from the perspective of trying to wrap kernel-level constructs such as socket options or ioctls. Those are far simpler: they need only constant values and structure layouts, with nothing new by way of function calls or functional semantics. Even then they're quite hard.

My solution there was to build ExtUtils::H2PM, which specifically does not attempt to parse C files; instead it invokes the C compiler to do the parsing and reports the results via a program it compiles and runs. It's a horrible, hideous hack, but it's also the neatest (and in fact the only) way I could think of to implement such behaviour.
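ExtUtils::H2PM is a Perl module, and the sketch below is not its API. It expresses the same idea in Python: generate a small C program, let the real compiler resolve macros and structure layouts, then parse what the program prints. `probe_source` and `parse_probe_output` are hypothetical names, and the compile-and-run step is left out.

```python
def probe_source(headers, constants, structs):
    """Build C source that prints constants and struct layouts as name=value lines."""
    includes = ["stddef.h", "stdio.h", *headers]
    lines = [f"#include <{h}>" for h in includes]
    lines.append("int main(void) {")
    for name in constants:
        # The compiler, not us, expands the macro to its numeric value.
        lines.append(f'    printf("{name}=%ld\\n", (long)({name}));')
    for struct, fields in structs.items():
        lines.append(f'    printf("{struct}.sizeof=%zu\\n", sizeof(struct {struct}));')
        for field in fields:
            lines.append(
                f'    printf("{struct}.{field}=%zu\\n", offsetof(struct {struct}, {field}));'
            )
    lines.append("    return 0;")
    lines.append("}")
    return "\n".join(lines) + "\n"


def parse_probe_output(text):
    """Turn the compiled probe's name=value output into a dict of integers."""
    result = {}
    for line in text.splitlines():
        if line.strip():
            name, _, value = line.partition("=")
            result[name] = int(value)
    return result


src = probe_source(
    headers=["sys/socket.h"],
    constants=["SOL_SOCKET", "SO_REUSEADDR"],
    structs={"linger": ["l_onoff", "l_linger"]},
)
```

Compile `src` with the system C compiler and run the result, and `parse_probe_output` turns its output into a dict. The numbers are whatever the local headers say, which is the whole point.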

If anyone manages to come up with a neater way even just to represent this, I'd love it. Fact is, I don't think any Linux (or other OS) kernel developer would be interested. "C is king", no? I honestly don't see any kernel developer ever writing interface metadata in any language other than C. If you can't even do it for the kernel, what hope is there for any userland library, which adds all the complications of function calls on top?

About this Entry

This page contains a single entry by chromatic published on October 25, 2010 11:58 AM.
