... or A Modest Proposal for Dynamic Language Bindings
I've worked on a few shared library bindings for various dynamic languages: several libraries for Parrot, a few for Perl 5, and one for Ruby. I've embedded Perl 5 and I've embedded Parrot. (I figured out how to get Perl 5's reference counting working correctly with Parrot's "true" GC and how to get Parrot's GC working when embedded in Ruby.)
I even wrote a silly proof-of-concept port of Parrot's foreign function interface to Perl 5 before the Python folks adopted the much better ctypes (and I can't wait to use ctypes for Perl 5).
All of this reveals to me that there's something rotten about writing simple bindings to shared libraries from dynamic languages. It's mostly tedious, uninteresting work with far too many chances of bugs and far too many repetitive details. You'd think computers would be good at solving both problems.
I have generalized from my psyche-scarring experiences two fundamental assumptions:
- C (and specifically C headers) are a terrible layer of interoperability because they cannot express some of the most important details (Does this function acquire a shared resource? Whose responsibility is it to manage the lifespan of that resource?) and they obscure the clarity of intent through the use of abstractions such as C declarations and macros.
- Requiring end users to install a full development environment along with the development headers for any library to which they want to install your bindings is a recipe for madness on the part of installers and soul-crushing despair on your part, as you try to figure out precisely which version of OpenGL is available on which version of Windows with which specific release of a given video card and oh goodness no, please do not tell me you just upgraded your Cygwin.
In other words, parsing headers at the configuration time of a CPAN module which binds to, for example, libcurl, is madness, and we should stop.
Assume that ctypes for Perl 5 exists very soon in a form in which you can rely on its presence on a modern Perl 5 installation. Assume that if you prefer Python or Lua or Ruby or Haskell or Factor or even some form of Common Lisp not tightly bound to the JVM or the CLR that you have a similar library which knows how to translate from your language's calling conventions to the C ABI to which the library's exported functions conform and that the type mapping problem is solved for 80% of the cases.
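To make that assumption concrete, here is a minimal sketch of what such a library looks like from the user's side, using Python's ctypes. It assumes a POSIX-like system where the C library can be found by name or is already loaded into the process; strlen is only a convenient stand-in for a function exported by a library you actually care about.

```python
import ctypes
import ctypes.util

# Locate the C library by name; if that fails, fall back to the symbols
# already loaded into the current process (an assumption that holds on
# POSIX-like systems, not on Windows).
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

# The shared library gives us only a symbol name. The argument and return
# types below are transcribed by hand from the C declaration
#     size_t strlen(const char *s);
strlen = libc.strlen
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t

print(strlen(b"libcurl"))
```

The calling-convention translation is handled for you. What is not handled is where the two type lines come from: someone still had to read a header and copy them out.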
Now you need some mechanism to identify the symbols exported from the shared library to generate the appropriate thunks.
I've tried (and failed) to use Swig, and I blame myself more than anyone else for that—but Swig is the wrong answer. Parsing C headers is the wrong answer in 2010 and it was the wrong answer 20 years ago. C headers do not provide the right information in the right form. Effectively you have to have a bug-free C preprocessor to expand headers into literal C source code and then hope that your C parser will identify the correct information you need.
What's the right level of abstraction? That depends on the information a thunk library such as ctypes needs to know:
- The name of an exported symbol
- Its input and output types (specifically bit width, signedness, and any varargs)
- Constness of pointers or expected modification of out parameters
- Exceptional conditions such as control flow modifications through
- Error handling, such as setting errno or special return values
- Resource handling, such as a function which returns a malloced value but expects you to free it yourself (or some combination)
... and probably more.
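As a rough illustration, the list above could be captured in a record like the following. Every class and field name here is hypothetical, invented for this sketch and not taken from any existing tool; the strdup entry is filled in from its documented behavior (it returns memory the caller must free, and returns NULL and sets errno on failure).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Ownership(Enum):
    """Who is responsible for a returned resource (hypothetical vocabulary)."""
    NONE = "none"
    BORROWED = "borrowed"
    CALLER_FREES = "caller frees"


@dataclass(frozen=True)
class Param:
    ctype: str            # C type spelled out, including width and signedness
    const: bool = False   # the callee promises not to modify the pointee
    out: bool = False     # the callee writes its result through this pointer


@dataclass(frozen=True)
class Symbol:
    name: str                            # exported symbol name
    returns: str                         # C return type
    params: Tuple[Param, ...] = ()
    varargs: bool = False
    nonlocal_exit: bool = False          # may alter control flow instead of returning
    sets_errno: bool = False
    error_return: Optional[str] = None   # special value that signals failure
    ownership: Ownership = Ownership.NONE
    release_with: Optional[str] = None   # function that frees the returned value


# char *strdup(const char *s);
STRDUP = Symbol(
    name="strdup",
    returns="char *",
    params=(Param("char *", const=True),),
    sets_errno=True,
    error_return="NULL",
    ownership=Ownership.CALLER_FREES,
    release_with="free",
)

print(STRDUP.name, STRDUP.ownership.value, STRDUP.release_with)
```

Only the name, return type, and parameter types in this record can be recovered from a header at all. The rest lives in man pages and in people's heads.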
I've set aside the concept of opaque pointers versus raw structs, because that's another rathole full of platform-specific concerns (and besides that, any library which does not expose only opaque pointers to external users is in a state of denial of reality and deserves a very good refactoring), but you probably already get the idea.
Wouldn't it be nice if shared libraries could provide some sort of machine-parseable, semantics-preserving, declarative (that is, no cpp necessary!) file which all of us poor users could parse once with our thunk generators to produce bare-bones, no sugar added interfaces to these wonderful libraries, then get on with the interesting work such as building Pygame and SDL_perl in wonderfully Pythonic and Perlish ways instead of manually reading SDL_video.h and figuring out how to map all of that implicit information into XS ourselves?
I'm not asking for another section crammed into ELF files and I'm not suggesting that the fine people behind libxslt need to compile a manual file of machine-extractable information themselves—if we had a nice format all of our dynamic languages could understand, anyone could make this file once for any API/ABI version of the library and we could all share for a change. Wouldn't that be nice?
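Here is a minimal sketch of what consuming such a file might look like, again with Python's ctypes. The JSON layout, the CTYPES table, and the bind function are all assumptions made up for illustration, not an existing or proposed standard, and the sketch covers only names and types, none of the ownership or error-handling details. It also assumes a POSIX-like system.

```python
import ctypes
import ctypes.util
import json

# A hypothetical declarative description: no headers, no preprocessor.
DESCRIPTION = """
{
  "library": "c",
  "symbols": [
    {"name": "strlen", "returns": "size_t", "params": ["const char *"]},
    {"name": "abs",    "returns": "int",    "params": ["int"]}
  ]
}
"""

# Mapping from the C type names used in the description to ctypes types.
CTYPES = {
    "int": ctypes.c_int,
    "size_t": ctypes.c_size_t,
    "const char *": ctypes.c_char_p,
}


def bind(description_text):
    """Build a dict of bare-bones thunks from a description, with no sugar."""
    desc = json.loads(description_text)
    # Fall back to the symbols already loaded in this process if the
    # library cannot be located by name.
    lib = ctypes.CDLL(ctypes.util.find_library(desc["library"]) or None)
    thunks = {}
    for sym in desc["symbols"]:
        fn = getattr(lib, sym["name"])
        fn.argtypes = [CTYPES[p] for p in sym["params"]]
        fn.restype = CTYPES[sym["returns"]]
        thunks[sym["name"]] = fn
    return thunks


libc = bind(DESCRIPTION)
print(libc["strlen"](b"SDL_perl"), libc["abs"](-20))
```

The generator is a dozen lines because the description already says everything it needs. The same file could feed an equivalent generator in Perl, Ruby, or Lua without anyone installing a compiler.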
(And yes, I'm aware of CORBA and COM and their IDLs, but the existence of Monopoly money by no means renders a $20 useless at the grocery store.)