What if everything we thought about types in dynamic languages were wrong?
(Dear Internet forum readers, Thank you for reading all the way into the second paragraph. I shouldn't have to answer a rhetorical question with the obvious answer of "It isn't," but now you have both the answer and the moral authority to mock everyone with a narcissistic streak large enough and an attention span short enough to rush immediately to a favored Internet forum and post a lengthy "Look at how smart I am and how dumb someone else is!" screed. Have a nice afternoon!)
I've updated Test::MockObject, UNIVERSAL::isa, and UNIVERSAL::can recently. I've also spent a lot of time working with Moose-powered code to eliminate duplication, improve genericity and polymorphism, reduce the possibility of errors, and increase testability and test coverage.
Then I read the documentation for Data::Thunk and something interesting struck me.
I've long argued that checking types by name and mechanism in Perl 5 is problematic. So is giving up by pretending everything is a duck. It's difficult to navigate between two failures: specifying your requirements so strictly that you remove possibilities for extension or genericity you can't yet imagine, and being so lax that you invite errors.
Yet sometimes I think about this in the wrong way.
A thunk in the Data::Thunk sense is a way to delay an expensive calculation until you need it. (I quite like lazy synthetic attributes in Moose for similar reasons.) Synthetic object attributes are promises encapsulated in an object behind accessors, and you can pass that object around without ever triggering the lazy generation until you need it. A thunk exposed as a non-object, perhaps an array or another primitive first-class value, doesn't have the same level of encapsulation.
In other words, it's far too easy to tell that @immediate_values is different from
... except that perhaps we think about things incorrectly.
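To make the distinction concrete, here is a minimal sketch in Python rather than Perl 5. It is not Data::Thunk's API or Moose's. `Thunk`, `Report`, and `expensive` are hypothetical names invented for illustration. The first half shows a bare thunk that callers must know to force; the second shows the same delay hidden behind an accessor.

```python
from functools import cached_property


class Thunk:
    """Hypothetical helper: wraps a zero-argument callable and runs it at most once."""

    _UNSET = object()  # sentinel, so a computed value of None still counts as computed

    def __init__(self, compute):
        self._compute = compute
        self._value = Thunk._UNSET

    def force(self):
        # The caller has to know this is a thunk and ask for the value explicitly.
        if self._value is Thunk._UNSET:
            self._value = self._compute()
        return self._value


class Report:
    """The same delay, encapsulated behind an ordinary-looking attribute."""

    def __init__(self, numbers):
        self.numbers = numbers
        self.calls = 0  # counts how often the expensive work actually ran

    @cached_property
    def total(self):
        self.calls += 1
        return sum(self.numbers)


calls = []


def expensive():
    # Stand-in for a costly calculation; records that it ran.
    calls.append("ran")
    return sum(range(1_000))


lazy_total = Thunk(expensive)  # nothing has been computed yet
```

Code holding a `Report` cannot tell whether `total` was computed eagerly or lazily. Code holding `lazy_total` has to call `force()`, so the mechanism leaks into every caller.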
Suppose you want to generate a list of prime numbers for cryptographic purposes. The first few prime numbers are cryptographically worthless. The numbers get ever more expensive to calculate as you go on. Your code needs to find a balance between calculating too many or too few, but you don't necessarily know which pair will suffice for your purposes until you calculate them. (Also, you probably don't have to calculate all of the intermediate primes between 1 and 2 and n and m, if you have a good algorithm to pick a few potential primes and continue from there.)
I see three possibilities to represent the data structure containing this list of primes:
- A plain array
- An iterator or generator (whether with internal language support or an object)
- Something lazy as a combination of both
You might be fortunate enough to use a language such as Haskell with this laziness as a fundamental language feature. Good for you! You may use Python, in which case a generator expression might be the best approach. Hooray, I suppose. You may use Perl 5, and so you have plenty of options for syntax. That flexibility can be handy.
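As a rough illustration of those three options, here is a Python sketch. Every name in it (`is_prime`, `prime_generator`, `eager_primes`, `LazyPrimes`) is hypothetical, and trial division is only a placeholder: it is nowhere near suitable for cryptographic sizes.

```python
from collections.abc import Sequence
from itertools import count, islice


def is_prime(n):
    """Trial division; fine for a sketch, far too slow for real cryptographic sizes."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def prime_generator():
    """Option 2: a generator that yields primes one at a time, without end."""
    return (n for n in count(2) if is_prime(n))


# Option 1: a plain list, computed eagerly and up front.
eager_primes = list(islice(prime_generator(), 10))


class LazyPrimes(Sequence):
    """Option 3: indexable like a list, but computes and caches primes on demand."""

    def __init__(self, limit):
        self._limit = limit              # how many primes the sequence claims to hold
        self._cache = []                 # primes computed so far
        self._source = prime_generator()

    def __len__(self):
        return self._limit

    def __getitem__(self, index):
        if isinstance(index, slice):
            return [self[i] for i in range(*index.indices(self._limit))]
        if index < 0:
            index += self._limit
        if not 0 <= index < self._limit:
            raise IndexError(index)
        # Only do as much work as this particular lookup requires.
        while len(self._cache) <= index:
            self._cache.append(next(self._source))
        return self._cache[index]

    @property
    def computed(self):
        """How many primes have actually been calculated so far."""
        return len(self._cache)
```

The list supports indexing but pays for everything up front. The generator pays as it goes but only supports iteration. `LazyPrimes` looks like the list and behaves like the generator, at the cost of writing the bridge by hand.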
How much are you going to let your internal representation of a storage mechanism tie your hands from a design perspective?
If you choose the array, you've tied your code to a specific mechanism and a specific syntax. The same applies to an object or generator, though the object gives you slightly more options, in the polymorphic sense, to retain an interface but provide a different implementation. Even so, you can't swap a lazy array for an object without bridging the difference in interface, unless your language explicitly supports this.
Therein lies my question about type checking in dynamic languages.
I've been a good programmer. I rewarded myself with a couple of cookies for separating the concern of generating a list of prime numbers from the code which uses that list of prime numbers. It's easier to test, to maintain, to document, to do everything I might need to do to it.
Yet the two pieces of code I've worked so hard to decouple in form are still tightly coupled via the types of a parameter used to communicate between them, because they both have a dependency on being some sort of array, some sort of generator or iterator, or some sort of object which provides a generator or iterator, when all I really want to be able to say is "These two separate pieces of my system communicate when one of them provides a promise to provide multiple prime numbers".
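One way to approximate that sentence in code, sketched in Python: name the promise and make the consumer depend only on the name. `PrimeSupply`, `PrecomputedPrimes`, `OnDemandPrimes`, and `first_prime_above` are all hypothetical. This is still a nominal type standing in for meaning, so it is a workaround rather than the language support wished for here.

```python
from abc import ABC, abstractmethod
from itertools import count


class PrimeSupply(ABC):
    """Hypothetical type naming the meaning: a promise to provide primes in increasing order."""

    @abstractmethod
    def primes(self):
        """Return an iterator over the primes this supply can provide."""


class PrecomputedPrimes(PrimeSupply):
    """Backed by values that already exist, such as a plain list."""

    def __init__(self, values):
        self._values = list(values)

    def primes(self):
        return iter(self._values)


class OnDemandPrimes(PrimeSupply):
    """Backed by a generator that calculates each prime only when asked."""

    def primes(self):
        def is_prime(n):
            # Naive trial division, for illustration only.
            return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

        return (n for n in count(2) if is_prime(n))


def first_prime_above(supply, floor):
    """Consumer: depends only on the promise, not on how the primes are stored."""
    for p in supply.primes():
        if p > floor:
            return p
    raise LookupError(f"no prime above {floor} in this supply")
```

`first_prime_above` never learns whether it received an array or a generator. Swapping one for the other changes the producer and nothing else.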
I used the word tied in a previous paragraph. If you caught the pun, great. (If you're not a Perl 5 programmer and you wonder about the pun, don't. It's really not that clever.) That approach is good in some ways and awful in others, because it does allow uniformity of interface (with an awkward and slow implementation) but it doesn't allow me to decorate the arrayish variable with the type information that "This thingie you can treat as an array is a promise to provide prime numbers when you want them. Don't ask how. Leave efficiency concerns to the implementation. You worry about what you get out of it."
(One also sometimes wonders why Python simultaneously prides itself on a rigid orthogonality and parsimony of syntax such that toddlers often speak in valid Python programs before they learn the vagaries and inconsistencies of English while borrowing and uglifying generator and filter syntax from other languages when it would have been simpler to say that generators are specific enhancements and refinements of lists. Then again, I as an American from the frontier states lack certain clever Continental irony.)
In other other words, maybe we dynamic language users get types so wrong because we're so caught up in thinking "What primitive does this resemble?" or "What global string name does this or any class in its inheritance hierarchy match?" that we too rarely stop to ask ourselves the important question of "What does this data represent?"
Would that our languages allowed us to express that meaning instead of merely the mechanism.