String-Plus


What does this variable represent?

my $thingie =<<'END';
Thaddeus Droit
4616 NW Washington Place
Beaverton, OR 97006
END

It's obviously an address, but what does Perl know about it? Perl knows it's a string. Perl knows it's some 60 characters long. Perl may even know that it's a valid string of Latin-1 characters.

Perl doesn't know where the string came from, nor that it contains a legal name, a street address, and a zip code (and not a zip + 4). Any meaning to the program beyond "it's a string of some 60 characters, valid in the Latin-1 encoding" is outside what Perl knows about it. That's why the variable is named $thingie: even though Perl doesn't care about variable names, calling it $address instead could have led you to believe there's more structural meaning to this chunk of memory than actually exists.
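A small sketch of roughly everything core Perl can report about `$thingie` on its own. The particular checks are illustrative choices, not an exhaustive list:

```perl
use strict;
use warnings;

my $thingie = <<'END';
Thaddeus Droit
4616 NW Washington Place
Beaverton, OR 97006
END

# Perl can measure the string: 15 + 25 + 20 characters, newlines included.
my $length = length $thingie;

# It is a plain scalar, not a reference to an object of any class.
my $is_object = ref($thingie) ? 1 : 0;

# It can count lines, but only because we told it to split on newlines;
# nothing here says which line is the name, the street, or the zip code.
my @lines = split /\n/, $thingie;

print "length: $length\n";                  # length: 60
print "object: $is_object\n";               # object: 0
print "lines:  ", scalar(@lines), "\n";     # lines:  3
```

Everything past these facts (which line is the name, whether 97006 is a valid zip code) is knowledge your program has to supply.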

Names are important, at least to people maintaining source code. This code is obviously wrong:

$user->set_address( $birthday );

... but to Perl it might as well be:

$foo->bar( $baz );

... for all of the semantic meaning it understands. There's no obvious intent.

I know you're smart and you're way ahead of me. You're thinking "If I wanted a good static type system, I know where to find Haskell or OCaml, I'd never let code that bad get out of code review, and why aren't you writing tests?" That's not the point. You can be super careful, or you can design APIs which restrict the most natural way to write code in the host language in favor of extra security. That may be the right approach. (You have to be careful, though: the ease of interpolating untrusted user input into a raw string, or the use of register_globals in PHP, resembles the attractive nuisance doctrine, where people who don't know any better can't analyze the risk appropriately.)

There may be another way.

Suppose I annotated the address:

my Address $thingie =<<'END';
Thaddeus Droit
4616 NW Washington Place
Beaverton, OR 97006
END

It's still a chunk of memory with certain characteristics, but now it carries an extra piece of metadata about what the value means to the program (and not merely to Perl). A clever compiler could detect certain places where the semantics of an operation don't match:

method set_address(Address $addy) { ... }

... though to prove the type safety of the entire program at compilation time, you have to be able to resolve this kind of dispatch at compilation time. (I've seen suggestions that even Smalltalk programs can resolve some 85-90% of dispatch targets statically.)
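Perl 5 already parses the annotated declaration, provided a package named `Address` exists at compile time, but on its own it does not check the value. A minimal sketch, in which the empty `Address` package is an assumption made only so the declaration compiles:

```perl
use strict;
use warnings;

# Perl accepts "my Address $x" only if a package named Address exists at
# compile time; an empty package declaration is enough.
{ package Address; }

my Address $thingie = <<'END';
Thaddeus Droit
4616 NW Washington Place
Beaverton, OR 97006
END

# The annotation adds no behavior: the value is still an ordinary string.
print ref($thingie) eq '' ? "plain string\n" : "object\n";    # plain string

# Nothing stops an obviously wrong value from going into an "Address".
my Address $birthday = '1970-01-01';
print "$birthday\n";                                          # 1970-01-01
```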

You don't have to go that far; runtime verification with a good test suite is effective, can be fairly cheap, and is available right now in Perl 5 with Moose.
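Moose's type constraints perform this kind of check when an attribute is set. The sketch below does the same thing by hand in core Perl so that it runs without Moose installed; the `Address` and `User` classes, their fields, and the five-digit zip rule are all hypothetical:

```perl
use strict;
use warnings;

{
    package Address;
    use Carp qw(croak);

    # Hypothetical value class: validate once, in the constructor.
    sub new {
        my ($class, %args) = @_;
        for my $field (qw(name street city state zip)) {
            croak "Address needs a '$field'" unless defined $args{$field};
        }
        croak "Bad zip code '$args{zip}'" unless $args{zip} =~ /\A[0-9]{5}\z/;
        return bless {%args}, $class;
    }
}

{
    package User;
    use Carp qw(croak);
    use Scalar::Util qw(blessed);

    sub new { my ($class) = @_; return bless { address => undef }, $class }

    # Runtime stand-in for a signature like set_address(Address $addy).
    sub set_address {
        my ($self, $addy) = @_;
        croak 'set_address() expects an Address'
            unless blessed($addy) && $addy->isa('Address');
        $self->{address} = $addy;
        return $self;
    }
}

my $user = User->new;
$user->set_address(Address->new(
    name   => 'Thaddeus Droit',
    street => '4616 NW Washington Place',
    city   => 'Beaverton',
    state  => 'OR',
    zip    => '97006',
));

# A string that happens to hold a date is rejected when the call runs.
my $ok = eval { $user->set_address('1970-01-01'); 1 };
print $ok ? "accepted\n" : "rejected: $@";
```

In a Moose class, declaring the attribute with `isa => 'Address'` gets you the equivalent check without the hand-written `croak`; either way, the error only shows up when a test (or a user) exercises that call.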

There's still another way. Consider again the untrusted input example. If you enable tainting, you might read user input into an address:

my Address $untrusted_addy = $req->get( 'address' );

You don't see it in the declaration, but the "This is tainted!" metadata is present in $untrusted_addy. How do you deal with that?

You could be picky about always untainting untrusted data, but can you do that accurately and effectively? Can you rely on everyone always getting it right?
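In Perl 5, the usual way to untaint a value is to match it against a pattern and keep only the captured text. This minimal sketch uses a deliberately simple, hypothetical `untaint_zip` rule (exactly five digits) to show why getting it right is hard: the pattern is the entire security policy, and every caller has to remember to use it.

```perl
use strict;
use warnings;

# Hypothetical validator: a zip code is exactly five ASCII digits (no zip + 4).
# Under taint mode (perl -T), the captured $1 is untainted; $raw stays tainted.
sub untaint_zip {
    my ($raw) = @_;
    return unless defined $raw && $raw =~ /\A([0-9]{5})\z/;
    return $1;
}

for my $input ('97006', '97006-1234', '97006; DROP TABLE users') {
    my $zip = untaint_zip($input);
    print "'$input' => ", (defined $zip ? $zip : 'rejected'), "\n";
}

# ${^TAINT} reports whether this perl is running with taint checks enabled.
print 'taint mode: ', (${^TAINT} ? 'on' : 'off'), "\n";
```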

What if you could write:

SQL {{
    UPDATE users SET address = { Address $address } WHERE user = { User $user }
}}

... and Perl could verify that $address is an appropriate Address (and $user is an appropriate User), could quote and escape and validate both of them effectively, could extract the primary key from $user, and could untaint any tainted $address or $user?
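Perl 5 has no `SQL {{ ... }}` syntax, but an ordinary function can approximate part of the behavior described: check each value's type, pull the key out of a `User`, and emit placeholders instead of pasting strings into the query. Everything named here (`sql_update_address`, `check_type`, the `Address` and `User` classes, the numeric id) is a hypothetical sketch, and it does not attempt the untainting step:

```perl
use strict;
use warnings;
use Carp qw(croak);
use Scalar::Util qw(blessed);

# Hypothetical value classes; real ones would validate in new().
{
    package Address;
    sub new     { my ($class, $text) = @_; return bless { text => $text }, $class }
    sub as_text { return $_[0]{text} }
}
{
    package User;
    sub new { my ($class, $id) = @_; return bless { id => $id }, $class }
    sub id  { return $_[0]{id} }
}

# Die unless $value is an object of (or derived from) the expected class.
sub check_type {
    my ($value, $class) = @_;
    croak "expected $class" unless blessed($value) && $value->isa($class);
    return $value;
}

# Build the UPDATE with placeholders rather than interpolation: the values
# travel separately, so quoting and escaping become the driver's job.
sub sql_update_address {
    my ($address, $user) = @_;
    check_type($address, 'Address');
    check_type($user,    'User');
    my $sql = 'UPDATE users SET address = ? WHERE user = ?';
    return ($sql, $address->as_text, $user->id);
}

my ($sql, @bind) = sql_update_address(
    Address->new("Thaddeus Droit\n4616 NW Washington Place\nBeaverton, OR 97006\n"),
    User->new(42),
);
print "$sql\n";
print scalar(@bind), " bind values; user id $bind[1]\n";

# Passing a raw string where an Address belongs fails before any SQL runs.
my $ok = eval { sql_update_address('1970-01-01', User->new(42)); 1 };
print $ok ? "accepted\n" : "rejected: $@";
```

With DBI, the pair would be executed as `$dbh->do($sql, undef, @bind)`. The difference from the imagined syntax is that the caller still has to know to call this function instead of writing the string by hand.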

If your language supports multiple dispatch, lets you define your own types, lets you override stringification, and can override interpolation for cases like these, you can do such things.
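Of those features, overriding stringification already exists in Perl 5 through the core `overload` pragma. A minimal sketch with a hypothetical `Address` class, showing that interpolating the object calls your code instead of producing `Address=HASH(0x...)`:

```perl
use strict;
use warnings;

{
    package Address;
    use overload
        '""'     => \&as_string,   # called whenever the object is stringified
        fallback => 1;             # let eq, ., etc. fall back to the string form

    sub new {
        my ($class, %args) = @_;
        return bless {%args}, $class;
    }

    # Render the address in mailing-label form.
    sub as_string {
        my ($self) = @_;
        return join "\n", $self->{name}, $self->{street},
            "$self->{city}, $self->{state} $self->{zip}";
    }
}

my $addy = Address->new(
    name   => 'Thaddeus Droit',
    street => '4616 NW Washington Place',
    city   => 'Beaverton',
    state  => 'OR',
    zip    => '97006',
);

# Interpolation goes through the overloaded "" operator.
print "Ship to:\n$addy\n";
```

This covers stringification only; Perl 5 gives you no hook to change how the surrounding string literal itself is interpolated, which is the piece the SQL example needs.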

In other words, you could turn what would otherwise be a raw string into an embedded little language with its own syntax and semantics, interoperate with native data structures in the host language, and provide composable safety. Users don't have to know much of anything about how this works, as it pretty much does what they expect.

I can imagine a language like that.

9 Comments

I shouldn't be reading this stuff, as it's sometimes over my head, but I take it that you are saying:

Syntax like my Address $thingie =

doesn't exist, but would be nice to have. I am going to have to get into Moose.

Anyway thanks for your blog which I have been following via a newsfeed for a year or so.


All the best


Owen

Perl 5 can parse that syntax right now (it's existed for ages), but Perl 5 doesn't do much with it. The semantics I want aren't yet a part of Perl 5.

I think that a strong dynamic type system could be a really interesting addition to Perl5. The power that you get out of it in Moose is highly underrated. In fact, you came up in a discussion at the bar during YAPC about why I think splitting the TypeConstraints out of Moose would be interesting, for roughly the reasons you argue here about Smalltalk.

Stevan argues with me that the my Dog $boo syntax doesn't do anything in Perl5. I argue that it currently doesn't need to. It's important enough to iterate on, and to let people like yourself, who know the guts better than Stevan or me, have a go at making it more useful over the long term.

It's unfortunate but TypeConstraints are the second most under-developed feature in Moose core currently (the first is Exceptions, but we have a solid plan there, and possibly volunteers working on it). MooseX::Types is nice, but has odd interactions with barewords and people's expectations (the "DateTime::" problem).

Also, I'm not sure if you've seen typesafety or MooseX::Lexical::Types. Both explore this idea somewhat, but both have issues, since the support underneath them is (I've been told) anemic.

I'm unlikely to write a Perl 5 optimizer or type checker, though such a thing would be a useful addition to Perl 5. In particular, unless there's a way to analyze or modify Perl 5 programs as abstract trees of one form or another, it's a lot of busy work.

Perl 6 is another matter, especially as it has pervasive multiple dispatch even throughout its builtins.

SQL is a great example for this. Relational databases are more useful with strong typing, so EMPLOYEE_ID is incompatible with PRODUCT_ID even if they are both implemented as INT. It'd be a great idea to see those constraints implemented at the perl level, presumably by giving perl more knowledge of the database schema than even the database engine has.
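A hand-written sketch of that idea in Perl 5: two hypothetical wrapper classes, `EmployeeID` and `ProductID`, share a base class whose overloaded `==` refuses to compare IDs of different types. Unlike the suggestion above, nothing here reads a database schema; the types are declared by hand.

```perl
use strict;
use warnings;

{
    package TypedID;
    use Carp qw(croak);
    use Scalar::Util qw(blessed);
    use overload
        '==' => \&equals,
        '""' => \&as_string;

    # Hypothetical base class: an ID is a blessed reference to an integer.
    sub new { my ($class, $n) = @_; return bless \$n, $class }

    sub as_string { my ($self) = @_; return ref($self) . "($$self)" }

    # Equal only when both sides are the same ID type with the same number.
    sub equals {
        my ($self, $other) = @_;
        croak 'cannot compare ' . ref($self) . ' with '
            . (blessed($other) ? ref($other) : 'a plain value')
            unless blessed($other) && ref($other) eq ref($self);
        return $$self == $$other;
    }
}
{ package EmployeeID; our @ISA = ('TypedID'); }
{ package ProductID;  our @ISA = ('TypedID'); }

my $emp     = EmployeeID->new(7);
my $product = ProductID->new(7);

my $same_emp = ($emp == EmployeeID->new(7));
print $same_emp ? "same employee\n" : "different employee\n";

# Same underlying integer, different types: refused rather than silently true.
my $ok = eval { my $same = ($emp == $product); 1 };
print $ok ? "compared\n" : "refused: $@";
```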

I wanted to ask whether this is what the Readonly module uses, but I see it is

Readonly my $a = 10;
and not

my Readonly $a = 10;

Readonly is actually a function that takes a list of arguments. This is why a comma (i.e. ',' or '=>') is required instead of an equal sign (i.e. '='). The fat comma (i.e. '=>') is the form mentioned in the documentation.

The first argument is the variable that you want to make readonly.

The second argument is the value that you want it set to.


The following are all equivalent:

Readonly my $a => 10;

Readonly( my $a => 10);

Readonly(my $a, 10);

my $a;
Readonly ($a, 10);
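The prototype behavior that lets `Readonly my $a => 10;` parse can be reproduced in a few lines of core Perl. In this sketch the `make_readonly` function and `ReadonlyScalar` class are hypothetical, and `tie` is just one documented way to intercept writes; it is not necessarily how the Readonly module itself is implemented.

```perl
use strict;
use warnings;

{
    # Hypothetical tie class: reads return the stored value, writes die.
    package ReadonlyScalar;
    use Carp qw(croak);
    sub TIESCALAR { my ($class, $value) = @_; return bless \$value, $class }
    sub FETCH     { my ($self) = @_; return $$self }
    sub STORE     { croak 'Attempt to modify a read-only scalar' }
}

# The (\$$) prototype passes a reference to the first argument, which is
# why "my $x" can be declared inline, followed by one ordinary value.
sub make_readonly (\$$) {
    my ($ref, $value) = @_;
    tie $$ref, 'ReadonlyScalar', $value;
    return;
}

# "=>" is a comma (that also quotes a bareword to its left), so both
# calls below have the same shape as make_readonly(my $x, 10).
make_readonly my $x => 10;
make_readonly(my $y, 20);
print "$x $y\n";    # 10 20

my $ok = eval { $x = 11; 1 };
print $ok ? "changed\n" : "refused: $@";
```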


Hope this helps.


About this Entry

This page contains a single entry by chromatic published on July 16, 2010 10:59 AM.
