Handling Unicode properly isn't easy, even in 2011.
Perl 5.14 makes Unicode somewhat easier with the optional unicode_strings feature, but you have to enable it explicitly, and you can only handle external data correctly if you know how that data was meant to be encoded.
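Here's a minimal sketch of both points, assuming Perl 5.14 or later and the core Encode module; the input strings are hypothetical. It decodes incoming bytes under an *assumed* encoding (nothing in the bytes says they're UTF-8), then compares uc() with and without unicode_strings on a Latin-1 string that was never decoded.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input: five bytes as they might arrive from a file or socket.
# We ASSUME the sender meant UTF-8; the bytes themselves don't say so.
my $bytes = "caf\xc3\xa9";

# Decoding under that assumption yields four characters ("café").
my $text = decode('UTF-8', $bytes);

# A Latin-1 string that was never decoded (its internal UTF-8 flag is off).
my $latin1 = "caf\xe9";

# Without unicode_strings (and outside 'use locale'), uc() on such a string
# uses ASCII rules, so \xe9 is left alone.
my $without = uc $latin1;

my $with;
{
    # Lexically enable Unicode rules for string operations in this block only.
    use feature 'unicode_strings';
    $with = uc $latin1;    # uppercases \xe9 to \xc9
}

printf "bytes: %d, characters: %d\n", length $bytes, length $text;
printf "without feature: %s\n", $without eq "CAF\xe9" ? 'unchanged' : 'changed';
printf "with feature:    %s\n", $with    eq "CAF\xc9" ? 'uppercased' : 'unchanged';
```

The same five bytes are five "characters" if you never decode them and four if you do, which is exactly the kind of assumption that bites when it's wrong.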
(One of the small details I like in the book Gravitas, published by my company, is a chapter documenting the main character's struggle with a Unicode bug, exacerbated by one too many assumptions about characters versus bytes in his project's ORM. Art imitates life as satire.)
Tom Christiansen's "Why does modern Perl avoid UTF-8 by default?" missive is classic tchrist: clever, articulate, detailed, and a flood of text that crashes over the unsuspecting like a sneaker wave with a sinister undertow. If you're not careful, it'll pull you in a direction you never expected.
You can see this when someone as smart as Nelson Minear claims that "Perl 5 can't handle Unicode properly". Aristotle caught his attention, and Nelson offered a respectful retraction...
... but be careful not to miss the main point.
Handling Unicode appropriately is difficult, even in 2011. Many of Tom's valid points reinforce, again and again, the notion that your software, my software, and everyone's software make many assumptions about what incoming and outgoing data mean. When those assumptions are wrong, you get bugs.
If 14 May 2011, the release date of Perl 5.14, had been Perl 5's Unicode flag day, such that perl assumed all incoming and outgoing data were Unicode unless explicitly marked otherwise, Perl 5 programmers and users alike would discover exactly how many assumptions we've made. Some of them we can fix easily. Some we can't. Some require further fixes to the Perl 5 core itself, and some require operating system vendors and distributors to fix their own software.
This job isn't easy, and it won't be quick.
I'm all for making progress, even through painful changes, if they improve things for present and future programmers, but the benefits have to outweigh the costs. Right now, they don't. I hope that day comes soon.
If you would like to enable UTF-8 everywhere in your Perl 5 programs, see Mike Doherty's utf8::all.