Mojolicious Unicode Normalization Plugin Released

To use Unicode effectively, you have to learn a lot more than the difference between 7-bit ASCII and UTF-8. For example, did you know that you can represent the same glyphs in multiple ways? That's right; multiple combinations of codepoints can produce the same glyphs.

If you're doing something interesting with user input such as comparing two strings or searching for one string in another, you probably want those strings to use the same canonical representation of codepoints. (You'd hate for users to file bugs saying they can't find what they're searching for, when the text on screen looks identical to what they typed but is built from different codepoints.)

This is why Unicode Normalization Forms and Unicode::Normalize exist.
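To make the problem concrete, here's a minimal sketch using only the core Unicode::Normalize module. The variable names are mine, chosen for illustration. It builds "é" two ways, once as a single precomposed codepoint and once as a plain "e" followed by a combining acute accent, then compares the strings before and after normalization.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD);

# Two ways to write the same visible character, "é":
my $precomposed = "\x{E9}";     # U+00E9 LATIN SMALL LETTER E WITH ACUTE
my $decomposed  = "e\x{301}";   # U+0065 followed by U+0301 COMBINING ACUTE ACCENT

# A plain string comparison sees different codepoints, so it fails.
my $raw_equal = $precomposed eq $decomposed ? 1 : 0;

# Normalizing both sides to the same form makes them compare equal.
my $nfc_equal = NFC($precomposed) eq NFC($decomposed) ? 1 : 0;

# NFD goes the other way and splits the precomposed character in two.
my $nfd_length = length NFD($precomposed);

printf "raw: %d, NFC: %d, NFD length: %d\n",
    $raw_equal, $nfc_equal, $nfd_length;
# prints "raw: 0, NFC: 1, NFD length: 2"
```

Which form you pick matters less than picking one and applying it to everything you compare.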

That's why I just released Mojolicious::Plugin::UnicodeNormalize. When it's active, it normalizes all incoming data to a single normalization form (in our case, NFC worked the best). It doesn't touch uploaded files. It does the job silently, and it imposes only a tiny performance penalty.
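If you want a feel for what "normalize all incoming data" means, here's a rough sketch of the idea in plain Perl. This is not the plugin's code. The `normalize_params` helper and the sample parameter names are hypothetical, and it assumes the values have already been decoded into Perl character strings.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC);

# Hypothetical helper (not part of the plugin): return a copy of a
# parameter hash with every value converted to NFC. Multi-valued
# parameters are assumed to arrive as array references.
sub normalize_params {
    my ($params) = @_;
    my %out;
    for my $name (keys %$params) {
        my $value = $params->{$name};
        $out{$name} = ref $value eq 'ARRAY'
            ? [ map { NFC($_) } @$value ]
            : NFC($value);
    }
    return \%out;
}

# Sample input where the accents arrive as combining characters.
my $clean = normalize_params({
    q    => "cafe\x{301}",
    tags => [ "e\x{301}", "plain" ],
});
```

After this runs, `$clean->{q}` holds the precomposed spelling, so it compares equal to a stored "café" that was written the same way. Doing this once at the edge of the application means the rest of the code never has to think about it.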

It's been in use in a client application for almost a year and it helped us avoid countless bugs. Now you can use it too.
