Mojolicious Unicode Normalization Plugin Released

To use Unicode effectively, you have to learn a lot more than the difference between 7-bit ASCII and UTF-8. For example, did you know that you can represent the same glyphs in multiple ways? That's right: different combinations of codepoints can produce the same glyphs.
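
Here's a minimal Perl sketch of that claim, using the core Unicode::Normalize module. It spells "café" two ways: once with the precomposed codepoint U+00E9, and once as a plain "e" followed by U+0301 COMBINING ACUTE ACCENT. The variable names are illustrative only.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD);

# Two spellings of "café" that render identically.
my $precomposed = "caf\x{00E9}";     # é as a single codepoint
my $decomposed  = "cafe\x{0301}";    # e followed by COMBINING ACUTE ACCENT

# Show the underlying codepoints of each string.
for my $s ($precomposed, $decomposed) {
    printf "%d codepoints: %s\n", length $s,
        join ' ', map { sprintf('U+%04X', ord($_)) } split //, $s;
}

# A plain string comparison sees two different strings...
print 'eq before normalizing: ', ($precomposed eq $decomposed ? 'yes' : 'no'), "\n";

# ...but they match once both are put into the same normalization form.
print 'eq after NFC:          ', (NFC($precomposed) eq NFC($decomposed) ? 'yes' : 'no'), "\n";
print 'eq after NFD:          ', (NFD($precomposed) eq NFD($decomposed) ? 'yes' : 'no'), "\n";
```

The first string has four codepoints and the second has five, yet a reader can't tell them apart on screen.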

If you're doing something interesting with user input, such as comparing two strings or searching for one string in another, you probably want those strings to use the same canonical representation of codepoints. (You'd hate for users to file bugs saying they can't find what they're searching for when their search text looks correct, but they typed it differently than you did.)

This is why Unicode Normalization Forms and Unicode::Normalize exist.
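
As a rough sketch of the comparison and search problem, assuming you normalize both sides to NFC right before comparing, two small helpers might look like this. The names same_text and contains_text are hypothetical; they are not part of Unicode::Normalize.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC);

# Hypothetical helper: compare two strings after putting both into NFC.
sub same_text {
    my ($x, $y) = @_;
    return NFC($x) eq NFC($y);
}

# Hypothetical helper: substring search after putting both sides into NFC.
sub contains_text {
    my ($haystack, $needle) = @_;
    return index(NFC($haystack), NFC($needle)) >= 0;
}

my $stored = "R\x{00E9}sum\x{00E9} tips";    # stored with precomposed é
my $typed  = "Re\x{0301}sume\x{0301}";       # user typed e + combining accent

print 'raw index:     ', index($stored, $typed), "\n";    # -1: no match
print 'contains_text: ', (contains_text($stored, $typed) ? 'found' : 'not found'), "\n";
print 'same_text:     ', (same_text("caf\x{00E9}", "cafe\x{0301}") ? 'yes' : 'no'), "\n";
```

Normalizing at every comparison works, but it's easy to forget one call site. Normalizing data once, as it enters the application, avoids that.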

That's also why I just released Mojolicious::Plugin::UnicodeNormalize. When it's active, it silently normalizes all incoming data to a single normalization form (in our case, NFC worked best). It leaves uploaded files alone, and it imposes only a tiny performance penalty.
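
The plugin's own code isn't reproduced here. As a framework-free sketch of the idea, assuming form parameters arrive as a hash whose values are plain strings, array references (for repeated fields), or upload objects, a hypothetical normalize_params helper could put every text value into NFC and pass uploads through untouched. The Hypothetical::Upload class below is a stand-in, not a Mojolicious class.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC);

# Hypothetical helper, not the plugin's implementation: return a copy of a
# parameter hash with every plain-string value converted to NFC.
sub normalize_params {
    my ($params) = @_;

    # Normalize defined, non-reference values; pass everything else through,
    # so upload objects and other references are left alone.
    my $norm = sub {
        my $v = shift;
        return (defined $v && !ref $v) ? NFC($v) : $v;
    };

    my %normalized;
    for my $name (keys %$params) {
        my $value = $params->{$name};
        $normalized{$name} = ref $value eq 'ARRAY'
            ? [ map { $norm->($_) } @$value ]    # repeated fields
            : $norm->($value);
    }

    return \%normalized;
}

# Usage: a stand-in upload object plus some decomposed text input.
my $upload = bless { filename => 'photo.jpg' }, 'Hypothetical::Upload';
my $clean  = normalize_params({
    q    => "cafe\x{0301}",
    tags => [ "nai\x{0308}ve", 'plain' ],
    file => $upload,
});
```

Doing this once per request, before any controller code sees the data, is what lets the rest of the application assume a single normalization form.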

It's been in use in a client application for almost a year, and it has helped us avoid countless bugs. Now you can use it too.

This page contains a single entry by chromatic published on November 8, 2013 6:00 AM.