
Transliteration

Created: Sun, 2007-12-09 06:57
Available Releases: 5.x, 6.x, 7.x

Provides a central transliteration service to other Drupal modules, and
sanitizes file names while uploading.

Generally speaking, it takes Unicode text and tries to represent it in US-ASCII characters (universally displayable, unaccented characters) by transliterating the pronunciation expressed by the text in another writing system into Roman letters.

According to Unidecode, from which most of the transliteration data has been derived, "Russian and Greek seem to work passably. But it works quite bad on Japanese and Thai."
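To make the idea concrete, here is a minimal Python sketch of Unicode-to-ASCII transliteration. It is an illustration only, not the module's PHP implementation: the FALLBACK table and the to_ascii function are hypothetical names, and the table holds just a handful of entries where real transliteration data (such as Unidecode's) covers thousands of characters.

```python
import unicodedata

# Hypothetical, tiny fallback table for characters that do not decompose
# into an ASCII base letter. Real transliteration data is far larger.
FALLBACK = {
    "ß": "ss",
    "æ": "ae",
    "ø": "o",
    "д": "d",
    "а": "a",
}


def to_ascii(text, unknown="?"):
    """Return a best-effort US-ASCII representation of text."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            # Already ASCII: keep as-is.
            out.append(ch)
        elif ch in FALLBACK:
            # Explicit mapping wins over decomposition.
            out.append(FALLBACK[ch])
        else:
            # NFKD splits an accented letter into base letter + combining
            # marks; keeping only the ASCII part drops the accent.
            decomposed = unicodedata.normalize("NFKD", ch)
            base = "".join(c for c in decomposed if ord(c) < 128)
            out.append(base if base else unknown)
    return "".join(out)


print(to_ascii("Crème brûlée"))  # Creme brulee
print(to_ascii("Straße"))        # Strasse
```

Characters with no table entry and no ASCII decomposition, such as Japanese kanji, come out as the placeholder "?" in this sketch, which hints at why coverage for some scripts is so much harder than for others.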

Do I need to use transliteration for uploaded files?

There is no general answer to this question; it depends on what you want to do with user-submitted file uploads. There are two simple cases in which you might not need transliteration:

  1. You let users upload files to your site and offer these files for download without PHP processing, you are on Drupal 6 or later, and you are not using a Windows-based web server, or
  2. You are sure your users won't upload files or images with non-ASCII characters in their file names.

However, whenever you want to process uploaded files on the server, you most likely need transliteration, for example if you are using ImageCache to provide modified versions of uploaded images. The reason is that PHP 5 doesn't fully support Unicode characters in filenames and may not be able to access those files.
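As a rough illustration of what sanitizing a file name at upload time can look like, the Python sketch below reduces a name to ASCII letters, digits, hyphens and underscores while keeping the extension. The function names and the "file" fallback are assumptions made for this example. Unlike the module, which transliterates using its data tables, this sketch simply drops any character that has no ASCII decomposition.

```python
import os
import re
import unicodedata


def _ascii_only(text):
    """Strip accents, drop remaining non-ASCII, replace other runs with '_'."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^A-Za-z0-9_-]+", "_", stripped).strip("_")


def clean_filename(name, fallback="file"):
    """Return an ASCII-only version of an uploaded file name.

    The fallback stem is used when nothing of the original stem survives.
    """
    stem, ext = os.path.splitext(name)
    stem = _ascii_only(stem) or fallback
    ext = _ascii_only(ext.lstrip("."))
    return f"{stem}.{ext}" if ext else stem


print(clean_filename("Résumé 2024.PDF"))  # Resume_2024.PDF
print(clean_filename("日本語.txt"))        # file.txt
```

A name cleaned this way contains only characters that a server-side script can pass to the file system without relying on Unicode support.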

Whether you use transliteration for URLs (when using Pathauto 2.x), however, is a matter of personal taste. For example, the Russian Wikipedia does not transliterate but uses full Unicode in URLs. On the other hand, as one user noted, links containing Unicode look quite ugly in e-mails sent from your site.

On Drupal 5, transliteration is required because Unicode characters in generated URLs (for example, for file attachments) are not properly encoded in certain cases (see #191116: Make drupal_urlencode RFC 1738-compliant).
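To see why Unicode links can look ugly outside the browser, consider what a non-ASCII path becomes once it is percent-encoded. The following Python sketch uses the standard library's urllib.parse.quote, which encodes the text as UTF-8 and escapes each byte. The encode_path helper and the sample path are made up for illustration, and this is not Drupal's drupal_urlencode.

```python
from urllib.parse import quote, unquote


def encode_path(path):
    """Percent-encode a URL path as UTF-8, leaving '/' untouched."""
    return quote(path, safe="/")


path = "/файл"
encoded = encode_path(path)

print(encoded)           # /%D1%84%D0%B0%D0%B9%D0%BB
print(unquote(encoded))  # /файл
```

Each Cyrillic letter turns into two escaped bytes, so a five-character path grows to 25 characters. A transliterated path such as /fajl stays short and readable in plain-text e-mail.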

Roadmap

3.x is now in beta!

  • New developer-friendly transliteration data file layout
  • Lower memory footprint of replacement function
  • Make filename cleaning optional
  • Move retroactive filename cleaning to the backend

If you would like to help improve the transliteration data, the following source might serve as a starting point: CLDR (Unicode Common Locale Data Repository), especially the guidelines and available transliteration charts.

Credits

Authors:

  • Stefan M. Kudwien (smk-ka)
  • Daniel F. Kudwien (sun)

UTF-8 normalization is based on UtfNormal.php from MediaWiki and transliteration uses data from Sean M. Burke's Text::Unidecode module.

Sponsor:

UNLEASHED MIND
Specialized in consulting and development of Drupal powered sites, our services include installation, development, theming, customization, and hosting to get you started.

Drupal Servers is an Arbor Drupal Development project powered by Drupal CMS, Linode, and a Ninja. Drupalservers.net is NOT an official Drupal website, and is NOT endorsed by Dries Buytaert or the Drupal Association. Its sole purpose is to foster the use of the Drupal CMS among its visitors. This site is owned and operated by Jason Moore, an Individual Member of the Drupal Association. Drupal is a registered trademark of Dries Buytaert.

All content on this website is licensed by a Creative Commons Attribution-ShareAlike license v2.0 or greater unless otherwise noted.
