2011-11-17

Removing diacritics from a string

Recently I had to write a function that removes all diacritics from a string (e.g. turning "José" into "Jose"). Searching the web, I quickly found the blog post “Stripping is an interesting job” by Michael Kaplan. His code is simple and works well, but I saw some fairly obvious opportunities for optimization.

First, we know roughly how long the result will be. For typical accented Latin text, it is no longer than the original string. So we can give the StringBuilder instance an initial capacity equal to the length of the original string instead of letting it grow as characters are appended. In some simple cases I actually prefer a char array over a StringBuilder, because it has even less overhead. Another obvious optimization is to return immediately when the original string is empty. Here's my optimized version: