Server config: Mistakes with character encoding - part 2

posted: March 10th, 2007 · by: Sven

in: Programming, Globalization · 10 comments

So, you know how to pipe data from the web browser through various library layers to a database and all the way back again. You know how to configure every layer of all those sophisticated web applications, frameworks, libraries, programming languages, ...

But now here’s this damned static file that seems to get totally screwed up somewhere, and you’re already starting to pull your hair out because there’s no apparent reason.

Relax and step back. Look again. Sometimes things are simple, so simple that one can’t see the wood for the trees.

Read the rest of this entry

Common mistakes with character encodings - part 1

posted: February 14th, 2007 · by: Sven

in: Programming, Globalization · 5 comments

Ok, I’m going to collect some gotchas, pitfalls, mistakes and traps relating to Unicode, UTF-8 and character encoding in general. Hopefully this keeps me and others from being bitten (again), or at least helps to find the culprit more easily.

So if you have any additions here: please let me know! You read that? If you’ve encountered some kind of common problem with character encodings, please, let me know! There’s a comment form below and you can always send me a mail. Thanks in advance :)

Let’s start with some basic stuff …

Read the rest of this entry

Getting MySQL to compare Unicode Greek Extended characters correctly

posted: February 8th, 2007 · by: Sven

in: Programming, Globalization · 10 comments

Lately I ran into an interesting issue with MySQL’s string comparison that I hadn’t seen before.

I’ve been setting up a simple vocabulary and grammar learning program for my spouse, who started learning Ancient Greek a while ago. After she had entered some test data containing several funny-looking Ancient Greek characters, we saw that MySQL 4.1 seems to treat the following characters as equal when compared as VARCHAR:

Char.  Unicode code point  UTF-8 bytes (decimal)  Name
η      U+03B7              206 183                eta
ή      U+1F75              225 189 181            eta with oxia
ῇ      U+1FC7              225 191 135            eta with perispomeni and ypogegrammeni

These characters are stored and retrieved correctly (which was a nice thing to watch, by the way). But when it comes to comparing them with each other, they are wrongly regarded as the same character.
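
To make the effect easier to picture, here is a minimal Python sketch. It is not MySQL's comparison algorithm, just an illustration under the assumption that an accent-insensitive comparison behaves roughly like "decompose, then ignore the combining marks". The third character is taken to be U+1FC7, the code point whose UTF-8 bytes are the 225 191 135 listed above. The helper names `utf8_bytes` and `strip_marks` are made up for this example.

```python
import unicodedata

# The three characters from the table, written as escapes so that the
# encoding of this source file cannot interfere.
CHARS = {
    "eta": "\u03b7",
    "eta with oxia": "\u1f75",
    "eta with perispomeni and ypogegrammeni": "\u1fc7",
}


def utf8_bytes(ch):
    """Return the UTF-8 encoding of ch as a list of decimal byte values."""
    return list(ch.encode("utf-8"))


def strip_marks(text):
    """Decompose text (NFD) and drop every combining mark.

    This only mimics what an accent-insensitive comparison might do;
    it is an assumption for illustration, not MySQL's actual behaviour.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


if __name__ == "__main__":
    for name, ch in CHARS.items():
        base = strip_marks(ch)
        print(f"U+{ord(ch):04X}", utf8_bytes(ch), name, f"-> base U+{ord(base):04X}")
```

The three strings are different code points with different byte sequences, so a byte-wise comparison tells them apart. Once the accents and the iota subscript are ignored, all three collapse to a plain eta, which is the kind of equality described above.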

Read the rest of this entry

artweb design
Sven Fuchs
Grünberger Str. 65
10245 Berlin, Germany


http://www.artweb-design.de

Fon +49 (30) 47 98 69 96
Fax +49 (30) 47 98 69 97