Getting MySQL compare Unicode Greek Extended characters correctly
posted: February 8th, 2007 · by: Sven
Lately I ran into an interesting issue with MySQL’s string comparsion that I haven’t seen before.
I’ve been setting up a simple vocabulary and grammar learning program for my spouse who’s started learning ancient greek a while ago. After she’s entered some testdata containing several funny looking ancient greek characters we saw that MySQL 4.1 seems to treat the following characters as equal when compared as VARCHAR:
| Char. | Unicode Codepos. |
UTF-8 | Name |
![]() |
U+03B7 | 206 183 | eta |
![]() |
U+1F75 | 225 189 181 | eta w/ oxia |
![]() |
U+1FC4 | 225 191 135 | eta w/ persispomeni and ypogegrammenti |
These characters are stored and retrieved correctly (which was a nice thing to watch, by the way). But when it comes to compare them to each other they are wrongly regarded the same character.
For example: We have a table named forms that contains a column named value. There are three records that contain the following values:
,
,
.
Now, each of the following statements …
SELECT * FROM forms WHERE value = '
'
SELECT * FROM forms WHERE value = '
'
SELECT * FROM forms WHERE value = '
'
… will select all of these three records. Got that? All of these statements select the same three records. Woops. That’s obviously not what one would expect here.
For the record, the database’s character-set is set to UTF-8 Unicode (utf8) and the table’s and VARCHAR column’s collation is set utf8_unicode_ci.
So what the heck is going on here?
After I’ve posted this question to the MySQL users mailinglist Chris White put me on the right track by pointing me to the LIKE BINARY keyword which enables you to do case-sensitive string comparsions on byte-level rather than character level.
As far as I understand what’s happening here MySQL transforms my VARCHAR columns for string comparsions. As of MySQL 4.1 a collation can be specified therefor. That’s the reason why MySQL’s string comparsions will be case-insensitive by default.
This transformation of course needs to rely on some mechanism to determine that, for example, “a” and “A” represent the same character. Now, utf8_unicode_ci seems to be only able to digest a certain range of UTF-8 characters and treats the both latter characters (eta with different accents) as the same as the first one (eta).
Bonus question now being why exactly this happens. I suspect I’ve just not collected enough Unicode fu to master this lesson, but the solution must be hidden somewhere there. If you’re able to answer this and found this article I’m sure you’ll also find your way to my “add a comment” form! :-)
Having learned something new I was pretty excited and went ahead to change some queries in my application, like:
Form.find_all_by_value value
… to use the newly discovered MySQL operator:
Form.find_all ["value LIKE BINARY ?", value]
… which worked perfectly. There’s been another query that also went wild in the same kind of way - this time using a DISTINCT selection:
Form.find(:all, :select => "DISTINCT value")
This had the same issues like the orignal query above in that it regarded all three characters as identical and, thus, in this case selected only one of them. BINARY to the rescue - one can use this keyword here also:
Form.find(:all, :select => "DISTINCT BINARY value as value")
… which also yielded the expected results.
I already thought about creating a mini-plugin to switch ActiveRecords to use something like this by default (further fueled by having seen a Django commit concerning something like this).
But that’s not even necessary at all.
Things are even far simpler.
Looking through the MySQL manual for more information on the LIKE BINARY operator I finally found that there’s actually a special column type that makes MySQL treat comparsions in exactly this same way:
Just use VARBINARY as your column type and you’re ready to go a compare exotic funny characters with MySQL.
I were quite perplexed because I never knew about this entire field type and for years on now stupidly and blindly used VARCHAR columns.
At the end of the day now the question remains what limitations, funny behaviours or other things to consider there are with VARBINARY columns compared to their well-known cousin VARCHAR.
If you know any … please let me know! :-)
Oh! And if you wonder how you can change your migrations to use this column type … this is not what Rails knows as :binary! That’s the MySQL BLOB type. Instead you can specify this manually by something like this:
class CreateForms < ActiveRecord::Migration
def self.up
create_table :cards do |t|
# ...
t.columns << 'value varbinary(255)'
end
end
end
However, this approach requires that you append the custom typed columns at the end of the list. Is there any better way to get custon column types accepted by migrations?



raveren said October 20th, 2008 at 09:58 AM ¶
Thanks a billion for this wonderful post, helped me immensely!
raveren said October 20th, 2008 at 11:00 AM ¶
bear in mind, that varbinary is case sensitive and there’s little to do about that…
jack said January 23rd, 2011 at 11:45 AM ¶
thanks for that heads up. That’s a useful tip! I’ve never ran into that, but for sure that’s something quite some people will need a solution for. cheap vps
QQQ said February 7th, 2011 at 06:37 PM ¶
Finally we kissed and the passion scale went sky high and I knew I was onto a good thing - sex was a certainty free porn videos. She never hesitated when I began to fondle her breasts and she willingly exposed them for me mobile porn. They were firm and I suspected a breast enhancement but said nothing - they still felt good and I was enjoying them and gradually working my way further south free porn tube. She was a step ahead of me and before I could completely undress her she moved on me atk hairy and I was suddenly having my pants pulled down and I was enjoying one of he best cock sucking hairy pussy experiences I had ever had. ABB728019394
Foana2 said February 24th, 2011 at 08:42 AM ¶
You got a really useful blog I have been here reading for about an hour. pass4sure 70-447 Useful information like this one must be kept and maintained so I will put this one on my bookmark list! pass4sure 70-448 Thanks for this wonderful post and hoping to post more of this blog here. pass4sure 70-450 I am a newbie and your success is very much an inspiration for me. pass4sure 70-455
chat said March 31st, 2011 at 07:38 PM ¶
This helped with extended characters:
The following cleaned up the issue:
Dependencies.loadoncepaths -= Dependencies.loadoncepaths.select{|path| \ path =~ %r(^#{File.dirname(FILE)}) }
Okey oyunu said May 12th, 2011 at 04:11 PM ¶
Thanks for this article. Tüm dünya artik okey oyunu oynuyor. Yillardir bir çok oyun programi olmasina ragmen, içlerinden en güzeli olarak nitelendirebilecegimiz tek bir site göze çarpmaktadir. Diger tüm okey oyunu programlarinin aksine ücretsiz olmasi ve 3 boyutlu olarak hizmet vermesi mükemmel bir gelismedir. Sizlerde www.okey-oyunu.com adresinden bu essiz okey oyununu indirebilirsiniz. Kullanimi çok basit ve Türkçe dil seçenegi ile kolaylikla oyuna baslayabilirsiniz. Ister kendi ülkenizden, isterseniz dünyanin tüm farkli bölgelerinden dilediginiz oyun odalarini seçerek, oyuna hemen baslayabilirsiniz. Okey oyunu oynamak için artik arkadas bile aramaniza gerek kalmadan, bilgisayarinizdan 100 binlerce üye ile online olarak okey oyununu oynamanin zevkine varabilirsiniz.
David said May 13th, 2011 at 10:11 PM ¶
Cool, danke meine MySql Datenbank bei meinem Handyprojekt funktioniert leider nicht mehr so wie ich mir das wünsche, habe deshalb nach Infos für mein handy shop portal gesucht und werde mal schauen, dass ich die Infos hier darauf anwenden kann. Vielen Dank für die Tipps, wäre schön auch mal wieder etwas neues von dir zu lesen.
porno said May 22nd, 2011 at 01:31 PM ¶
I do agree with all of the ideas you have presented in your post. They’re really convincing and will definitely work. Still, the posts are too short for newbies. Could you please extend them a bit from next time? Thanks for the post.
porno said May 22nd, 2011 at 02:06 PM ¶
good comment. thanks you friends.
I’ve surfed the net more than three hours today, however, I haven’t found such useful information. Thanks a lot, it is really useful to me