Rails 2.3 + Ruby 1.8 UTF-8 Hack

I work on a project that is still locked to Rails 2.3 running on Ruby 1.8. As the years have gone by, more and more support for internationalization has come up, and now that emojis are part of the Unicode standard and people are trying to use them in blog posts, comments and the like, I keep running into the fiasco that is getting Ruby on Rails on MySQL to deal with them.

It’s been a mess.

In the end, I’ve just opted for a hack on the String class, which I call at the point where the model’s properties are assigned:

class String

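  #
  # Converts every character that takes more than 2 bytes in UTF-8 into a
  # hexadecimal HTML entity (e.g. a 4-byte emoji becomes &#x1f600;).
  # Characters of 1 or 2 bytes are left untouched.
  #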
  def to_multibyte_html_entities
    each_char.map { |c| c.bytes.count > 2 ? "&#x#{c.multibyte_ord.to_s(16)};" : c }.join
  end
  
  #
  # Returns the Unicode code point of the first character, like #ord does
  # in later versions of Ruby
  #
  def multibyte_ord
    unpack('U')[0]
  end

end
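
To show what "at the point that the model's properties are assigned" could look like, here is a minimal sketch. It is not my actual model code: `EntityEncoding` and `Comment` are made-up names, and `Comment` is a plain Ruby class standing in for an ActiveRecord model so that the snippet runs on its own. The conversion is repeated as a module function for the same reason.

```ruby
# Hypothetical stand-alone copy of the conversion above, so that this
# sketch does not depend on the String patch being loaded.
module EntityEncoding
  def self.encode(text)
    text.each_char.map { |c|
      # Anything longer than 2 bytes becomes a hexadecimal HTML entity.
      c.bytes.count > 2 ? "&#x#{c.unpack('U')[0].to_s(16)};" : c
    }.join
  end
end

# Plain-Ruby stand-in for a model (assumption: in a real Rails 2.3 model
# the setter would store the value with write_attribute instead).
class Comment
  attr_reader :body

  # Convert on assignment so only 1- and 2-byte characters are stored.
  def body=(value)
    @body = value.nil? ? nil : EntityEncoding.encode(value)
  end
end

comment = Comment.new
comment.body = "Café ☃ 😀"
puts comment.body
```

The "é" is only 2 bytes long, so it passes through untouched, while the 3-byte snowman and the 4-byte emoji are stored as `&#x2603;` and `&#x1f600;`. On Ruby 1.8 this relies on `each_char` splitting by character rather than by byte, which as far as I know needs `$KCODE` set to `'u'`.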