Thursday, 10 December 2015

Clean Tabs / CRLF / Word Bullets but keep French Characters in Rails

I'm having a lot of trouble cleaning my strings. We need to remove all "strange" characters such as bullets, tabs, and so on, but keep characters like &é"''(§§è!!çà ....

After reading a lot of posts online, we came up with the following code. It removes all the nasty tabs but still keeps the bullets.

  def strip_tabs(value)
    return "" unless value
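    # Only strings are cleaned; other values are returned unchanged
    if value.kind_of?(String)
      # Collapse whitespace runs (tabs, CR/LF) into single spaces;
      # non-bang squish avoids mutating the caller's string
      value = value.squish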
      encoding_options = {
        :invalid           => :replace,  # Replace invalid byte sequences
        :undef             => :replace,  # Replace anything not defined in ISO-8859-1
        :replace           => '',        # Use a blank for those replacements
        :universal_newline => true       # Always break lines with \n
      }
      value = value.encode(Encoding.find('ISO-8859-1'), encoding_options)
      value = value.encode('UTF-8')
    end
    return value
  end

Before Strip:

"•\tZefzefz\r\n•\tZefzefze\r\nZef\t zefz\t \r\n\r\n"

After Strip:

"• Zefzefz • Zefzefze Zef zefz"

I know we can use gsub or delete, but we need a more general solution because there are a lot of strange characters like this.
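To make the goal concrete, here is a minimal sketch of the kind of general approach we have in mind: a whitelist that keeps only the printable ASCII and Latin-1 ranges (which cover é, è, ç, à and §) and drops everything else in one pass. The helper name `strip_outside_latin1` is hypothetical, and the sketch assumes the input is a valid UTF-8 string.

```ruby
# Hypothetical helper (not part of our current code): keep printable
# ASCII (U+0020..U+007E) and the printable Latin-1 range (U+00A0..U+00FF),
# drop every other character, then normalise whitespace.
# Assumption: value is a valid UTF-8 string when it is a String.
def strip_outside_latin1(value)
  return "" unless value
  return value unless value.is_a?(String)

  # Remove anything that is neither in the whitelist nor whitespace
  cleaned = value.gsub(/[^\u0020-\u007E\u00A0-\u00FF\s]/, "")
  # Collapse tabs, CR/LF and repeated spaces into single spaces
  cleaned = cleaned.gsub(/\s+/, " ")
  cleaned.strip
end

# "\u2022" is the Word bullet from the example in this question
puts strip_outside_latin1("\u2022\tZefzefz\r\n\u2022\tZefzefze\r\nZef\t zefz\t \r\n\r\n")
# prints: Zefzefz Zefzefze Zef zefz
```

Note that a strict Latin-1 whitelist would also drop characters such as œ, € and typographic quotes, so the character class would probably need to be extended for real French text.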

We are running ruby 1.9.3p551 and Rails 3.2.19.

Kind regards
