I'm having a lot of trouble cleaning my strings. We need to remove all "strange" characters such as bullets, tabs, etc., but keep characters such as &é"''(§§è!!çà and so on.
After reading a lot of posts online, we wrote the following code. It removes all the nasty tabs, but it still keeps the bullets.
```ruby
def strip_tabs(value)
  return "" unless value

  # Clean value
  if value.kind_of?(String)
    # squish (ActiveSupport) strips leading/trailing whitespace and collapses
    # runs of whitespace (tabs, \r\n, ...) into single spaces; unlike squish!,
    # it does not modify the caller's string
    value = value.squish
    encoding_options = {
      :invalid => :replace,       # Replace invalid byte sequences
      :undef => :replace,         # Replace anything not defined in ISO-8859-1
      :replace => '',             # Replace them with an empty string, i.e. drop them
      :universal_newline => true  # Convert \r\n and \r to \n
    }
    value = value.encode(Encoding.find('ISO-8859-1'), encoding_options)
    value = value.encode('UTF-8')
  end
  value
end
```
Before Strip:
"•\tZefzefz\r\n•\tZefzefze\r\nZef\t zefz\t \r\n\r\n"
After Strip:
"• Zefzefz • Zefzefze Zef zefz"
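Before changing approach, it may help to check which character the "bullet" really is. The minimal sketch below uses two hypothetical helpers, `code_points` and `latin1_round_trip`: the first lists each character's code point, the second repeats the ISO-8859-1 round trip from `strip_tabs`. A real bullet, U+2022, has no ISO-8859-1 equivalent, so the round trip should drop it; a character in the C1 range U+0080 to U+009F does exist in ISO-8859-1 and passes through untouched. One possible (unconfirmed) explanation for the output above is Windows-1252 text read as Latin-1, where the bullet ends up as U+0095.

```ruby
# Hypothetical helper: list each character as a "U+XXXX" code point.
def code_points(str)
  str.each_char.map { |c| format("U+%04X", c.ord) }
end

# Hypothetical helper: the same ISO-8859-1 round trip that strip_tabs does.
def latin1_round_trip(str)
  str.encode("ISO-8859-1", invalid: :replace, undef: :replace, replace: "")
     .encode("UTF-8")
end

code_points("\u2022\tZef")      # => ["U+2022", "U+0009", "U+005A", "U+0065", "U+0066"]
latin1_round_trip("\u2022Zef")  # => "Zef"  (U+2022 is not in ISO-8859-1)
latin1_round_trip("\u0095Zef")  # => "\u0095Zef"  (U+0095 is, so it survives)
```

The options are written in `key: value` form, which Ruby 1.9 reads as a trailing hash and newer Rubies read as keyword arguments.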
I know we could use gsub or delete, but we need a more general solution, because there are a lot of strange characters like this.
We are running Ruby 1.9.3p551 and Rails 3.2.19.
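One more general approach, offered here as a sketch under assumptions, is to whitelist the characters to keep instead of blacklisting the ones to remove. The hypothetical `clean_string` method below turns every whitespace run into a single space, then drops everything outside printable ASCII (U+0020 to U+007E) and printable Latin-1 (U+00A0 to U+00FF, which covers é, è, ç, à and §). The exact whitelist is an assumption to adjust to your data; the code is plain Ruby without ActiveSupport.

```ruby
# Hypothetical whitelist: printable ASCII (U+0020-U+007E) plus printable
# Latin-1 (U+00A0-U+00FF, which covers the accented letters and the section
# sign). The regex matches any character OUTSIDE these ranges.
DISALLOWED_CHARS = /[^\u0020-\u007E\u00A0-\u00FF]/

# Hypothetical replacement for strip_tabs: whitelist instead of blacklist.
def clean_string(value)
  return "" if value.nil?

  s = value.to_s
  s = s.gsub(/\s+/, " ")           # tabs, \r\n, ... become single spaces
  s = s.gsub(DISALLOWED_CHARS, "") # drop bullets, control chars, emoji, ...
  s = s.gsub(/ {2,}/, " ")         # collapse double spaces left by removals
  s.strip
end

clean_string("\u2022\tZefzefz\r\n\u2022\tZefzefze\r\nZef\t zefz\t \r\n\r\n")
# => "Zefzefz Zefzefze Zef zefz"
```

This drops every character outside Latin-1, including curly quotes (U+2018, U+2019) and the euro sign (U+20AC). If you need those, add them to the character class, or build the whitelist from Unicode properties such as `\p{L}` to keep letters from any script.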
Kind regards