
Support for different encodings when uploading documents to Retiffy

We live in a really big and diversified world, no question about it.

That is why we are proud to announce that you can now upload CSV documents written in different languages and, more importantly, in different character sets, and generate certificates from them.

I prefer not to go into the details of how text is handled by computers and on the Internet. If you want to learn more about character sets and encodings, follow this link: All About Unicode, UTF8 & Character Sets

Always save in Unicode (UTF-8)

If you want your documents to be processed correctly by most software in the world, you should save them as Unicode (UTF-8). You can then import your CSV documents into Retiffy. Following is a screenshot of the LibreOffice dialog shown when saving a document as text. It lets you select the character set; in this case we select Unicode (UTF-8).


Saving text documents in LibreOffice and selecting the character set

If you are using Excel, you can choose the encoding under Save -> Tools -> Web Options -> Encoding tab, as shown in the pictures below.


Choosing the encoding in Microsoft Excel.

There should be a similar option in any other software you use. Setting the encoding to UTF-8 gives you the best chance that the document will open correctly in software other than the one that created it.
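If you generate or clean up your CSV files with a script rather than a spreadsheet, the same rule applies: write the output as UTF-8. Below is a minimal Python sketch, not part of Retiffy, that re-saves an existing file as UTF-8. The function name `convert_to_utf8` is our own, and it assumes you already know the file's original encoding (for example Windows-1251, which Python calls `cp1251`).

```python
from pathlib import Path


def convert_to_utf8(src: Path, dst: Path, source_encoding: str) -> None:
    """Re-save a text/CSV file as UTF-8 (hypothetical helper).

    source_encoding must be the file's real encoding, e.g. "cp1251" for
    Windows-1251. Strict decoding raises UnicodeDecodeError on bytes the
    encoding cannot decode, but many single-byte encodings accept almost
    any byte, so a wrong choice can still produce garbled text: check it.
    """
    # newline="" keeps the original line endings (e.g. \r\n) untouched.
    with src.open("r", encoding=source_encoding, newline="") as fin:
        text = fin.read()
    with dst.open("w", encoding="utf-8", newline="") as fout:
        fout.write(text)
```

For example, `convert_to_utf8(Path("participants.csv"), Path("participants_utf8.csv"), "cp1251")`, where both file names are hypothetical, produces a UTF-8 copy you could then upload.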

I don't know what a “character set” is or which one I am using

If you deliberately use an encoding other than UTF-8, or are unaware that you do, Retiffy will report it. Take, for example, a CSV document containing Cyrillic letters encoded in the Windows-1251 character set. This is how they would look when uploaded to Retiffy.

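Noticing that a file is *not* UTF-8 is the easy part, because a strict UTF-8 decoder rejects most byte sequences produced by other encodings. The Python sketch below only illustrates the idea; it is not Retiffy's code, and the function name `looks_like_utf8` is hypothetical.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8.

    Pure ASCII passes as well, because ASCII is a subset of UTF-8, so True
    means "no problem found", not "this file was saved as UTF-8".
    """
    try:
        data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True


cyrillic = "Иван,София"
print(looks_like_utf8(cyrillic.encode("utf-8")))   # True
print(looks_like_utf8(cyrillic.encode("cp1251")))  # False
```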

 

As you can see in this screenshot, Retiffy tries to read the file as UTF-8, but since the file was created on a Windows machine and encoded in Windows-1251, you see these strange characters. If you select the correct encoding, Retiffy will automatically show you how the characters are interpreted and whether it could understand them, as in the screenshot below:

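The effect in these screenshots is easy to reproduce. In this Python sketch (our own illustration; `preview` is a hypothetical helper, not Retiffy's API), bytes that are invalid in the chosen encoding are replaced with the Unicode replacement character U+FFFD, so a wrong choice shows up as garbage instead of an error, while choosing Windows-1251 (`cp1251`) restores the original text.

```python
def preview(data: bytes, encoding: str, limit: int = 200) -> str:
    """Decode the start of an uploaded file for display (hypothetical helper).

    errors="replace" turns undecodable bytes into U+FFFD instead of raising,
    so the preview always renders. Cutting at `limit` bytes may split a
    multi-byte character, which then also shows up as U+FFFD at the end.
    """
    return data[:limit].decode(encoding, errors="replace")


raw = "Иван,София".encode("cp1251")  # bytes of a Windows-1251 file
print(preview(raw, "utf-8"))   # replacement characters around the comma
print(preview(raw, "cp1251"))  # Иван,София
```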

Why can't Retiffy guess the character set?

Text files do not contain information about the character set in which their characters are encoded. For the last few months we tried to guess the encoding automatically, and although this worked in 96% of cases, there were files whose character set could not be guessed. That's why we now let the user select it from a drop-down list. I am sure that we now have a 100% success rate.
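A short experiment shows why guessing can never be fully reliable. Legacy single-byte encodings assign a character to almost every byte value, so the same bytes decode without any error into several different texts, and nothing in the file says which one was intended. The helper `decode_candidates` below is a hypothetical Python illustration, not the detection code Retiffy used.

```python
def decode_candidates(data: bytes, encodings: list[str]) -> dict[str, str]:
    """Decode the same bytes with each encoding; keep only those that succeed."""
    results: dict[str, str] = {}
    for enc in encodings:
        try:
            results[enc] = data.decode(enc)  # strict: invalid bytes raise
        except UnicodeDecodeError:
            pass  # this encoding cannot decode these bytes
    return results


raw = "Привет".encode("cp1251")  # "Hello" in Russian, saved as Windows-1251
for enc, text in decode_candidates(raw, ["utf-8", "cp1251", "cp1252", "koi8_r"]).items():
    print(f"{enc:8} {text}")
```

UTF-8 rejects these bytes, but Windows-1251, Windows-1252 and KOI8-R all accept them and produce three different strings. Only someone who knows what the file should say can tell that the Windows-1251 reading is the right one.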

What character sets are supported?

We currently support the following character sets:

UTF-8, ISO-8859-1, ISO-8859-2, ISO-8859-7, ISO-8859-9, ISO-8859-15, Windows-1251, Windows-1252, Windows-1253, Windows-1254, Windows-1256, Windows-874, Windows-1250, KOI8-R, Shift JIS, EUC-JP, EUC-KR, GBK, GB2312, Big5, US-ASCII, TIS-620.
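If you prepare your files with Python, each encoding above corresponds to a standard-library codec. The table below is our own mapping for convenience, not Retiffy's configuration, and the names `CODEC_NAMES` and `codec_for` are hypothetical.

```python
import codecs

# Encoding names as listed above -> Python standard-library codec names.
CODEC_NAMES = {
    "UTF-8": "utf_8",
    "ISO-8859-1": "latin_1",
    "ISO-8859-2": "iso8859_2",
    "ISO-8859-7": "iso8859_7",
    "ISO-8859-9": "iso8859_9",
    "ISO-8859-15": "iso8859_15",
    "Windows-1250": "cp1250",
    "Windows-1251": "cp1251",
    "Windows-1252": "cp1252",
    "Windows-1253": "cp1253",
    "Windows-1254": "cp1254",
    "Windows-1256": "cp1256",
    "Windows-874": "cp874",
    "KOI8-R": "koi8_r",
    "Shift JIS": "shift_jis",
    "EUC-JP": "euc_jp",
    "EUC-KR": "euc_kr",
    "GBK": "gbk",
    "GB2312": "gb2312",
    "Big5": "big5",
    "US-ASCII": "ascii",
    "TIS-620": "tis_620",
}


def codec_for(label: str) -> str:
    """Return Python's canonical codec name for an encoding label above.

    Raises KeyError for labels not in the table, and codecs.lookup raises
    LookupError if the Python installation does not know the codec.
    """
    return codecs.lookup(CODEC_NAMES[label]).name
```

For example, `"Привет".encode(codec_for("KOI8-R"))` produces KOI8-R bytes for testing an upload.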

If your documents use a different character set, feel free to write to us and we will implement it as quickly as possible.