skip to main | skip to sidebar

Old fashioned guys use SQL Server varchar/text fields to store strings of multiple languages that uses 1 byte encoding like [Windows Character Table] :
Windows-1252 : English, French , Spanish, German,Italian,Spanish (Western European characters)...
Windows-1251 : Russian,Bulgarian,Serbian,Ukrainian
Windows-1253 : Greek
Windows-1256 : Arabic
.....

Of course, 1 byte encoding field can contain English + only one other language characters - unlike UTF-8) , just as a file encoded in 1-byte encoding..

To know about Character sets, you should check :
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

in that case : Asp pages codepage should remain as default = 1252
<% @ LANGUAGE=VBScript CODEPAGE=1252 %>

and setting the Response.Charset or HTML meta tag for charset correctly will show things right.. and HTML page size is smaller than the same one in UTF-8

of course running a site like that in IIS, will require that windows > Control Panel > Regional and Language options > Advanced > must be English or Strings read from SQL server will be corrupted...

A disadvantage is that you can't show more than one language (Other than English) in the same page without using escape codes... ,which suitable is for small text (a link to other language home page)

but, if you need to output UTF-8 file (text,Xml ,RSS,..) from non UTF-8 page, you must remember that Strings are Unicode in memory, so if you read a string from SQL Server using settings as mentioned before , and as an example :
- if we have that string "привет" which is "hi" in Russian
- and saved in varchar field in SQL Server it will look like "ïðèâåò"
- when you read that string in memory using ado it will look like "ïðèâåò" , cause VB string can't know it is Russian ( it is readed from varchar and default codepage is 1252 ,so it thinks it is Western European characters)
- So To Convert it to Russian will use ADO Stream :
AlterCharset("ïðèâåò","windows-1252", "windows-1251")

- After that it would be saved in memory as "привет"
- and when written to UTf-8 file , it will be "привет" , but if u don't do the Conversion step it will be "ïðèâåò"

enough talking , here is the code
For this code to work in VB6, you will need to add a reference to the Microsoft ActiveX Data Objects 2.5+ Library and change [Dim Stream : Set Stream=Server.CreateObject("ADODB.Stream") ] to [Dim Stream as new ADODB.Stream]

Const adTypeBinary = 1
Const adTypeText = 2

' accept a string and convert it to Bytes array in the selected Charset
Function StringToBytes(Str,Charset)
  Dim Stream : Set Stream = Server.CreateObject("ADODB.Stream")
  Stream.Type = adTypeText
  Stream.Charset = Charset
  Stream.Open
  Stream.WriteText Str
  Stream.Flush
  Stream.Position = 0
  ' rewind stream and read Bytes
  Stream.Type = adTypeBinary
  StringToBytes= Stream.Read
  Stream.Close
  Set Stream = Nothing
End Function

' accept Bytes array and convert it to a string using the selected charset
Function BytesToString(Bytes, Charset)
  Dim Stream : Set Stream = Server.CreateObject("ADODB.Stream")
  Stream.Charset = Charset
  Stream.Type = adTypeBinary
  Stream.Open
  Stream.Write Bytes
  Stream.Flush
  Stream.Position = 0
  ' rewind stream and read text
  Stream.Type = adTypeText
  BytesToString= Stream.ReadText
  Stream.Close
  Set Stream = Nothing
End Function

' This will alter charset of a string from 1-byte charset(as windows-1252)
' to another 1-byte charset(as windows-1251)
Function AlterCharset(Str, FromCharset, ToCharset)
  Dim Bytes
  Bytes = StringToBytes(Str, FromCharset)
  AlterCharset = BytesToString(Bytes, ToCharset)
End Function

6 comments

  1. Anonymous // May 13, 2009 at 6:02:00 AM GMT+10  

    Thanks a lot for this post! Finally, someone coherently explained what happens if you try to ouput 1251-encoded chars on a .NET page from MSSQL.

    A bit simplified C# code that works for me:

    using System.Text;

    ...

    String raw_db_string = "";
    Encoding sourceEncoding = Encoding.GetEncoding(1252);
    Encoding targetEncoding = Encoding.GetEncoding(1251);

    byte[] encodeTheseChars = sourceEncoding.GetBytes(raw_db_string);

    String recoded_string = targetEncoding.GetString(encodeTheseChars);

    ...

  2. Anonymous // September 5, 2009 at 12:46:00 PM GMT+10  

    Using this ADODB.Atream method to convert from windows-1252 to utf-8 causes accented characters and other special characters to dissapear. They don't turn into the wrong character, they simply dissapear. A seven character word with two accented characters turns into a five character word.

    Is there an alternate method in ASP or ASPX to convert Windows 2003 server SQL server-stored text to something an iPhone can display? iPhone applications refuse to work with an XML file that declares windows-1252 encoding.

  3. Mike // September 6, 2009 at 6:47:00 AM GMT+10  

    Weird, this method has always worked for me on ASP + Windows servers..!

    Are you saying that the other 5-chars are in plain English chars, this might mean that original text were not correctly loaded in memory as I described in the post..

    In ASP, ADODB.Stream is the only built-in method.. you can try other 3rd-party components.
    In ASP.Net, no need for old ADO. you have built-in methods in System.Text to deal with encoding.
    1 byte encoding is very old thing and should not be used anymore... and certainly windows-like encoding are not suitable outside windows.

    Can't you have SQL Server to use Unicode fields like nText, and use UTF-8 ASP/ASP.Net pages.

    Also, try creating XML files in ISO-8859-1! I believe ADODB.Stream can work with ISO encodings.

    If you still have the problem. contact me and attach some samples..

  4. About Marjan Panic // February 9, 2010 at 3:45:00 AM GMT+11  

    Hi guys,

    This is interesting, but I face another issue.
    In ram memory, c# string is always represented as Unicode with code page 1252.
    Using system.encoding.convert method, I've tried to convert it to code page 1251, but I still get the same result: chars (or bytes) always have the same 1252 value. In other words, they don't get translated.
    Does any1 have some good idea how to represent string in memory in other code page, or to put it more simple, how to convert text to/from latin to cirilic alphabet?

    Thanks

    p.s. and the sample that doesn't work


    Encoding engEnc = System.Text.Encoding.GetEncoding(1252);
    Encoding rusEnc = System.Text.Encoding.GetEncoding(20866);

    byte[] rusBytes = rusEnc.GetBytes(wordToEncode);
    byte[] resBytes = Encoding.Convert(engEnc, rusEnc, rusBytes);

    string result = rusEnc.GetString(rusBytes);

  5. Anonymous // September 19, 2010 at 1:03:00 PM GMT+10  

    Dear author of the post.

    Please accept my sincere thanks for explaining the issue and sharing the code.

    I have over 20 000 articles ina database all inserted by a classic ASP CMS with bloody windows-1256 (had no other choice).

    Now I have an ASP.NET website and wanted to read the articles. It was a mess with all encodings and globalizations.

    Thank you for solving my problem and saving me so much time.

    By the way, in my case in code behind with .net framework 4.0 I had to remove the "server" when I created the ADODB object so it became

    CreateObject("ADODB.Stream")

    instead of

    Server.CreateObject("ADODB.Stream")

    I put it in app_code folder and use it directly on my EVAL values.


    Thank you very much again!
    Hiwa Afandi

  6. Francisco Campos // October 18, 2010 at 5:10:00 AM GMT+11  

    I LOVE THIS CODE!!!!!!!! Thank you SO MUCH!!!!!!!

Post a Comment

Thank you for taking the time to comment..
* If you have a tech issue with one of my plugins, you may email me on mike[at]moretechtips.net

Mike

Mike MoreWeb developer, jQuery plugin author, social media fan and Technology Blogger.
My favorite topics are: jQuery , Javascript , ASP.Net , Twitter , Google..
<connect with="me"> </connect>

Subscribe by email

Enter your email address:

or via RSS