Hi Dragan,
> this DBCS (which is, AFAIK, the full 16-bit Unicode set) is a necessary intermediate step. UTF-8 is simply a way to encode Unicode strings so that they pass as 8-bit strings, but thanks to the lead bytes are properly recognized and interpreted by the presentation layer. Two bytes header and occasionally two bytes per character surely beats six or seven bytes per character when inserting one of these in HTML or in a character field.
I'm getting a bit confused by this part... DBCS is not Unicode: it uses 1 byte, and occasionally 2 bytes, per character to encode Chinese, Japanese and Korean text. Entering, handling and displaying DBCS strings depends on having a compatible OS with the appropriate regional settings. In DBCS a lead byte signals that the next byte belongs to the same character, whereas in UTF-8 the bit pattern at the start of the first byte tells you how many bytes make up the character (and every following byte starts with the bits 10). UTF-8 encodes a character in 1 to 4 bytes depending on the character (the original specification allowed up to 6).
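To make the difference concrete, here is a rough Python sketch (not anyone's production code) that splits a byte string into characters both ways. For the DBCS side I'm assuming Shift-JIS as the example code page, with its lead-byte ranges 0x81-0x9F and 0xE0-0xFC; other DBCS code pages use different ranges. The function names are made up for illustration, and the UTF-8 check only looks at bit patterns, so it doesn't reject every malformed sequence a real decoder would.

```python
def utf8_sequence_length(first_byte: int) -> int:
    """Length of a UTF-8 sequence, judged only by the bit pattern of its first byte."""
    if first_byte < 0x80:                 # 0xxxxxxx: ASCII, 1 byte
        return 1
    if (first_byte & 0xC0) == 0x80:       # 10xxxxxx: continuation byte, cannot start a character
        raise ValueError(f"continuation byte 0x{first_byte:02X} cannot start a sequence")
    if (first_byte & 0xE0) == 0xC0:       # 110xxxxx: 2 bytes
        return 2
    if (first_byte & 0xF0) == 0xE0:       # 1110xxxx: 3 bytes
        return 3
    if (first_byte & 0xF8) == 0xF0:       # 11110xxx: 4 bytes (modern maximum)
        return 4
    raise ValueError(f"0x{first_byte:02X} cannot start a sequence in modern UTF-8")


def split_utf8(data: bytes) -> list[bytes]:
    """Split UTF-8 bytes into one chunk per character using the bit patterns."""
    chunks, i = [], 0
    while i < len(data):
        n = utf8_sequence_length(data[i])
        seq = data[i:i + n]
        # Every byte after the first must be a 10xxxxxx continuation byte.
        if len(seq) < n or any((b & 0xC0) != 0x80 for b in seq[1:]):
            raise ValueError(f"truncated or malformed sequence at offset {i}")
        chunks.append(seq)
        i += n
    return chunks


def is_sjis_lead_byte(b: int) -> bool:
    """Shift-JIS lead-byte ranges (one DBCS code page; others differ)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def split_sjis(data: bytes) -> list[bytes]:
    """Split Shift-JIS bytes: a lead byte means 'this byte plus the next one'."""
    chunks, i = [], 0
    while i < len(data):
        n = 2 if is_sjis_lead_byte(data[i]) else 1
        if i + n > len(data):
            raise ValueError(f"lead byte without trail byte at offset {i}")
        chunks.append(data[i:i + n])
        i += n
    return chunks


if __name__ == "__main__":
    text = "A漢"
    utf8 = text.encode("utf-8")
    sjis = text.encode("shift_jis")
    print(len(utf8), split_utf8(utf8))   # 4 bytes: 'A' is 1 byte, '漢' is 3
    print(len(sjis), split_sjis(sjis))   # 3 bytes: 'A' is 1 byte, '漢' is 2
```

Note that the Shift-JIS splitter can only tell a lead byte from a single-byte character, so it has to scan from a known character boundary; in UTF-8 a continuation byte can never be mistaken for the start of a character.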
--
Christof