Saving unicode data - Level Extreme

Saving unicode data

Message

27/12/2011 13:47:47

Gregory Adam
Belgium

27/12/2011 08:53:13

Naomi Nosonovsky
Wisconsin, United States

General information

Forum: Visual FoxPro
Category: ActiveX controls in VFP
Title: Re: Saving unicode data

Environment versions

Visual FoxPro: VFP 9 SP2
OS: Windows 7
Network: Windows 2003 Server
Database: MS SQL Server

Miscellaneous

Thread ID: 01531693
Message ID: 01531736
Views: 102

Naomi Nosonovsky

>>You know about SYS(987,.T.) in VFP 9, right? It will map ANSI to Unicode and vice versa for SQL Passthrough and remote SQL connections. So as long as you can represent the captured Unicode input as ANSI text you'll be OK (IOW, if the character set you're running VFP in can represent the entered characters).
>>
>>Unicode usage generally only makes sense if you need to display/edit multiple different character sets simultaneously.
>>
>>+++ Rick ---
>
>I want to be able to type in any language. The form I am working on represents available languages for the Kiosk interface. I don't want to apply any restrictions.
>
>In any case, I found that MS Forms 2 Textbox can display any language (anything I tried so far, at least). Now I just need to figure out how to properly capture its value.
>
>Also, I tried using this setting, but I don't see any change in the behavior.

If you want to use Unicode, you'll need to understand some basics.

(1) A single byte character set has 256 possible byte values, so at most 256 characters (all Western character sets, I believe).

In a double byte character set, some byte values are lead bytes. When a lead byte is encountered, the next byte is fetched as well, and the two together identify one character.
So some characters take 1 byte whilst others take 2 (e.g. Chinese and Japanese character sets).

The SBCS and DBCS (single and double byte character sets) were invented because the ASCII table has no room to contain all the possible characters.

Each of the above is identified by a code page.

The same byte value (mostly from 0x80 upwards) can be a different character in another code page.
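
To make point (1) concrete, here is a small sketch in Python (used only for illustration, this is not VFP code). The code pages cp1252, cp1251 and cp932 are just examples picked for the demonstration.

```python
# Same byte value, different character depending on the code page.
raw = bytes([0xE9])
western = raw.decode("cp1252")   # Windows Western European
cyrillic = raw.decode("cp1251")  # Windows Cyrillic
print(western, cyrillic)

# In a double byte character set (cp932, Japanese), an ASCII letter takes
# one byte, while a kana takes a lead byte plus a second byte.
mixed = "A\u3042".encode("cp932")  # "A" followed by HIRAGANA LETTER A
print(len(mixed), mixed.hex(" "))
```

The first print shows two different letters for the one byte 0xE9; the second shows that a two character string occupies three bytes.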

(2) Then comes Unicode and its UTF encodings, to get rid of all those code pages.

There's UTF-32, UTF-16 and UTF-8.

UTF-32 can represent all the characters of all the code pages. The downside is that every character occupies 4 bytes.

Mostly, UTF-16 is sufficient: the common characters take 2 bytes each, and only the rarer ones outside the Basic Multilingual Plane take 4 bytes (a surrogate pair). .NET uses UTF-16 internally.
In Windows parlance, UTF-16 is called Wide Character.

UTF-8 uses 1 to 4 bytes per character. So functions like substr(s, 49, 4) have to process all the bytes before the offset, since there is no way to know in advance how many bytes each character occupies. UTF-32 does not have that disadvantage, and neither does UTF-16 as long as the text contains no surrogate pairs.
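
The byte counts in point (2) can be checked with a short Python sketch (illustration only, not VFP). The four sample characters and the helper name `byte_lengths` are my own choices.

```python
# Bytes needed per character in each UTF encoding.
# The "-le" variants are used so that no byte order mark is added.
samples = ["A", "\u0416", "\u4e2d", "\U0001F600"]  # Latin, Cyrillic, CJK, emoji

def byte_lengths(ch):
    """Return (utf8, utf16, utf32) byte counts for one character."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return tuple(len(ch.encode(enc)) for enc in encodings)

for ch in samples:
    print(f"U+{ord(ch):04X}", byte_lengths(ch))
```

Only the last sample, which lies outside the Basic Multilingual Plane, needs 4 bytes in UTF-16.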

For code pages and UTF, see http://msdn.microsoft.com/en-us/goglobal/bb964654

(3) FoxPro can only work with code pages.

(4) All (I think) ActiveX controls work internally in UTF-16.

Now, when passing a string to an ActiveX (like regex), the characters coming in are converted to UTF-16.
Likewise, UTF-16 going out is converted to a code page.

How is the code page to convert from and to determined?

(a) The code page of sys(3101) is used, which you can change. But I have not checked whether you can use that to specify a different code page for multiple ActiveX controls at the same time.

(b) There's also ComProp(). I have not used that.
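
The two conversions in (4), and why the choice of code page matters, can be sketched in Python (illustration only; it does not show what VFP or a control does internally). The helper names `to_utf16` and `from_utf16` are hypothetical, and code page 1251 is an assumed example.

```python
def to_utf16(data: bytes, codepage: str) -> bytes:
    """Code page bytes -> UTF-16 (little endian, as Windows uses)."""
    return data.decode(codepage).encode("utf-16-le")

def from_utf16(data: bytes, codepage: str) -> bytes:
    """UTF-16 bytes -> code page bytes.
    Characters the code page lacks become '?' (errors="replace")."""
    return data.decode("utf-16-le").encode(codepage, errors="replace")

ansi = "Привет".encode("cp1251")   # 6 letters, 6 bytes in the Cyrillic code page
wide = to_utf16(ansi, "cp1251")    # the same text as UTF-16
back = from_utf16(wide, "cp1251")  # converting back with the right code page
lost = from_utf16(wide, "cp1252")  # converting back with the wrong code page
print(len(ansi), len(wide), back == ansi, lost)
```

With the matching code page the round trip is lossless; with the Western code page every Cyrillic letter degrades to a question mark.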

Some further reading and a couple of functions: Re: Automating Excel extracting russian characters from cell, Thread #1523701, Message #1523724

Strategy

(A)

OK, now you have to find out how SQL Server keeps its Unicode: is it UTF-8, UTF-16 or UTF-32?
Then you want to retrieve the data in VFP WITHOUT conversion to ANSI (see sys(987), which Rick mentioned).

The best way to find out is to put some Cyrillic characters in a Unicode field in SQL Server. Make sure to use characters that need two bytes, somewhere in the range from U+0410 upwards.

If you have a field of, say, 3 Unicode characters in SQL Server, get the field over into a VFP cursor via ODBC. Then examine the number of bytes you have in VFP:

if you have 3, you have a problem (the data has been converted to ANSI)
if you have 6, most likely you have received UTF-16
if you have 12, it is UTF-32

Any other count between 3 and 12 would indicate UTF-8. Beware that Cyrillic characters also take 2 bytes each in UTF-8, so 6 bytes alone does not tell UTF-16 and UTF-8 apart; look at the byte values to be sure.

But best of all is to know in which format SQL Server keeps its Unicode, so you don't have to guess from the number of bytes you have on the VFP side.
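
The byte count test in (A) can be tried out in Python (illustration only, not VFP). The helper `matching_encodings` and the candidate list are hypothetical; the idea is to store a known test string and see under which encoding the received bytes decode back to it.

```python
CANDIDATES = ("cp1251", "utf-8", "utf-16-le", "utf-32-le")

def matching_encodings(raw: bytes, expected: str) -> list:
    """Return the candidate encodings under which raw decodes to expected."""
    matches = []
    for enc in CANDIDATES:
        try:
            if raw.decode(enc) == expected:
                matches.append(enc)
        except UnicodeDecodeError:
            pass  # raw is not valid in this encoding
    return matches

text = "\u0410\u0411\u0412"  # three Cyrillic capitals, U+0410 to U+0412
for enc in CANDIDATES:
    raw = text.encode(enc)
    print(enc, len(raw), raw.hex(" "), matching_encodings(raw, text))
```

UTF-8 and UTF-16 both yield 6 bytes for this string, but the byte values differ, so comparing against the known text still identifies the encoding.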

(B) Since, to the best of my knowledge, you can only feed the ActiveX with a code page and a char (sometimes two for DBCS), you have to convert the Unicode to a code page before passing it to the ActiveX.

For that, use some of the functions from the message Re: Automating Excel extracting russian characters from cell, Thread #1523701, Message #1523724

(C) Retrieving the value from the activeX will give you data in a codepage, which you then convert to the unicode format of sqlserver before ...

(D) transferring it back
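
Steps (B) to (D) amount to a round trip, sketched here in Python (illustration only, not VFP). It assumes the server hands over UTF-16 little endian and the control works in code page 1251; both are assumptions, and the helper names `server_to_control` and `control_to_server` are hypothetical. Strict error handling is used so that a character the code page cannot hold raises an error instead of silently changing.

```python
def server_to_control(wide: bytes, codepage: str) -> bytes:
    """Step (B): UTF-16 LE from the server -> code page bytes for the control.
    Raises UnicodeEncodeError if the code page cannot hold a character."""
    return wide.decode("utf-16-le").encode(codepage, errors="strict")

def control_to_server(ansi: bytes, codepage: str) -> bytes:
    """Steps (C) and (D): code page bytes from the control -> UTF-16 LE."""
    return ansi.decode(codepage).encode("utf-16-le")

stored = "Жук".encode("utf-16-le")           # as fetched from the server
shown = server_to_control(stored, "cp1251")  # what the control receives
edited = shown + "и".encode("cp1251")        # the user appends one letter
saved = control_to_server(edited, "cp1251")  # what goes back to the server
print(saved.decode("utf-16-le"))

try:
    server_to_control("\u4e2d".encode("utf-16-le"), "cp1251")
except UnicodeEncodeError as exc:
    print("not representable in cp1251:", exc.reason)
```

The last call shows the limit of the approach: a Chinese character cannot pass through a Cyrillic code page, so the code page has to match the language being entered.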

Gregory
