Getting accented chars to Adobe form
Message
From
19/08/2009 11:41:36
To
19/08/2009 10:24:00
Dragan Nedeljkovich
Now officially retired
Zrenjanin, Serbia
General information
Forum: Visual FoxPro
Category: Coding, syntax and commands
Environment versions
Visual FoxPro: VFP 9 SP1
OS: Windows XP SP2
Network: Windows 2003 Server
Database: Visual FoxPro
Application: Desktop
Miscellaneous
Thread ID: 01418540
Message ID: 01418988
Views: 38
Hi guys,

I ran the tests on my data, and it appears that the conversion from 8-bit to DBCS does NOT change the string: it was the same length before and after the conversion, and comparing the two with "==" reports them as equal. When I sent it through STRCONV(lcXMLString,9) to convert to UTF-8, the length grew from 5334 to 5335, so that conversion did indeed do something. So I am leaning towards the theory that the STRCONV(,1) conversion to DBCS is not necessary in this case, where we are just dealing with North American and Western European code pages... but I might leave the code in to do both conversions anyhow :-)
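
For anyone wanting to see the one-byte growth outside VFP, here is a minimal sketch in Python (not VFP, and not the code used above). It assumes the 8-bit data is in Windows code page 1252, and the sample text is made up. Under those assumptions, a length increase of exactly one would be consistent with the string holding a single accented character that UTF-8 encodes in two bytes.

```python
# Illustration only: Python stands in for VFP's STRCONV(..., 9).
# The sample text is hypothetical, not the data from the post.
text = "Montr\u00e9al"          # one accented character (e-acute)

ansi = text.encode("cp1252")    # 8-bit Western European code page (assumed)
utf8 = text.encode("utf-8")     # the same text encoded as UTF-8

print(len(ansi), len(utf8))     # byte lengths before and after
print(ansi == utf8)             # whether the two byte strings match

# Plain ASCII text has the same bytes in both encodings.
plain = "Montreal"
print(plain.encode("cp1252") == plain.encode("utf-8"))
```

Text with no characters above 127 comes out byte-for-byte identical, which is why the conversion can look like a no-op on mostly English data.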

Thanks to you both for all the help.

Albert Gostick

>>Hi Dragan,
>>
>>> this DBCS (which is, AFAIK, the full 16-bit Unicode set) is a necessary intermediate step. UTF-8 is simply a way to encode Unicode strings so that they pass as 8-bit strings, but thanks to the lead bytes are properly recognized and interpreted by the presentation layer. Two bytes header and occasionally two bytes per character surely beats six or seven bytes per character when inserting one of these in HTML or in a character field.
>>
>>I'm getting a bit confused with this part... DBCS uses 1 byte and occasionally 2 bytes to encode Chinese, Japanese and Korean text. Entering, handling and displaying DBCS strings depends on having a compatible OS with appropriate regional settings set up. DBCS has lead bytes, whereas UTF-8 uses bit patterns to identify the number of bytes per character instead of a lead byte. UTF-8 encodes in 1-4 bytes depending on the character (the original design allowed up to 6).
>
>I'm sure of one thing: I haven't got all the details, and those I got I am not quite sure of :). That's what the AFAIK was intended for. Unfortunately, what text I found on the subject came from Microsoft, so nothing was called in clear terms that relate to the real world, but rather in their solipsistic (or should I say autistic) terminology. Some of it possibly had negative information (i.e. after you read it you actually know less than before).