Getting accented chars to Adobe form
Message
General information
Forum:
Visual FoxPro
Category:
Coding, syntax and commands
Environment versions
Visual FoxPro:
VFP 9 SP1
OS:
Windows XP SP2
Network:
Windows 2003 Server
Database:
Visual FoxPro
Application:
Desktop
Miscellaneous
Thread ID:
01418540
Message ID:
01418748
Hi Albert,

> I was using CHRTRANC() to do the substitution so it should have handled substituting in the old character for the new).

Which would explain this. CHRTRANC() is meant to operate on DBCS, not on UTF-8.

DBCS uses special lead bytes (CHR(128) and higher) to indicate that the following byte stands for a different character than the same code would without the lead byte. In DBCS, some letters occupy just one character position in the string and some occupy two. By default, VFP treats such a combined character as two separate characters. The only exceptions are the *C functions, which scan the string from left to right and evaluate all lead bytes.

UTF-8 is similar insofar as it uses variable-length encoding. Some letters are just one character in VFP, others are two, three, or more. There are no functions in VFP that operate on the UTF-8 character set. The most we get are the conversion functions and the ability to pass UTF-8 to COM objects. VFP always treats each part of a multi-character letter as a single character of its own.
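To see this for yourself, here is a minimal sketch. It assumes a system running on code page 1252, where CHR(246) is "ö"; the variable names are only for illustration.

```foxpro
* Assumption: code page 1252, where CHR(246) is the letter "ö".
LOCAL lcAnsi, lcUtf8
lcAnsi = CHR(246)
lcUtf8 = STRCONV(m.lcAnsi, 9)    && 9 = convert to UTF-8

? LEN(m.lcAnsi)                  && 1: one letter, one character
? LEN(m.lcUtf8)                  && 2: same letter, but VFP counts every UTF-8 byte
? STRCONV(m.lcUtf8, 15)          && C3B6: the two bytes as hex
```

LEN(), LEFT(), SUBSTR() and friends all count this way, so any position arithmetic on a UTF-8 string works on bytes, not on letters.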

The result of the conversion to UTF-8 most likely does not produce a lead byte. In fact, if it did, the character would be misinterpreted. Therefore CHRTRANC() operates exactly like CHRTRAN(): it maps one character to one character, so only the first byte of the UTF-8 sequence ends up in the result:
wrong: Chrtranc("1ö1","ö",Strconv("ö",9))

0x00000000 : 31 C3 31                                           1Ã1

correct: StrTran("1ö1","ö",Strconv("ö",9))

0x00000000 : 31 C3 B6 31                                        1ö1
STRTRAN() would be sufficient if you replaced a single character. Because the encoded UTF-8 also contains characters above 127, repeated STRTRAN() calls would convert some bytes multiple times. On code page 1252, for example, replacing "ö" first produces C3 B6; a later pass that replaces "Ã" (C3) would then convert the C3 that belongs to the encoded "ö" a second time. Hence, the only way to convert a string properly to UTF-8 (aside from passing the entire string to STRCONV()) is to use LEFTC() to get the first character, convert it to UTF-8, append the result to a new string, and use SUBSTRC() on the remaining string to cut off the first character.
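The loop described above could look roughly like this. It is a sketch, not tested on a DBCS system; Utf8ByChar is a made-up name, and the place where you would substitute your own characters is marked in a comment.

```foxpro
* Compare the character-by-character result with a single STRCONV() call.
LOCAL lcTest
lcTest = "1" + CHR(246) + "1"    && "1ö1" on code page 1252
? Utf8ByChar(m.lcTest) == STRCONV(m.lcTest, 9)

* Hypothetical helper: converts tcSource to UTF-8 one character at a time.
FUNCTION Utf8ByChar(tcSource)
    LOCAL lcResult, lcRest, lcChar
    lcResult = ""
    lcRest = m.tcSource
    DO WHILE LEN(m.lcRest) > 0
        * LEFTC() returns one full character, including a DBCS lead byte.
        lcChar = LEFTC(m.lcRest, 1)
        * A per-character substitution would go here, before the conversion.
        lcResult = m.lcResult + STRCONV(m.lcChar, 9)
        * SUBSTRC() cuts that character off the remaining input.
        lcRest = SUBSTRC(m.lcRest, 2)
    ENDDO
    RETURN m.lcResult
ENDFUNC
```

Because every input character is visited exactly once and the output goes into a separate string, bytes that were already converted are never looked at again.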

>Just to confirm, I do need to run STRCONV() twice - once to convert to DBCS and the 2nd time to UTF-8?

If you want your application to work all over the world, you need to do both. If the application only runs in North and South America and Western Europe (code page 1252), you can perform just the conversion to UTF-8. The difficulty with DBCS is that it depends on the code page of your system (locale and region settings). Only on an Asian system do you actually see the effect of DBCS.
--
Christof