Getting accented chars to Adobe form


Message

18/08/2009 15:34:06

Christof Wollenhaupt
Foxpert
Norderstedt, Germany

18/08/2009 10:26:22

Albert Gostick
Kincardine, Ontario, Canada

General information

Forum: Visual FoxPro
Category: Coding, syntax and commands
Title: Re: Getting accented chars to Adobe form

Environment versions

Visual FoxPro: VFP 9 SP1
OS: Windows XP SP2
Network: Windows 2003 Server
Database: Visual FoxPro
Application: Desktop

Miscellaneous

Thread ID: 01418540
Message ID: 01418748

Hi Albert,

> I was using CHRTRANC() to do the substitution so it should have handled substituting in the old character for the new).

Which would explain this. CHRTRANC() is meant to operate on DBCS, not on UTF-8.

DBCS uses special lead bytes (CHR(128) and higher) to indicate that the following byte stands for a different character than the same code would without the lead byte. In DBCS some letters consist of just one character in the string, while others take two. By default VFP treats these combined characters as two separate characters. The only exceptions are the *C functions, which scan the string from left to right and evaluate all lead bytes.
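To make the lead-byte idea concrete, here is a minimal sketch in Python rather than VFP, because Python can model a VFP string as a plain byte sequence. It assumes code page 932 (Shift-JIS) as the DBCS code page, and the sample string is purely illustrative; neither comes from the original question.

```python
# Model a VFP string as raw bytes. Assumption: code page 932 (Shift-JIS)
# stands in for the DBCS code page of an Asian system.
text = "Aあ"                  # two letters
raw = text.encode("cp932")   # the bytes a byte-oriented function would see

print(raw.hex(" "))          # 41 82 a0  (0x82 is the lead byte)
print(len(raw))              # 3 -> byte-oriented count, comparable to LEN()
print(len(text))             # 2 -> letter-oriented count, comparable to LENC()
```

The plain letter "A" stays a single byte, while the second letter needs a lead byte plus a trail byte, which is why byte-oriented and letter-oriented functions disagree about the length.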

UTF-8 is similar insofar as it uses a variable-length encoding. Some letters are just one character in VFP, others are two, three, or more. There are no functions in VFP that operate on the UTF-8 character set. The most we get are the conversion functions and the ability to pass UTF-8 to COM objects. VFP always treats each part of a multi-character letter as a separate, single character.
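As a rough illustration of the variable length, the following Python sketch counts the bytes UTF-8 needs for a few sample letters. The helper name `utf8_length` and the sample letters are assumptions made for this example only.

```python
def utf8_length(letter: str) -> int:
    """Return the number of bytes UTF-8 uses for a single letter."""
    return len(letter.encode("utf-8"))

# Sample letters chosen for illustration: ASCII, Latin-1, euro sign, emoji.
for letter in ["a", "ö", "€", "\U0001F600"]:
    print(f"U+{ord(letter):04X}", utf8_length(letter))
# U+0061 1
# U+00F6 2
# U+20AC 3
# U+1F600 4
```

A byte-oriented string function sees each of those bytes as its own character, so "ö" in UTF-8 looks like two characters to VFP.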

The result of the conversion to UTF-8 most likely does not produce a lead byte. In fact, if it did, the character would be misinterpreted. Therefore CHRTRANC() operates exactly like CHRTRAN() and replaces only the one character, so only the first byte of the UTF-8 sequence ends up in the result:

wrong: Chrtranc("1ö1","ö",Strconv("ö",9))

0x00000000 : 31 C3 31                                           1Ã1

correct: StrTran("1ö1","ö",Strconv("ö",9))

0x00000000 : 31 C3 B6 31                                        1Ã¶1
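The difference between the two dumps can be reproduced with a byte-level model. The sketch below is Python, not VFP: `chrtran` is a hypothetical re-implementation of the position-by-position mapping that CHRTRAN() performs, `bytes.replace` stands in for STRTRAN(), and code page 1252 is assumed for the source string.

```python
def chrtran(source: bytes, search: bytes, replacement: bytes) -> bytes:
    """Model of CHRTRAN(): map each byte in `search` to the byte at the
    same position in `replacement`; extra replacement bytes are ignored."""
    out = bytearray()
    for b in source:
        pos = search.find(bytes([b]))
        if pos < 0:
            out.append(b)                 # not in the search list: keep
        elif pos < len(replacement):
            out.append(replacement[pos])  # one byte in, one byte out
        # else: no counterpart in the replacement, the byte is dropped
    return bytes(out)

source = "1ö1".encode("cp1252")           # 31 F6 31
oe_ansi = "ö".encode("cp1252")            # F6
oe_utf8 = "ö".encode("utf-8")             # C3 B6

print(chrtran(source, oe_ansi, oe_utf8).hex(" "))   # 31 c3 31
print(source.replace(oe_ansi, oe_utf8).hex(" "))    # 31 c3 b6 31
```

The character-mapping variant can only ever emit one byte per input byte, so the second byte of the UTF-8 sequence is lost, whereas the substring replacement inserts the whole sequence.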

STRTRAN() would be sufficient if you replaced only a single character. When you replace several characters one after another, however, the encoded UTF-8 also contains characters above 127, so STRTRAN() would convert some bytes multiple times. Hence, the only way to convert a string properly to UTF-8 (aside from passing the entire string to STRCONV()) is to use LEFTC() to get the first character, convert it to UTF-8, append the result to a new string, and use SUBSTRC() on the remaining string to cut off the first character.
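The double conversion and the character-by-character alternative can be sketched with the same kind of byte-level model. This is Python, not VFP, and it assumes code page 1252, where every character is a single byte; the function names and the list of characters to convert ("ö" and "Ã") are hypothetical and chosen only to trigger the problem.

```python
def sequential_replace(source: bytes, letters: str) -> bytes:
    """Replace one letter after another, the way repeated STRTRAN() calls would."""
    result = source
    for letter in letters:
        result = result.replace(letter.encode("cp1252"), letter.encode("utf-8"))
    return result

def convert_by_character(source: bytes) -> bytes:
    """Take one character at a time, convert it, and append it to a new
    string, so no converted byte is ever looked at again."""
    out = bytearray()
    for b in source:  # assumption: single-byte code page 1252
        out += bytes([b]).decode("cp1252").encode("utf-8")
    return bytes(out)

source = "1ö1".encode("cp1252")                    # 31 F6 31

# "ö" becomes C3 B6, then the C3 is mistaken for "Ã" and converted again.
print(sequential_replace(source, "öÃ").hex(" "))   # 31 c3 83 b6 31
print(convert_by_character(source).hex(" "))       # 31 c3 b6 31
```

Because the one-pass version writes into a separate result string, bytes above 127 that were produced by an earlier conversion never become input for a later one.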

>Just to confirm, I do need to run STRCONV() twice - once to convert to DBCS and the 2nd time to UTF-8?

If you want your application to work all over the world, you need to do both. If the application only runs in North and South America and Western Europe (code page 1252), you can perform just the conversion to UTF-8. The difficulty with DBCS is that it depends on the code page of your system (locale and region settings). Only on an Asian system do you actually see the result of DBCS.

--
Christof
