Character Sets - Level Extreme

Character Sets

Message

11/09/2014 05:55:23

Gregory Adam
Belgium

11/09/2014 05:35:23

Jos Pols
C., South Africa

General information
Forum: Visual FoxPro
Category: Coding, syntax and commands
Title: Re: Character Sets

Environment versions
Visual FoxPro: VFP 9 SP2
OS: Windows Server 2012
Network: Windows 2008 Server
Database: MS SQL Server
Application: Web

Miscellaneous
Thread ID: 01607355
Message ID: 01607375

>>>>Hi All
>>>>
>>>>I am receiving text (TXT) files from a 3rd party entity. I read the file in using FILETOSTR(), amend it, and then write it out to another TXT file using STRTOFILE(). Someone has now asked me what character set we are using for the output file. How do I find that out or answer this question?
>>>>
>>>>TIA
>>>
>>>FILETOSTR() / STRTOFILE() work on plain bytes. They do not do any conversion. IOW you can read a binary as well.
>>>If you do not play around with stuff related to that problem the right answer would be: The output file has the same codepage as the input - if any.
>>
>>IOW, what Borislav said is still true, not because of anything that these two functions do, but because of how the app composed the string. If there were any characters in it which were codepage specific, they were done according to those settings, and written into the file as bytes.
>
>So are we all saying that if I:
>
>1) Read in the file - FILETOSTR()
>2) Calculate a checksum on it - SYS(2007)
>3) Add the checksum to the file - concatenate the strings, and
>4) Write it back out to a new file - STRTOFILE()
>
>... then the codepage / character set of the new file is identical to the codepage / character set of the original file?

Jos.

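TXT files do not have any code page attached unless there is a BOM (byte order mark) at the beginning of the file. Even then, a BOM only identifies a Unicode encoding (UTF-8 or UTF-16), not an ANSI code page.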

Create a TXT file with Notepad, type abcd, then do Save As once with each of these encodings:
(1) ANSI (= no BOM)
(2) UTF-8 (UTF-8 BOM)
(3) Unicode (UTF-16 little-endian BOM)
(4) Unicode big endian (UTF-16 big-endian BOM)

Then look at each of the files with a hex editor.
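
If no hex editor is at hand, the same comparison can be sketched in Python (not VFP; used here only because it makes the bytes easy to print). This is a minimal illustration, assuming "ANSI" means Windows code page 1252; the names `variants` and `hex_dump` are made up for this example.

```python
import codecs

TEXT = "abcd"

# Byte sequences corresponding to the four Save As choices listed above.
# "ANSI" is modelled as Windows code page 1252 (an assumption).
variants = {
    "ANSI": TEXT.encode("cp1252"),
    "UTF-8": codecs.BOM_UTF8 + TEXT.encode("utf-8"),
    "Unicode": codecs.BOM_UTF16_LE + TEXT.encode("utf-16-le"),
    "Unicode big endian": codecs.BOM_UTF16_BE + TEXT.encode("utf-16-be"),
}


def hex_dump(data: bytes) -> str:
    """Return the bytes as space-separated two-digit hex values."""
    return " ".join(f"{b:02X}" for b in data)


for name, data in variants.items():
    print(f"{name:20}{hex_dump(data)}")
```

The ANSI variant is just 61 62 63 64. The UTF-8 variant has EF BB BF in front of the same four bytes. The two UTF-16 variants start with FF FE and FE FF respectively and use two bytes per character.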

>... then the codepage / character set of the new file is identical to the codepage / character set of the original file?
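It depends. If the first bytes of the file are a BOM and you did not change them, then yes. Without a BOM the file carries no character set information at all, and the bytes are simply passed through unchanged.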
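
The read, checksum, append, write sequence can be sketched as follows, again in Python rather than VFP. It is only an illustration under stated assumptions: `detect_bom` and `append_checksum` are hypothetical helpers, `zlib.crc32` is a stand-in for SYS(2007) and does not produce the same value, and only the UTF-8 and UTF-16 BOMs are checked.

```python
import codecs
import zlib

# BOMs checked here; UTF-32 is deliberately left out of this sketch.
BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16 little endian"),
    (codecs.BOM_UTF16_BE, "UTF-16 big endian"),
]


def detect_bom(data: bytes):
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def append_checksum(data: bytes) -> bytes:
    """Append a checksum as ASCII digits without touching the existing bytes.

    zlib.crc32 is only a stand-in for SYS(2007), not an equivalent.
    """
    checksum = str(zlib.crc32(data)).encode("ascii")
    return data + checksum


original = codecs.BOM_UTF8 + "abcd".encode("utf-8")
result = append_checksum(original)

print(detect_bom(original))         # UTF-8
print(detect_bom(result))           # UTF-8
print(result.startswith(original))  # True
```

Because the new bytes are only added at the end, the leading BOM (if any) and every original byte stay as they were. For a file with a UTF-16 BOM, the appended text would also have to be written as UTF-16 for the result to remain readable.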

PS: If you receive a TXT file without a BOM (= ANSI) that contains Russian characters, you will not be able to see the Russian characters if your computer is set to code page 1252.
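
A small Python sketch of that effect, assuming the sender's machine used code page 1251 (Cyrillic); the sample word is made up for the illustration.

```python
# Text as typed by the sender, assumed to be on a code page 1251 machine.
russian = "Привет"

# The bytes that end up in a BOM-less (ANSI) file.
raw = russian.encode("cp1251")

# The same bytes interpreted on a machine set to code page 1252.
shown = raw.decode("cp1252")
print(shown)  # Ïðèâåò

# The bytes themselves are untouched; only their interpretation differs.
print(shown.encode("cp1252") == raw)  # True
```

Nothing in the file says which of the two readings is intended, so the recipient has to know the sender's code page.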

Gregory
