Character Sets - Level Extreme

Character Sets

Message

From

11/09/2014 22:07:19

Rick Strahl
West Wind Technologies
Maui, Hawaii, United States

11/09/2014 04:50:35

Jos Pols
C., South Africa

General information
Forum: Visual FoxPro
Category: Coding, syntax & commands
Title: Re: Character Sets

Environment versions
Visual FoxPro: VFP 9 SP2
OS: Windows Server 2012
Network: Windows 2008 Server
Database: MS SQL Server
Application: Web

Miscellaneous
Thread ID: 01607355
Message ID: 01607474
Views: 106

This message has been marked as a message which has helped to the initial question of the thread.

STRTOFILE() writes the raw bytes of the text, i.e. the actual single-byte character values (0 to 255) of the string. Character encodings are applied only while the string lives inside of VFP. Since VFP uses ANSI strings (a single-byte character set with 256 possible values), that works just fine because each character maps directly to a raw byte value that can be written out.
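To make the "raw bytes" point concrete, here is a minimal sketch that writes a short string and reads it back. The file name is hypothetical, and CHR(220) is used instead of a typed Ü so the example does not depend on how the source file itself was saved.

```foxpro
* Sketch: STRTOFILE() writes one byte per character, with no conversion.
LOCAL lcText, lcFile, lcBytes
lcText = CHR(220) + "ber"                       && CHR(220) displays as Ü under code page 1252
lcFile = ADDBS(SYS(2023)) + "ansi_test.txt"     && hypothetical file in the temp folder

= STRTOFILE(lcText, lcFile)                     && no flag: no BOM is added
lcBytes = FILETOSTR(lcFile)

? LEN(lcBytes)                                  && number of bytes that ended up on disk
? ASC(LEFT(lcBytes, 1))                         && value of the first byte
```

The byte count matches the string length and the first byte is the same value that went in, which is why the file by itself says nothing about which code page those bytes belong to.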

IOW, the text written to file carries no encoding information, because ANSI text has no marker that identifies its code page; it's just a byte-for-byte dump of the text. When your consumers read the text, they need to apply the same code page that was in effect when you wrote out the file in order to get the same data with whatever tooling they use.

By default the character encoding will be Windows-1252 (or whatever your default code page happens to be; check with CPCURRENT()), so your consumer should use the same encoding to read the file.
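A quick way to answer the "what character set are we using" question for plain STRTOFILE() output is to ask VFP for its code pages. This is only a sketch; the values it prints depend on the machine and on any CODEPAGE setting in the configuration file.

```foxpro
* Sketch: report the code pages in play on this machine.
? CPCURRENT()      && code page VFP is currently using, e.g. 1252
? CPCURRENT(1)     && Windows (ANSI) code page of the operating system
? CPCURRENT(2)     && MS-DOS (OEM) code page of the operating system
```

If the first value is 1252, the answer for the consumer is "Windows-1252, no BOM".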

A better choice, though, is typically to export data in UTF-8 or Unicode with a byte-order mark (BOM). For example, to write out text as UTF-8 you might use:

? STRTOFILE(STRCONV("This is Über formatted text",9),"c:\temp\test.txt",4)

UTF-8 is the most common format for anything that uses Unicode because it saves a significant amount of space: only extended (non-ASCII) characters are encoded with two or more bytes, while Unicode (UTF-16, which is what VFP and Windows call "Unicode") requires at least two bytes per character.
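The size difference is easy to see by comparing string lengths after conversion. A rough sketch, assuming the current code page is 1252 so that CHR(220) is Ü:

```foxpro
* Sketch: compare the byte size of the same text in ANSI, UTF-8 and UTF-16.
LOCAL lcText
lcText = "This is " + CHR(220) + "ber formatted text"

? LEN(lcText)                 && ANSI: one byte per character
? LEN(STRCONV(lcText, 9))     && UTF-8: only the Ü needs an extra byte
? LEN(STRCONV(lcText, 5))     && Unicode (UTF-16): two bytes per character
```

For mostly English text the UTF-8 result is barely larger than the ANSI original, while the UTF-16 result is about twice the size.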

But you can do STRTOFILE with Unicode encoding and a Unicode BOM the same way as above. Either one of these should be easy for consumers to read, and most tools will automatically pick up the encoding based on the BOM.
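For the Unicode variant, the call looks almost the same as the UTF-8 one. In this sketch the output file name is just a placeholder; STRCONV() with 5 converts the ANSI string to Unicode, and STRTOFILE() flag 2 prepends the Unicode BOM (FF FE).

```foxpro
* Sketch: write the same text as Unicode (UTF-16) with a byte-order mark.
* The file name is hypothetical. Flag 2 only adds the BOM; it does not
* convert the text, so STRCONV() has to do the conversion first.
? STRTOFILE(STRCONV("This is Über formatted text", 5), "c:\temp\test_unicode.txt", 2)
```

As with the UTF-8 version, the ? prints the number of bytes written.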

If the BOM is not recognized (it should be) it's still better to use UTF-8/Unicode because that's what other environments are much more likely to expect rather than ANSI text.
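If you want to check what a file actually starts with, for example one of the incoming files from the 3rd party, something along these lines might help. It is only a sketch: the path is hypothetical, and a file without a BOM can't be identified this way (it could be ANSI in any code page, or UTF-8 written without a BOM).

```foxpro
* Sketch: guess a text file's encoding from its byte-order mark, if it has one.
LOCAL lcFile, lcBytes
lcFile  = "c:\temp\test.txt"          && hypothetical path
lcBytes = FILETOSTR(lcFile)

DO CASE
CASE LEFT(lcBytes, 3) == CHR(239) + CHR(187) + CHR(191)
    ? "UTF-8 with BOM (EF BB BF)"
CASE LEFT(lcBytes, 2) == CHR(255) + CHR(254)
    ? "Unicode (UTF-16 little-endian) with BOM (FF FE)"
OTHERWISE
    ? "No BOM found: ask the sender which code page or encoding was used"
ENDCASE
```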

+++ Rick ---

>Hi All
>
>I am receiving text (TXT) files from a 3rd party entity. I read the file in using FILETOSTR(), amend it, and then write it out to another TXT file using STRTOFILE(). Someone has now asked me what character set we are using for the output file. How do I find that out or answer this question?
>
>TIA
