Level Extreme platform
Subscription
Corporate profile
Products & Services
Support
Legal
Français
Automating Excel extracting russian characters from cell
Message
From
16/09/2011 09:01:57
 
 
To
16/09/2011 03:09:58
General information
Forum:
Visual FoxPro
Category:
Other
Miscellaneous
Thread ID:
01523701
Message ID:
01523724
Views:
198
This message has been marked as the solution to the initial question of the thread.
Likes (1)
>I converted pdf table to excel format ( Office 2010 ) with ABBYY pdf transformer and try to generate xml file via Excel automation.
>
>All Russian and Belorussian names properly seen in Excel
>РУБИНОВ Анатолий Николаевич
>Рубінаў Анатоль Мікалаевіч
>
>But after this statement
>
>lcell_value = osheet.Cells(i,j ).value
>
>in Foxpro they are just sequences of "?". I tried to play with strconv , but didn't find a proper combination. My understanding in Excel it is UTF-16, and in xml we should end up with utf-8.




Michael,

You are right to think that Excel keeps its data in UTF16. COM objects usually do.

You are encountering two problems

(1) Foxpro works in SBCS, ie Single Byte Character Code Set.
Your code page is 1252 I assume. There's no way foxpro can display chars from another code page other than the usual overlap (a-z, 0-9, etc)


(2) Each time foxpro transfers data to /from Excel it translates - or tries to translate - UTF16 to/from the code page of the OS, which in your case will be 1252
(see Control panel | Regional and language, TAB Administrative, Language for non-unicode programs)

What happens is that the UTF16 data from excel gets translated into code page 1252. The Cyrillic chars do not fit in code page 1252, hence the ???
http://msdn.microsoft.com/en-us/goglobal/cc305144



The only way to proceed is
(a) Tell excel that you are using code page 1251 (Cyrillic). That way the UTF16 will be converted into code page 1251

(b) Convert the data from code page 1251 to UTF16, then UTF16 to UTF8

Once you have the data in utf8, store it in a table field with NOCPTRANS (char binary)
If you want that data output to xml utf8, make sure it does not try to translate it again. There must be ways - SET NOCPTRANS TO field1, field2, - combined with some properties of the xmladapter


I'm covering (a) and (b)

(a) To tell excel you want codepage 1251
oldCodePage =sys(3101, 1251)
Set the oldCodePage back after using excel

(b) I include a couple of functions, code further below
StringToUTF8(@utf8Out, m.stringIn, m.codepageIn)
StringToUTF16(@m.utf8Out, m.stringIn, m.codepageIn)
StringUTF16ToUTF8(@m.utf8Out, m.utf8Out)
I have cut and pasted Рубінаў Анатоль Мікалаевіч into excel

Here's the test - you need to clean up excel afterwards, and set the old code page back
function Do_it()

	
	local xlObj, sheetObj
	
	xlObj = createObject('Excel.Application')
	
	=xlObj.WorkBooks.Open('d:\tmp\book1.xlsx')
	sheetObj = m.xlObj.Sheets[1]
	
	
	=sys(3101, 1251)
	v = m.sheetObj.Cells(1,1).Value
	utf8 = ''
	
	
	
	assert StringToUTF8(@m.utf8, m.v, 1251)
	
	?utf8
	assert false
	
	
		
endfunc
*_______________________________________________________________________________
Functions

- for StringUTF16ToUTF8, strconv(, 10) would work as well I guess
&& StringConversion
&& Gregory Adam 2011
*===============================================================================
#include	"FoxPro.h"
*===============================================================================
#define true	.T.
#define false	.F.
*===============================================================================



*===============================================================================
function StringToUTF8(utf8Out, stringIn, codepageIn)
	
	local success
	success = true
	
	do case
	case !m.success
	
	case !StringToUTF16(@m.utf8Out, m.stringIn, m.codepageIn)
		assert false
		success = false
	
	case !StringUTF16ToUTF8(@m.utf8Out, m.utf8Out)
		assert false
		success = false
	
	endcase
	
	return m.success
	
endfunc
*_______________________________________________________________________________


*_______________________________________________________________________________
*_______________________________________________________________________________
#define CP_ACP					0
#define CP_MACCP				2
#define CP_OEMCP				1
#define CP_SYMBOL				42
#define CP_THREAD_ACP			3
#define CP_UTF7					65000
#define CP_UTF8					65001
#define MB_PRECOMPOSED			0x1
#define MB_COMPOSITE			0x2
#define MB_USEGLYPHCHARS		0x4
#define MB_ERR_INVALID_CHARS	0x8

#define WC_DEFAULTCHAR			0x00000040 
#define WC_ERR_INVALID_CHARS	0x00000080 
#define WC_NO_BEST_FIT_CHARS 	0x00000400 
*_______________________________________________________________________________
function StringToUTF16(utf16Out, stringIn, codepageIn)

	local success
	success = true
	
	do case
	case !m.success
	
	case empty(len(m.stringIn))
		utf16Out = ''
		
	otherwise
		local lpWideCharStr, result
		lpWideCharStr = space(len(m.stringIn)*2)
	
		result = MultiByteToWideChar( ;
					evl(m.codepageIn, cpcurrent()), ;
					MB_ERR_INVALID_CHARS, ;
					@m.stringIn, ;
					len(m.stringIn), ;
					@m.lpWideCharStr, ;
					len(m.lpWideCharStr) ;
				)
			
		do case
		case !m.success
		
		case empty(m.result)
			assert false
			success = false
		
		otherwise
			utf16Out = left(m.lpWideCharStr, m.result * 2) 
		
		endcase
		
	endcase
	
	return m.success
	
	
endfunc
*_______________________________________________________________________________
function StringUTF16ToUTF8(utf8Out, utf16In)

	local success
	success = true

	
	
	do case
	case !m.success
	
	case empty(len(m.utf16In))
		utf8Out = ''
		
	otherwise
	
		local lpMultiByteStr, lpUsedDefaultChar, result
		lpMultiByteStr = space(len(m.utf16In) * 2)
		lpUsedDefaultChar = 0
		
		result = WideCharToMultiByte( ;
					CP_UTF8, ;
					WC_ERR_INVALID_CHARS, ;
					@m.utf16In, ;
					len(m.utf16In)/2, ;
					@m.lpMultiByteStr, ;
					len(m.lpMultiByteStr), ;
					null, ;
					@m.lpUsedDefaultChar ;
				)
		
		do case
		case !m.success
		
		case empty(m.result)			
			assert false
			success = false
			
			
		otherwise
			utf8Out = left(m.lpMultiByteStr, m.result)
		
		endcase
	endcase
	
	return m.success
	
	
	
endfunc
*_______________________________________________________________________________
function MultiByteToWideChar
	lparameters codepage, ;
				dwFlags, ;
				lpMultiByteStr, ;
				cbMultiByte, ;
				lpWideCharStr, ;
				cchWideChar

	local success
	success = true

	local result
	
	do case
	case !m.success
	
	otherwise
		try
			declare integer MultiByteToWideChar in Kernel32.dll ;
				long	codepage, ;
				long	dwFlags, ;
				string@	lpMultiByteStr, ;
				integer	cbMultiByte, ;
				string@	lpWideCharStr, ;
				integer	cchWideChar
		
			result = MultiByteToWideChar( ;
					m.codepage, ;
					m.dwFlags, ;
					@m.lpMultiByteStr, ;
					m.cbMultiByte, ;
					@m.lpWideCharStr, ;
					m.cchWideChar ;
				)
		catch
			assert false
			success = false
			
		endtry
	endcase
	
	return iif(m.success, m.result, 0)
	
endfunc
*_______________________________________________________________________________
function WideCharToMultiByte
	lparameters codepage, ;
				dwFlags, ;
				lpWideCharStr, ;
				cchWideChar, ;
				lpMultiByteStr, ;
				cbMultiByte, ;
				lpDefaultChar, ;
				lpUsedDefaultChar

	local success
	success = true

	local result
	
	do case
	case !m.success
	
	otherwise
		try
			declare integer WideCharToMultiByte in Kernel32.dll ;
				long	codepage, ;
				long	dwFlags, ;
				string@	lpWideCharStr, ;
				integer	cchWideChar, ;
				string@	lpMultiByteStr, ;
				integer	cbMultiByte, ;
				string	lpDefaultChar, ;
				integer	@lpUsedDefaultChar
		
			result = WideCharToMultiByte ( ;
					m.codepage, ;
					m.dwFlags, ;
					@m.lpWideCharStr, ;
					m.cchWideChar, ;
					@m.lpMultiByteStr, ;
					m.cbMultiByte, ;
					m.lpDefaultChar, ;
					@m.lpUsedDefaultChar;
				)
		catch
			assert false
			success = false
			
		endtry
	endcase
	
	return iif(m.success, m.result, 0)
	
endfunc
*_______________________________________________________________________________
Gregory
Previous
Next
Reply
Map
View

Click here to load this message in the networking platform