Level Extreme platform
Subscription
Corporate profile
Products & Services
Support
Legal
Français
CSV with double quotes
Message
From
08/03/2017 04:36:05
Cetin Basoz
Engineerica Inc.
Izmir, Turkey
 
 
To
07/03/2017 13:23:26
General information
Forum:
Visual FoxPro
Category:
Coding, syntax & commands
Miscellaneous
Thread ID:
01648804
Message ID:
01648849
Views:
57
>More polished than my 1liner for sure ;-) Quick nosy question - did you compare your approach of writing out to file and concatenating header line with that file to _clip to inserting header line and then inserting one row down directly fro .DataToClip(,,3)? On current SSD probably irrelevant, but it perked my "find best way" character deficit ;-)


Yes, I did such a comparison. My temp folder that I use for 'Copy ... tab' is on a regular old 7200RPM drive and still beats VfpDataToClip() by a wide margin (more than 10 times faster, Copy to disk approach took under a second, while VfpDataToClip over 9 seconds on my repeated tests). Average timings were:
Copy to disk and read: 0.857 - 0.894 seconds
_Vfp.DataToClip      : 9.243 - 9.546 seconds
vfpDataToClip(,,3) was not returning the content right anyway, before doing a comparison it should be right in the first place (vfpDataToClip() adds an extra column).

Also, both VfpDataToClip() and copy ... tab approaches have the same added problem, you cannot use with any excel version straightforward. If there is a newer excel on the system that has MaxRows > 65000 (as all recent versions do) then you need to control 'passed' max rows at a time. Excel has a limit (who knows what) on the amount of data it accepts for pasting. On my tests, for the sample data 120,000 was good but 150,000 was not (with data that has more columns, that would be less, didn't calculate how many bytes was that).



>
>Quickscanned vfp2Excel, but have not done enough with .RS to form opinion. Most of the time our data is costly enough to be saved in import format as well in case of problems/questions months later best in a pure ASCII format, so no troubles from different excel versions come into play. Most of the time delimited with character ";" is good enough, although the same problems with special characters in text fields apply.
>
>"Best" way IMO is output via Copy to type sdf accompanied by an easy to parse field description from copy structure extended piped to file.
>Char fields need to be handled with a additional effort like with .RS for embedded CR and/or LF, but as there are no field delimiter chars no further problems besides defining a length for memo fields.
>
>Reading that fixed length file best done by some vba script reading the field definitions - Dunno if there is also a way to save things set in import wizard, but we are programmers ;-)


I also have codes for reading a text file from excel. One way or the other, what I have found is one shouldn't use text files for data transfer unless they are simple enough (MS even can't do data import from a text file into MS SQL Server right, how one could expect it to be better in office). Why would I do, when there are better ways is another question.


>
>The only thing missing then is a documented way to handle null if found in the data ;-)

If data is passed as data via ODBC/OLEDB it is handled.


Actually, when I said one of the fastest ways what I had in my mind was the "array" transfer not tab delimited. I thought that way but sent you the one for VfpDataToClip() alternative. Here is the array copy approach:
Clear
* Create a test cursor
Create Cursor testcursor (Rcno i,cType c(10), nType i, dType d)
Rand(-1)
For ix = 1 To 200000 && Create 200000 recs cursor
  Insert Into testcursor Values ;
    (Recco()+1,Sys(2015), Int(Rand()*1000), Date()-Int(Rand()*100))
Endfor
Locate

Start = Seconds()
lnRowsPerCopy = Floor(65000/Fcount())
Dimension aHeader[1,FCOUNT()]
For ix = 1 To Fcount()
  aHeader[1,m.ix] = Proper(Field(m.ix))
Endfor

Wait Window Nowait "Pls wait sending data to Excel..."
oExcel = Createobject("Excel.application")
With oExcel
  .workbooks.Add && Add a new workbook
  .Visible = .T.
  lnMaxRows = .ActiveWorkBook.ActiveSheet.Rows.Count && Get max row count
  lnNeededSheets = Ceiling( Reccount() / (m.lnMaxRows - 1) ) && 1 row header

  ? "Total Pages : "+Padr(lnNeededSheets,3)
  With .ActiveWorkBook
    If m.lnNeededSheets > .sheets.Count
      .sheets.Add(,.sheets(.sheets.Count),;
        m.lnNeededSheets - .sheets.Count) && Add new sheets after old ones
    Endif
    For ix =1 To m.lnNeededSheets
      .WorkSheets(m.ix).Name = "Page "+Padl(m.ix,3,"0")
    Endfor
    For ix=1 To m.lnNeededSheets
      With .WorkSheets(m.ix)
        .Activate
        .Range(.cells(1,1), .cells(1,Fcount())).Value = GetArrayRef('aHeader')
      Endwith

      For jx = 1 To Ceiling((m.lnMaxRows - 1) / m.lnRowsPerCopy)
        lnRowStart = Recno() - (m.ix - 1) * (m.lnMaxRows - 1) + 1
        lnRowEnd   = Min(m.lnRowStart + m.lnRowsPerCopy-1, m.lnMaxRows)

        Dimension aTrans[m.lnRowEnd - m.lnRowStart + 1,FCOUNT()]
        Copy Next (m.lnRowEnd - m.lnRowStart + 1) To Array aTrans
        lnCopied = _Tally
        Dimension aTrans[m.lnCopied,FCOUNT()]
        With .WorkSheets(m.ix)
          .Range(.cells(m.lnRowStart,1), .cells(m.lnRowStart + m.lnCopied-1,Fcount())).Value = GetArrayRef('aTrans')
        Endwith

        If !Eof()
          Skip
        Else
          Exit
        Endif
      Endfor
      ? "Page no : "+;
        padl(m.ix,3)+;
        "/"+;
        padr(m.lnNeededSheets,3)+ ;
        "   Elapsed : ",Seconds() - Start
    Endfor
  Endwith
  .sheets.Item(1).Activate
Endwith
? "Total Elapsed : ",Seconds() - Start

Function GetArrayRef(tcArrayName)
  Return @&tcArrayName
While this is one of the fastest available approaches, not the fastest.


The fastest one is QueryTables.Add method. It has 2 downsides:

1) It contains a link to real data (the data itself could be linked or embedded). If the xls file needs to be moved on to another computer (say emailed) then needs to be embedded version and takes time.
2) Version of excel might be 64 bits making this not an option.

Second fastest route is CopyFromRecordset which VFP2Excel use. Here is a variation of VFP2Excel that takes into account, Excel might be 64 bits and makes the transfer via an ADO.Stream. In this sample there is a 'Copy To' and Erase for the sample data, however, in real world we could simply point to the datasource and provide an SQL (and the datasource could be anything with an OLEDB driver):
Clear
* Create a test cursor
Create Cursor testcursor (Rcno i,cType c(10), nType i, dType d)
Rand(-1)
For ix = 1 To 200000 && Create 200000 recs cursor
  Insert Into testcursor Values ;
    (Recco()+1,Sys(2015), Int(Rand()*1000), Date()-Int(Rand()*100))
Endfor
Locate

Start = Seconds()
Local tableName
tableName = 'MyExcelData'+Sys(2015)
Copy To (ForcePath(m.tableName, Sys(2023)))

Local loStream As AdoDb.Stream
loStream = GetDataAsAdoStream("Provider=VFPOLEDB;Data Source="+Sys(2023), "select * from " + m.tableName)
Erase (ForcePath(ForceExt(m.tableName,'*', Sys(2023)))

Local oExcel
oExcel = Createobject("Excel.Application")
With m.oExcel
  .DisplayAlerts = .F.
  .Workbooks.Add
  .Visible = .T.

  * Send the data - copy to replacement
  VFP2ExcelVariation(m.loStream, .ActiveWorkBook.ActiveSheet.Range('A1'))
Endwith

? "Total Elapsed : ",Seconds() - Start


Function VFP2ExcelVariation(toStream, toRange, tcHeaders)
  Local loRS As AdoDb.Recordset,ix
  loRS = Createobject('Adodb.Recordset')
  m.loRS.Open( m.toStream )
  m.toRange.Offset(1,0).CopyFromRecordSet( m.loRS )  && Copy data starting from headerrow + 1

  * Use first row for headers
  Local Array aHeader[1]
  For ix=1 To Iif( !Empty(m.tcHeaders), ;
      ALINES(aHeader, m.tcHeaders,1,','), ;
      m.loRS.Fields.Count )
    m.toRange.Offset(0,m.ix-1).Value = ;
      Iif( !Empty(m.tcHeaders), ;
      aHeader[m.ix], ;
      Proper(m.loRS.Fields(m.ix-1).Name) )
    m.toRange.Offset(0,m.ix-1).Font.Bold = .T.
  Endfor
  m.loRS.Close()
Endfunc

Procedure GetDataAsAdoStream(tcConnection, tcSQL)
  Local loStream As 'AdoDb.Stream', ;
    loConn As 'AdoDb.Connection', ;
    loRS As 'AdoDb.Recordset'
  loStream = Createobject('AdoDb.Stream')
  loConn = Createobject("Adodb.connection")
  loConn.ConnectionString = m.tcConnection
  m.loConn.Open()
  loRS = loConn.Execute(m.tcSQL)
  m.loRS.Save( loStream )
  m.loRS.Close
  m.loConn.Close
  Return m.loStream
Endproc
As per the timings, I took some time to test them TAB, Array and VFP2Excel approach with the same data:

TAB delimited and array transfers were almost tied at 6.2 seconds for the sample data.
VFP2Excel approach with ADO.Stream took 5.8 seconds.

For this sample 200K rows data, I would ignore the timings (although VFP2Excel still is the fastest) but moreover, with OLEDB transfer approach you transfer data as data without the problems of text transfer. IOW you don't have memo, date, datetime, null ... transfer problems.

Therefore I classify OLEDB transfer approach as the best.
Çetin Basöz

The way to Go
Flutter - For mobile, web and desktop.
World's most advanced open source relational database.
.Net for foxheads - Blog (main)
FoxSharp - Blog (mirror)
Welcome to FoxyClasses

LinqPad - C#,VB,F#,SQL,eSQL ... scratchpad
Previous
Reply
Map
View

Click here to load this message in the networking platform