Converting zip to dbf
Message
From
31/12/2018 21:44:45
 
 
To
31/12/2018 16:30:45
General information
Forum:
Visual FoxPro
Category:
Databases, Tables, Views, Indexing and SQL syntax
Miscellaneous
Thread ID:
01664985
Message ID:
01665034
Views:
53
>>>Data is clean should have no errors
>>Ahem... thoroughly false
>>Spot check in shares.csv shows:
>>
>>
>>
>>
>>2016-11-17 11:03:27,https://www.linkedin.com/feed/update/urn%3Ali%3Ashare%3A6204971998725042178,"""https://lnkd.in/dWwrSJy""",,,MEMBER_NETWORK
>>2016-10-28 10:41:00,https://www.linkedin.com/feed/update/urn%3Ali%3Ashare%3A6197718591576567808,"Whatever your business, if you need Consultancy, Marketing or IT help call Colin Northway Associates on 0208 954 5595 or email colin@colin-northway.com""
>>""""
>>""Strong track record in Automotive, Courier and Gas and Electicity Industries. Strong track record in Customer Retention.""
>>""""
>>""Can you afford not to contact us? Then see our web site at www.colin-northway.com",http://image-store.slidesharecdn.com/39cb4cb0-43b2-43f4-b9a4-1324ac65f2ca-large.png,http://image-store.slidesharecdn.com/39cb4cb0-43b2-43f4-b9a4-1324ac65f2ca-large.png,MEMBER_NETWORK
>>
>>many violations of basic csv rules concerning field/record delimitation
>>
>>
>>2016-03-14 13:48:30,https://www.linkedin.com/feed/update/urn%3Ali%3Ashare%3A6115141345649651712,I have a backgound in Advertising Marketing and IT and I am currently running a IT and Marketing consultancy operation specialising in the Courier Industry. If this rings any bells with you contact me at colin@colin-northway.com ,,,MEMBER_NETWORK
>>
>>missing ["] in front of [I have a backgound]
>>
>>
>>connections.csv
>>
>>Akeel,Hussain,,ICON DESIGNS LIMITED,"International and UK Sales, Operations & Logistics",24 Dec 2017
>>...
>>Grigore Octavian,Pupăzan,,Run4Job,Director General,20 Dec 2017 
>>...
>>Joanna,Szmaglińska,,VISLINE,Managing Director,11 Dec 2017 
>>
>>(1) mixed/different rules for delimiting character fields
>>(2) character data interspersed with non-ASCII characters, without any hint as to which character set was intended
>>
>>stopped looking then.
>>Probably the result of some (misguided) manual screen scraping of web/HTML pages, which introduced at least these errors
>>and, with high probability, others that need different solution strategies.
>>
>>With so many problems (not hard to solve, but irritating) turning up in the first scans, quoting a price is not possible:
>>more is probably lurking, and any realistic quote would need more than half the work to be actually done first.
>
>[...]
>
>>
>>regards
>>
>>thomas
>
>Thomas, I know it's not your data, but I took on your initial analysis and submitted those 2 files (connections and shares) to a CSVProcessor object.
>
>This came as an opportunity to improve on the class, that now seems to be able to ingest Colin's data. Thank you for your insight.

Interesting - how did you attack those newlined quadrupled ["] or the newlined bare text pieces that belong to the row starting a few lines above?
I had reckoned on either getting well beyond 50% of the csv file's records with a simple import, or on a quick stream-fixing pass if angling for large files first.
As there is a single datetime field at line start, the payload (after the schema line) should have been an easy fix to get the record lines correct:
insert an additional "LF" special character where a line break belongs inside a field, so that alines() with its later parameters can split on it,
then rinse and repeat for each line after eliminating the erroneous ["].
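
A minimal sketch of that regrouping idea, written in Python rather than VFP so it can be run anywhere. It assumes that every record starts with a timestamp in the form seen in the shares.csv excerpt and that no continuation line happens to start with one. The names `regroup`, `RECORD_START` and `LF_MARK`, the sample rows, and the column names are all made up for illustration; cleaning up the stray ["] is not covered here.

```python
import re

# Assumption: every record starts with a timestamp such as "2016-11-17 11:03:27,".
RECORD_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},")

# Hypothetical placeholder for a line break that belongs inside a field.
LF_MARK = "\x1f"


def regroup(lines):
    """Merge physical lines into logical records.

    The first line (the schema line) is kept as it is. Any later line that
    does not start with a timestamp is appended to the record above it,
    joined with LF_MARK instead of a real line break.
    """
    records = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not records or RECORD_START.match(line):
            records.append(line)
        else:
            records[-1] += LF_MARK + line
    return records


# Made-up sample in the shape of the shares.csv excerpt; column names are invented.
sample = [
    "Date,ShareLink,ShareCommentary,Visibility\n",
    '2016-10-28 10:41:00,https://example.invalid/1,"first paragraph""\n',
    '""""\n',
    '""second paragraph",MEMBER_NETWORK\n',
    "2016-03-14 13:48:30,https://example.invalid/2,bare text,MEMBER_NETWORK\n",
]

records = regroup(sample)
```

Each entry in `records` is then one logical row; splitting a field on `LF_MARK` afterwards gives back the original text lines, much as alines() would with a custom separator.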

I can envision a dictionary approach to verify that the correct character encoding/conversion was found, but isn't that still a guess?
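
One way such a dictionary check could look, again as a Python sketch under stated assumptions: decode the raw bytes with each candidate encoding and count how many resulting words appear in a list of known-good words. The function name `guess_encoding`, the candidate list, and the word list are hypothetical, and the result remains a heuristic, not proof of the intended character set.

```python
import re


def guess_encoding(raw, known_words, candidates=("utf-8", "cp1250", "latin-1")):
    """Return (encoding, hits) for the candidate whose decoding matches most known words.

    Candidates that cannot decode the bytes at all are skipped. On a tie,
    the candidate listed first wins.
    """
    best, best_hits = None, -1
    for enc in candidates:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            continue
        # Split on commas and whitespace, then count exact matches in the word list.
        hits = sum(1 for word in re.split(r"[\s,]+", text) if word in known_words)
        if hits > best_hits:
            best, best_hits = enc, hits
    return best, best_hits


# Hypothetical word list built from names that are known to be spelled correctly.
known = {"Szmaglińska", "Pupăzan"}

as_cp1250 = "Joanna,Szmaglińska,,VISLINE".encode("cp1250")
as_utf8 = "Grigore Octavian,Pupăzan,,Run4Job".encode("utf-8")

result_cp1250 = guess_encoding(as_cp1250, known)
result_utf8 = guess_encoding(as_utf8, known)
```

With a word list this small the score says little; a larger list of expected names or vocabulary makes a wrong decoding less likely to win by chance.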

But I did not aim for a general/global fix, which you seem to have implemented.
I would appreciate you sending over your version before and after the change, so I can see your changes in a diff.

congrats! (plus happy new year, if you are in Portugal at the moment...)

thomas