Converting zip to dbf
Date: 31/12/2018 10:12:42
Forum: Visual FoxPro
Category: Databases, Tables, Views, Indexing and SQL syntax
Miscellaneous
Thread ID: 01664985
Message ID: 01665002
Views: 46
>>>>How fast, how much data (no zip attached...) ?
>>>>how much effort needed/expected for data cleaning ?
>>>>(think export errors, bungled field separators included in text, wrong data formats if coming from Excel [date/time/text...])
>>>>how much is it worth to you on Thursday?
>>>>how much is it worth to you tomorrow ?
>>>>
>>>>
>>>>>I have a zip file which contains many fields - I need to extract the data to a dbf file
>>>>>
>>>>>I can't work out how to do it
>>>>>
>>>>>The file is attached
>>>>>Any chance anyone could create the dbf file?
>>>
>>>Data is clean should have no errors
>>>
>>>Say Monday next week - how much will you charge?
>>
>>Ahem...
>>
>>>I have a zip file which contains many fields - I need to extract the data to a dbf file
>>
>>See the bolded part - false.
>>You have a zip file with
>>
>>11 files with an .html extension, containing mostly text and some markers, probably from one of the early MS generating tools
>>28 files with a .csv extension; spot checks show a csv-with-headers structure, but a different structure in every file I looked at
>>
>>>Data is clean should have no errors
>>Ahem... thoroughly false
>>Spot check in shares.csv shows:
>>
>>
>>
>>
>>2016-11-17 11:03:27,https://www.linkedin.com/feed/update/urn%3Ali%3Ashare%3A6204971998725042178,"""https://lnkd.in/dWwrSJy""",,,MEMBER_NETWORK
>>2016-10-28 10:41:00,https://www.linkedin.com/feed/update/urn%3Ali%3Ashare%3A6197718591576567808,"Whatever your business, if you need Consultancy, Marketing or IT help call Colin Northway Associates on 0208 954 5595 or email colin@colin-northway.com""
>>""""
>>""Strong track record in Automotive, Courier and Gas and Electicity Industries. Strong track record in Customer Retention.""
>>""""
>>""Can you afford not to contact us? Then see our web site at www.colin-northway.com",http://image-store.slidesharecdn.com/39cb4cb0-43b2-43f4-b9a4-1324ac65f2ca-large.png,http://image-store.slidesharecdn.com/39cb4cb0-43b2-43f4-b9a4-1324ac65f2ca-large.png,MEMBER_NETWORK
>>
>>many violations of basic csv rules concerning field/record delimitation
>>
>>
>>2016-03-14 13:48:30,https://www.linkedin.com/feed/update/urn%3Ali%3Ashare%3A6115141345649651712,I have a backgound in Advertising Marketing and IT and I am currently running a IT and Marketing consultancy operation specialising in the Courier Industry. If this rings any bells with you contact me at colin@colin-northway.com ,,,MEMBER_NETWORK
>>
>>missing ["] in front of [I have a backgound]
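A minimal sketch of how such delimitation problems could be flagged before any import, assuming the header row gives the expected field count. The function name `find_suspect_records` and the sample data are made up for illustration; they are not taken from the attached files. A record that is split across lines or contains an unquoted comma shows up with the wrong number of fields, while a correctly quoted multi-line field does not.

```python
import csv
import io

# Hypothetical sample, not from the real export: one clean record,
# one record with an unquoted comma, one correctly quoted multi-line field.
SAMPLE = (
    "Date,Link,Comment\n"
    '2016-01-01,http://example.com/1,"fine, quoted"\n'
    "2016-01-02,http://example.com/2,broken, unquoted comma\n"
    '2016-01-03,http://example.com/3,"line one\nline two"\n'
)

def find_suspect_records(text, expected_fields=None):
    """Return (record_number, field_count) for every record whose field
    count differs from the expected one. If expected_fields is None,
    the first record (assumed to be the header) sets the expectation."""
    reader = csv.reader(io.StringIO(text, newline=""))
    suspects = []
    for number, row in enumerate(reader, start=1):
        if expected_fields is None:
            expected_fields = len(row)
            continue
        if len(row) != expected_fields:
            suspects.append((number, len(row)))
    return suspects
```

This only catches violations that change the field count; a missing quote around a field without commas or line breaks parses cleanly and goes unnoticed.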
>>
>>
>>connections.csv
>>
>>Akeel,Hussain,,ICON DESIGNS LIMITED,"International and UK Sales, Operations & Logistics",24 Dec 2017
>>...
>>Grigore Octavian,Pupăzan,,Run4Job,Director General,20 Dec 2017 
>>...
>>Joanna,Szmaglińska,,VISLINE,Managing Director,11 Dec 2017 
>>
>>(1) mixed/inconsistent rules for delimiting character fields
>>(2) text interspersed with characters outside 7-bit ASCII, without any hint as to which character set was intended
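When the character set is not documented, one rough approach is to decode the raw bytes with a few candidate encodings and eyeball the results. The sketch below assumes the candidates UTF-8, cp1250 and cp1252; that list and the function name `candidate_decodings` are guesses for illustration, not something stated in the thread.

```python
def candidate_decodings(raw, encodings=("utf-8", "cp1250", "cp1252")):
    """Try each candidate encoding on the raw bytes.
    Returns a dict mapping encoding name to decoded text,
    or to None when the bytes are not valid in that encoding."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results
```

A decode that succeeds is not proof of the right encoding: single-byte code pages accept almost any byte sequence, so a human still has to check that names like those in connections.csv come out readable.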
>>
>>stopped looking then.
>>Probably some (misguided) manual screen scraping of web/html, incurring at least those errors,
>>and with high probability others that need different solution strategies.
>>
>>With so many (not hard to solve, but irritating) problems found in the first scans, quoting a fixed price is not possible:
>>more lurks, and any realistic quote would need more than half the work to be done for real.
>>
>>Possible solution: reading in the tables on an hourly basis, with checking / control offloaded to you.
>>
>>Importing the smallest files first has the benefit that each import can be checked quickly;
>>errors can be fixed either by manual editing or by a programmatic fix, and the TYPE of possible error becomes known.
>>
>>Importing the larger files first has the benefit that "more info per €" is imported first, as more can be handled automatically,
>>but errors like broken field/record delimitation, if they happen often, will have to be fixed by tiny programs
>>if done by me, or by multiple copy-paste sessions if done by you.
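Either import order can be produced by sorting the extracted files by size. A small sketch, assuming the zip has already been unpacked into one folder and only the .csv files are of interest; `import_order` is a hypothetical helper name.

```python
from pathlib import Path

def import_order(folder, smallest_first=True):
    """List the .csv files in folder, sorted by size on disk.
    smallest_first=True suits quick checking of each import;
    smallest_first=False brings in the bulk of the data first."""
    files = [p for p in Path(folder).glob("*.csv") if p.is_file()]
    return sorted(files, key=lambda p: p.stat().st_size,
                  reverse=not smallest_first)
```

File size is only a stand-in for record count; a file with a few long text fields can be large and still hold few records.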
>>
>>Rate is 90€ plus VAT per hour; either approach can be stopped at any hour after the first 2 if you deem the rest is better handled manually.
>>
>>The option of "clean screen scraping" or analyzing the source HTML, if available, exists,
>>but "clean screen scraping" is probably not worth it, as there are many tables and practically none with large record counts.
>>
>>regards
>>
>>thomas
>
>Thanks, but I now have a free offer from an old colleague

Give him at least 2 expensive bottles of scotch. Such work can be irritating if only a few hundred records are to be read but every 20th needs a different fix.