Importing CSV with embedded commas
From: 29/05/2015 14:47:16
To: 29/05/2015 12:46:34
General information
Forum: Visual FoxPro
Category: Other

Environment versions
Visual FoxPro: VFP 9 SP2
OS: Windows 7
Network: Windows 2008 Server
Database: MS SQL Server
Application: Desktop

Miscellaneous
Thread ID: 01620336
Message ID: 01620357
Views: 45
>>>Our clients send us CSV files with demographics of the people they serve. The problem is that a few of the rows in the CSV have names and/or addresses with embedded commas in them. This is causing problems for APPEND: the wrong data ends up in the wrong columns.
>>>Can anyone suggest how to prevent this from happening at my end?
>>Hi,
>>Would you be able to show us such a line, and mark which embedded comma troubles you?
>>It would be a challenge to construct code around this.
>>Rgds,
>>Koen
>
>1700000366,Altman,David,L,DDS,M,,19511218,,P,1164523163,NONE,,University of California-San Francisco,,1978,
>
>The school name at the source is ",University of California-San Francisco,". The commas result in the Year being dropped.
>Another example is:
>
>1700348005,Murphree, Jr,James,R,DMD,M,,19610330,,P,1932297926,NONE,UNIVERSITY OF ALABAMA,2000,
>
>Where the last name at the source is "Murphree, Jr".
>
>All the CSV files have different structures (COLUMNS), therefore scanning and adjusting can be a challenge.


If there is consistency to the file in terms of what content is provided on each line, then one could build a "clever" converter. For example, if you know that each line contains, say, 8 fields such as ID number, surname, first name, DOB, university, graduation year, etc., and you also know the order in which these fields appear, then you could split the file using the comma as the delimiter, then process each line, check each resulting field for the components you know must be there, and build the final fields manually. The first field is the ID; the second field and all fields up to the first field starting with numerics are part of the name; the first field with numerics is the DOB; the next field with numerics is whatever comes next; and the last field is the year. In this way you build a series of rules to process each line. It relies on the assumption that each line contains the same set of fields, and uses logic to piece the fields back together into a final, correct set.
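Outside of VFP, the rule-based repair described above could be sketched like this in Python. The field names (`id`, `name`, `dob`, `year`) and the layout are assumptions for illustration only, not the client's actual columns; a real converter would encode whatever layout each file actually uses:

```python
def repair_row(line):
    """Sketch of the rule described above (assumed layout: ID first,
    then name fields, then a DOB, with the year as the last non-empty
    field). Field names here are illustrative, not the real columns."""
    parts = line.split(",")
    record = {"id": parts[0]}  # rule 1: first field is the ID
    # rule 2: everything up to the first all-digit field is the name
    i = 1
    name_parts = []
    while i < len(parts) and not parts[i].isdigit():
        name_parts.append(parts[i])
        i += 1
    record["name"] = ",".join(name_parts).strip(",")
    # rule 3: that first all-digit field is the DOB
    record["dob"] = parts[i] if i < len(parts) else ""
    # rule 4: the last non-empty field is assumed to be the year
    tail = [p for p in parts[i + 1:] if p]
    record["year"] = tail[-1] if tail else ""
    return record

# One of the problem lines from this thread:
row = repair_row("1700348005,Murphree, Jr,James,R,DMD,M,,19610330,"
                 ",P,1932297926,NONE,UNIVERSITY OF ALABAMA,2000,")
# row["name"] keeps "Murphree, Jr" intact despite the embedded comma
```

The same logic translates directly to a VFP SCAN loop over GETWORDNUM()/ALINES() output; the point is the rules, not the language.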

In addition, you could tag every record that contains more than the expected number of commas for manual review by a human. Just an option.
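Detecting those records is just a comma count against the expected column count. A minimal sketch, assuming `expected_fields` is known for the file being imported (the function name is mine, not an existing API):

```python
def flag_suspect_lines(lines, expected_fields):
    """Return (line number, line) pairs for rows whose comma count
    exceeds what expected_fields columns would produce, i.e. rows
    that likely contain an embedded comma and need human review."""
    suspects = []
    for lineno, line in enumerate(lines, start=1):
        if line.count(",") > expected_fields - 1:
            suspects.append((lineno, line))
    return suspects
```

Running this before the import gives a short review list instead of silently shifted columns.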

Just my non-expert opinion :)

In the End, we will remember not the words of our enemies, but the silence of our friends - Martin Luther King, Jr.