>>>Our client(s) is send us CSV files with demographics of people they serve. The problem is that a few of the rows in the CSV have names and/or addresses with embedded comma in them. This is throwing the APPEND a problem. The wrong data is ending up in the wrong columns.
>>>Can anyone suggest how to prevent this from happen at my end.
>>Hi,
>>Would you be able to show us such a line? and mark which embedded comma troubles you?
>>Would be achallenge to construct a code around this.
>>Rgds,
>>Koen
>
>1700000366,Altman,David,L,DDS,M,,19511218,,P,1164523163,NONE,,University of California-San Francisco,,1978,
>
>The school name at the source is ",University of California-San Francisco,". The commas are resulting is the Year being dropped.
>Another example is:
>
>1700348005,Murphree, Jr,James,R,DMD,M,,19610330,,P,1932297926,NONE,UNIVERSITY OF ALABAMA,2000,
>
>Where the last name at the source is "Murphree, Jr".
>
>All the CSV files have different structures (COLUMNS), therefore scanning and adjusting can be a challenge.
If there is a consistency to the file in terms of what content is being provided in each line then one could build a "clever" convertor. For example, if you know that each line contains, say, 8 fields such as ID number, surname, first name, DOB, University, graduation year, etc. and you also know the order that these fields will appear in, then you could convert the file using comma as the delimiter, and then process each line and check each converted field for the various components you know must be there and build the final fields manually. First field is ID, second and all fields up to the first field starting with numerics are part of the name. First field with numerics is the DOB. Next field with numerics is whatever. Last field is the year. And so you build a series of rules to process each line. It relies on the assumption that each line contains the same set of fields and then using logic to piece the fields together and produce a final correct set of fields.
In addition, you could mark all records which contain more than the required number of commas with a tag for manual review by a human. Just an option.
Just my non-expert opinion :)
.
In the End, we will remember not the words of our enemies, but the silence of our friends - Martin Luther King, Jr.