Importing CSV with embedded commas

Level Extreme platform

Subscription

Corporate profile

Products & Services

Support

Legal

Français

Importing CSV with embedded commas

Message

From

29/05/2015 14:47:16

Jos Pols
C., South Africa

29/05/2015 12:46:34

Greg Reichert
GLR Software
Providence, Rhode Island, United States

General information

Forum:

Visual FoxPro

Category:

Other

Title:

Re: Importing CSV with embedded commas

Environment versions

Visual FoxPro:

VFP 9 SP2

OS:

Windows 7

Network:

Windows 2008 Server

Database:

MS SQL Server

Application:

Desktop

Miscellaneous

Thread ID:

01620336

Message ID:

01620357

Views:

>>>Our client(s) is send us CSV files with demographics of people they serve. The problem is that a few of the rows in the CSV have names and/or addresses with embedded comma in them. This is throwing the APPEND a problem. The wrong data is ending up in the wrong columns.
>>>Can anyone suggest how to prevent this from happen at my end.
>>Hi,
>>Would you be able to show us such a line? and mark which embedded comma troubles you?
>>Would be achallenge to construct a code around this.
>>Rgds,
>>Koen
>
>1700000366,Altman,David,L,DDS,M,,19511218,,P,1164523163,NONE,,University of California-San Francisco,,1978,
>
>The school name at the source is ",University of California-San Francisco,". The commas are resulting is the Year being dropped.
>Another example is:
>
>1700348005,Murphree, Jr,James,R,DMD,M,,19610330,,P,1932297926,NONE,UNIVERSITY OF ALABAMA,2000,
>
>Where the last name at the source is "Murphree, Jr".
>
>All the CSV files have different structures (COLUMNS), therefore scanning and adjusting can be a challenge.

If there is a consistency to the file in terms of what content is being provided in each line then one could build a "clever" convertor. For example, if you know that each line contains, say, 8 fields such as ID number, surname, first name, DOB, University, graduation year, etc. and you also know the order that these fields will appear in, then you could convert the file using comma as the delimiter, and then process each line and check each converted field for the various components you know must be there and build the final fields manually. First field is ID, second and all fields up to the first field starting with numerics are part of the name. First field with numerics is the DOB. Next field with numerics is whatever. Last field is the year. And so you build a series of rules to process each line. It relies on the assumption that each line contains the same set of fields and then using logic to piece the fields together and produce a final correct set of fields.

In addition, you could mark all records which contain more than the required number of commas with a tag for manual review by a human. Just an option.

Just my non-expert opinion :)

.

In the End, we will remember not the words of our enemies, but the silence of our friends - Martin Luther King, Jr.

Map

View

Click here to load this message in the networking platform