Level Extreme platform
Subscription
Corporate profile
Products & Services
Support
Legal
Français
CSV files with comma and quotes
Message
From
09/11/2018 09:03:11
 
 
To
09/11/2018 08:46:59
General information
Forum:
Visual FoxPro
Category:
Coding, syntax & commands
Environment versions
Visual FoxPro:
VFP 9 SP2
Miscellaneous
Thread ID:
01663238
Message ID:
01663242
Views:
90
This message has been marked as a message which has helped to the initial question of the thread.
>Hi All
>
>I have a CSV file which uses comma as the primary delimiter between fields. However, some fields might contain a comma (e.g. a description field) and the supplier provides those fields enclosed in quotes. For example:
>
>AAA,BBB,"XXX YYY, ZZZ",CCC,"X1, Y1, Z1",DDD,
>
>I cannot do an import on this CSV file as it contains too many fields. I need to parse the file, line by line.
>
>How can I get each field so that the result would be:
>
>Field 1: AAA
>Field 2: BBB
>Field 3: XXX YYY, ZZZ
>Field 4: CCC
>Field 5: X1, Y1, Z1
>Field 6: DDD
>
>GETWORDNUM() works for fields separated with a comma but then it picks up individual sections in the description fields which are enclosed in quotes. I can write a manual parsing routine but was wondering if there was a faster way.
>
>TIA

Create a function that processes each line into an escaped form:
line = escape(line)
Your escape() function scans character-by-character across and when it encounters a ," combination, it enters a quoted content where escape will be valid. When it encounters a ", combination, it leaves quoted content where escape will not be valid. UPDATE: Also needs to check the beginning field. Code updated below.

Untested, and off the top of my head:
FUNCTION escape
LPARAMETERS tcLine
LOCAL lnI, lcLine, llInQuote

    FOR lnI = 1 TO LEN(tcLine)
        DO CASE
            CASE NOT llInQuote AND (CHRTRAN(SUBSTR(tcLine, lnI, 2), CHR(32), SPACE(0)) = "," + CHR(34) ;
                 OR (lnI = 1 AND CHRTRAN(SUBSTR(tcLine, lnI, 2), CHR(32), SPACE(0)) = CHR(34)))
                * We're entering a quote
                llInQuote = .t.

                * Skip past the double-quote
                FOR lnI = lnI to LEN(tcLine)
                    lcLine = lcLine + SUBSTR(tcLine, lnI, 1)
                    IF SUBSTR(tcLine, lnI, 1) = CHR(34)
                        EXIT
                    ENDIF
                NEXT

            CASE llInQuote AND CHRTRAN(SUBSTR(tcLine, lnI, 2), CHR(32), SPACE(0)) = CHR(34) + ","
                llInQuote = .f.
                lcLine = lcLine + SUBSTR(tcLine, lnI, 1)

            CASE llInQuote
                * See if it's a character we want to escape
                IF INLIST(SUBSTR(tcLine, lnI, 1), ",")
                    * Escape it
                    lcLine = lcLine + "%" + (ASC(SUBSTR(tcLine, lnI, 1))) + "%"
                ELSE
                    * As-is
                    lcLine = lcLine + SUBSTR(tcLine, lnI, 1)
                ENDIF

            OTHERWISE
                * As-is
                lcLine = lcLine + SUBSTR(tcLine, lnI, 1)
        ENDCASE
    NEXT
    RETURN lcLine
And you can put m. in there where needed.

This function should take your example and convert it to:
AAA,BBB,"XXX YYY%44% ZZZ",CCC,"X1%44% Y1%44% Z1",DDD,
To undo, create an unescape() function which scans for %##% content, and replaces it with CHR(VAL(SUBSTR(tcLine, lnI + 1))). And you can use other characters than % to make it like html with &#nnn; and the like. It's your choice.
Previous
Next
Reply
Map
View

Click here to load this message in the networking platform