Removing Unwanted commas from a csv

Question

I'm writing a program in Progress, OpenEdge, ABL, and whatever else it's known as.

I have a CSV file that is delimited by commas. However, there is a "gift message" field, and users enter messages with "commas", so now my program will see additional entries because of those bad commas.

The CSV fields are not in double qoutes so I CAN NOT just use my main method with is

/** this next block of code will remove all unwanted commas from the data. **/
    if v-line-cnt > 1 then /** we won't run this against the headers. Otherwise thhey will get deleted **/
      assign
        v-data = replace(v-data,'","',"\t")  /** Here is a special technique to replace the comma delim wiht a tab **/
        v-data = replace(v-data,','," ")     /** now that we removed the comma delim above, we can remove all nuisance commas **/
        v-data = replace(v-data,"\t",'","'). /** all nuisance commas are gone, we turn the tabs back to commas. **/

Any advice?

edit:

From Progress, I cal call Linux commands. So I should be able to execute C++/PHP/Shell etc all from my Progress Program. I look forward to advice, until then I shall look into using external scripts.

You can either use a different delimiter instead of a comma, or replace commas with a different character before adding it to the CSV. — bupereira, Dec 02 '15 at 22:19
If you get this file from elsewhere I believe you are doomed. That data format is known as non-CSV :) — 500 - Internal Server Error, Dec 02 '15 at 22:21
Add an example row of your file as well as your ABL IMPORT statement (or whatever method you use). — Jensd, Dec 03 '15 at 07:23
@ bupereira We get the csv through EDI from our customer, they made the CSV so we don't much have the option to use a different delimiter at this point. — Joseph Kreifels II, Dec 03 '15 at 14:00

score 2 · Answer 1 · answered Dec 03 '15 at 07:38

You are not providing quite enough data for a perfect answer but given what you say I think the IMPORT statement should handle this automatically.

In my example here commaimport.csv is a comma-separated csv-file with quotes around text fields. Integers, logical variables etc have no quotes. The last field contains a comma in one line:

commaimport.csv
=======================
"Id1", 123, NO, "This is a message"
"Id2", 124, YES, "This is a another message, with a comma"
"Id3", 323, NO, "This is a another message without a comma"

To import this file I define a temp-table matching the file layout and use the IMPORT statement with comma as delimiter:

DEFINE TEMP-TABLE ttImport NO-UNDO
    FIELD field1 AS CHARACTER FORMAT "xxx"
    FIELD field2 AS INTEGER   FORMAT "zz9"
    FIELD field3 AS LOGICAL   
    FIELD field4 AS CHARACTER FORMAT "x(50)".

INPUT FROM VALUE("c:\temp\commaimport.csv").
REPEAT :
    CREATE ttImport.
    IMPORT DELIMITER "," ttImport.
END.
INPUT CLOSE.

FOR EACH ttImport:
    DISPLAY ttImport.
END.

You don't have to import into a temp-table. You could import into variables instead.

DEFINE VARIABLE c AS CHARACTER   NO-UNDO FORMAT "xxx".
DEFINE VARIABLE i AS INTEGER     NO-UNDO FORMAT "zz9".
DEFINE VARIABLE l AS LOGICAL     NO-UNDO.
DEFINE VARIABLE d AS CHARACTER   NO-UNDO FORMAT "x(50)".

INPUT FROM VALUE("c:\temp\commaimport.csv").
REPEAT :
    IMPORT DELIMITER "," c i l d.
    DISP c i l d.
END.
INPUT CLOSE.

This will render basically the same output:

score 2 · Answer 2 · answered Dec 03 '15 at 11:27

You don't show what your data file looks like. But if the problematic field is the last one, and there are no quotes, then your best bet is probably to read it using INPUT UNFORMATTED to get it a line at a time, and then split the line into fields using ENTRY(). That way you can treat everything after the nth comma as a single field no matter how many commas the line has.

For example, say your input file has three columns like this:

boris,14.23,12 the avenue
mark,32.10,flat 1, the grange
percy,1.00,Bleak house, Dartmouth

... so that column three is an address which might contain a comma and is not enclosed in quotes so that IMPORT DELIMITER can't help you.

Something like this would work in that case:

/* ...skipping a lot of definitions here ... */

input from "datafile.csv".
repeat:
    import unformatted v-line.
    create tt-thing.
    assign tt-thing.name    = entry(1, v-line, ',')
           tt-thing.price   = entry(2, v-line, ',')
           tt-thing.address = entry(3, v-line, ',').

    do v=i = 4 to num-entries(v-line, ','):
        tt-thing.address = tt-thing.address 
                         + ','
                         + entry(v-i, v-line, ',').

    end.

end.
input close.

Can I make that work if the comma field is in the middle not the end? — Joseph Kreifels II, Dec 03 '15 at 14:06
Sure, but it will be harder. Say you have five columns, and the problem column is number three. You can take `entry(1)` and `entry(2)` as above to get the first two columns; the last two columns will be `entry(x)` and `entry(x-1)` where `x = num-entries(v-line, ',')` Anything between the 2nd comma and the (x-1)th comma will be column three. — Andy Jones, Dec 03 '15 at 15:06
Note that if you have _two_ problem fields, all bets are off. I don't think there is a solution in that case. — Andy Jones, Dec 03 '15 at 15:10

Removing Unwanted commas from a csv

2 Answers2